Markdown AST elements
Every node in the Markdown abstract syntax tree (AST) is associated with an element[1], which provides semantic information about the node (e.g. that the node is a paragraph, or an inline code snippet). In MarkdownAST, each element is an instance of some subtype of AbstractElement, and may (but does not have to) have fields that contain additional information about how to interpret the element (e.g. the language tag of a code block).
MarkdownAST.AbstractElement — Type
abstract type AbstractElement
A supertype of all Markdown AST element types.
User-defined elements must not directly inherit this type, but either AbstractBlock or AbstractInline instead.
Interface
- By default, each element is assumed to be a leaf element that can not contain other elements as children. An iscontainer method can be defined to override this.
- can_contain can be overridden to constrain what elements can be the direct children of another node. By default, inline container elements can contain any inline element and block container elements can contain any block element.
- Elements that are implemented as mutable structs should probably implement the equality operator (==), to make sure that two different instances that are semantically the same would be considered equal.
If an element does contain some fields, it is usually a mutable type so that it would be possible to update it.
When the Markdown AST is represented using Nodes, the corresponding elements can be accessed via the .element field.
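The interface notes above can be sketched with a small user-defined element. The KeyboardKey type and its keys field are hypothetical names invented for this illustration; the sketch only assumes that Node can be constructed directly from an element, as the text implies.

```julia
import MarkdownAST
using MarkdownAST: AbstractInline, Node

# Hypothetical custom inline element (not part of MarkdownAST):
# a keyboard shortcut such as "Ctrl+C".
mutable struct KeyboardKey <: AbstractInline
    keys::String
end

# Mutable structs compare by identity by default, so define semantic equality.
Base.:(==)(a::KeyboardKey, b::KeyboardKey) = a.keys == b.keys

# Wrap the element in a Node and reach it again through the .element field.
node = Node(KeyboardKey("Ctrl+C"))
node.element.keys = "Ctrl+V"   # possible because the struct is mutable
```

Because no iscontainer method is defined for KeyboardKey, it is treated as a leaf element.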
Block and inline nodes
In the Markdown AST, the elements can, broadly, be divided into two categories: block and inline elements. The block elements represent the main, top-level structural elements of a document (e.g. paragraphs, headings, block quotes), whereas inline elements represent components of a paragraph (e.g. bold or plain text, inline math or code snippets). In MarkdownAST, every block and inline element is a subtype of AbstractBlock and AbstractInline, respectively.
MarkdownAST.AbstractBlock — Type
abstract type AbstractBlock <: AbstractElement
Supertype of all Markdown AST block types.
MarkdownAST.AbstractInline — Type
abstract type AbstractInline <: AbstractElement
Supertype of all Markdown AST inline types.
MarkdownAST.isblock — Function
isblock(element::AbstractElement) -> Bool
Determines if element is a block element (a subtype of AbstractBlock).
MarkdownAST.isinline — Function
isinline(element::AbstractElement) -> Bool
Determines if element is an inline element (a subtype of AbstractInline).
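As a brief illustration of the two predicates, the following sketch queries a few of the built-in elements. The variable names are chosen here only for the example.

```julia
using MarkdownAST: Paragraph, Heading, Strong, isblock, isinline

# Paragraph and Heading are subtypes of AbstractBlock.
paragraph_is_block = isblock(Paragraph())
heading_is_block = isblock(Heading(2))

# Strong is a subtype of AbstractInline, so it is inline and not a block.
strong_is_inline = isinline(Strong())
strong_is_block = isblock(Strong())
```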
Constraints on children
As the AST is a tree, nodes (or elements) can have other nodes or elements as children. However, it does not generally make sense for a node to have arbitrary nodes as children. For this purpose, there are methods that constrain which elements a node can have as children.
First, for some elements it does not make sense for them to have any children at all (i.e. they will always be leaf nodes). Whether or not a node is a container node (i.e. whether or not it can have other elements as children) is determined by the iscontainer function.
MarkdownAST.iscontainer — Function
iscontainer(::T) where {T <: AbstractElement} -> Bool
Determines if the particular Markdown element is a container, meaning that it can contain child nodes. Adding child nodes to non-container (leaf) nodes is prohibited.
By default, each user-defined element is assumed to be a leaf node, and each container node should override this method.
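A minimal sketch of overriding iscontainer for a user-defined element follows. Sidebar is a hypothetical type introduced only for this example; the built-in elements are used for comparison.

```julia
import MarkdownAST
using MarkdownAST: AbstractBlock, Paragraph, ThematicBreak, iscontainer

# Hypothetical user-defined block element (not part of MarkdownAST).
struct Sidebar <: AbstractBlock end

# Without this method a Sidebar would be treated as a leaf element.
MarkdownAST.iscontainer(::Sidebar) = true

paragraph_is_container = iscontainer(Paragraph())   # paragraphs hold inline nodes
break_is_container = iscontainer(ThematicBreak())   # thematic breaks are leaves
sidebar_is_container = iscontainer(Sidebar())
```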
However, a more fine-grained control over the allowed child nodes is often necessary. For example, while a paragraph can have child nodes, it does not make sense for a paragraph to have another paragraph as a child node, and in fact it should only have inline nodes as children. Such relationships are defined by the can_contain function (e.g. for a Paragraph it only returns true if the child element is an AbstractInline).
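The relationships described above can be checked directly by calling can_contain on element instances. This is only an illustrative sketch; the variable names are chosen for the example.

```julia
using MarkdownAST: Paragraph, List, Item, Strong, can_contain

# A paragraph accepts inline children but not another paragraph.
para_accepts_inline = can_contain(Paragraph(), Strong())
para_accepts_para = can_contain(Paragraph(), Paragraph())

# A list accepts Items, but not arbitrary blocks such as a paragraph.
list = List(:bullet, true)
list_accepts_item = can_contain(list, Item())
list_accepts_para = can_contain(list, Paragraph())
```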
MarkdownAST.can_contain — Function
can_contain(parent::AbstractElement, child::AbstractElement) -> Bool
Determines if the child element can be a direct child of the parent element.
This is used to constrain the types of valid children for some elements, such as for the elements that are only allowed to have inline child elements or to make sure that Lists only contain Items.
If the parent element is a leaf node (iscontainer(parent) === false), it can not have any children.
Extended help
When extending can_contain for custom abstract types of Markdown elements, similar to the AbstractBlock or AbstractInline types, the second argument to can_contain should always be constrained exactly to ::AbstractElement, in order to avoid method ambiguities. I.e. for some abstract type AbstractT <: AbstractElement, the method should be defined as
can_contain(parent::AbstractT, child::AbstractElement) = ...
For concrete parent types T, where the first argument is constrained as parent::T, it should then be fine to take advantage of multiple dispatch when implementing can_contain.
Usually, the constraint is whether a container node can contain only block elements or only inline elements.
Sometimes it might be desirable to have even more sophisticated constraints on the elements (e.g. perhaps two elements are not allowed to directly follow each other as children of another node). However, it is not practical to over-complicate the APIs here, and simply restricting the child elements of another element seems to strike a good balance.
Instead, in cases where it becomes possible to construct trees that have questionable semantics due to a weird structure that can not be restricted with can_contain, the elements should carefully document how to interpret such problematic trees (e.g. how to interpret a table that has no rows and columns).
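The advice on extending can_contain for a custom abstract type can be sketched as follows. AbstractCaption and FigureCaption are hypothetical types made up for this illustration; the point is only that the second argument of the new method stays annotated as ::AbstractElement.

```julia
import MarkdownAST
using MarkdownAST: AbstractBlock, AbstractElement, Paragraph, can_contain, isinline

# Hypothetical family of block containers that should only hold inline nodes.
abstract type AbstractCaption <: AbstractBlock end
struct FigureCaption <: AbstractCaption end

MarkdownAST.iscontainer(::AbstractCaption) = true

# The second argument is constrained exactly to ::AbstractElement, so this
# method is not ambiguous with the generic can_contain methods.
MarkdownAST.can_contain(::AbstractCaption, child::AbstractElement) = isinline(child)

caption_accepts_text = can_contain(FigureCaption(), MarkdownAST.Text("Figure 1"))
caption_accepts_para = can_contain(FigureCaption(), Paragraph())
```

Here the restriction is expressed through the value of the child (via isinline) rather than through dispatch on the type of the second argument.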
CommonMark elements
The CommonMark specification specifies a set of block and inline nodes that can be used to represent Markdown documents.
MarkdownAST.Backslash — Type
struct Backslash <: AbstractInline
Represents a backslash character \.
MarkdownAST.BlockQuote — Type
struct BlockQuote <: AbstractBlock
A singleton container element representing a block quote. It must contain other block elements as children.
MarkdownAST.Code — Type
mutable struct Code <: AbstractInline
Inline element representing an inline code span.
Fields
- .code :: String: raw code
Constructors
Code(code::AbstractString)
MarkdownAST.CodeBlock — Type
mutable struct CodeBlock <: AbstractBlock
A leaf block representing a code block.
Fields
- .info :: String: code block info string (e.g. the programming language label)
- .code :: String: code content of the block
MarkdownAST.Emph — Type
struct Emph <: AbstractInline
Inline singleton element for emphasis (e.g. italic) styling.
MarkdownAST.HTMLBlock — Type
mutable struct HTMLBlock <: AbstractBlock
A leaf block representing raw HTML.
MarkdownAST.HTMLInline — Type
mutable struct HTMLInline <: AbstractInline
Inline leaf element representing raw inline HTML.
Fields
- .html :: String: inline raw HTML
Constructors
HTMLInline(html::AbstractString)
MarkdownAST.Heading — Type
mutable struct Heading <: AbstractBlock
Represents a heading of a specific level. Can only contain inline elements as children.
Fields
- .level :: Int: the level of the heading, must be between 1 and 6.
Constructors
Heading(level :: Integer)
MarkdownAST.Image — Type
mutable struct Image <: AbstractInline
Inline element representing a link to an image. Can contain other inline nodes that will represent the image description.
Fields
- .destination :: String: destination URL
- .title :: String: title attribute of the link
Constructors
Image(destination::AbstractString, title::AbstractString)
MarkdownAST.Item — Type
struct Item <: AbstractBlock
Singleton container representing the items of a List.
MarkdownAST.LineBreak — Type
struct LineBreak <: AbstractInline
Represents a hard line break in a sequence of inline nodes that should lead to a newline when rendered.
MarkdownAST.Link — Type
mutable struct Link <: AbstractInline
Inline element representing a link. Can contain other inline nodes, but should not contain other Links.
Fields
- .destination :: String: destination URL
- .title :: String: title attribute of the link
Constructors
Link(destination::AbstractString, title::AbstractString)
MarkdownAST.List — Type
mutable struct List <: AbstractBlock
Represents a Markdown list. The children of a List should only be Items, representing individual list items.
Fields
- .type :: Symbol: determines if this is an ordered (:ordered) or an unordered (:bullet) list.
- .tight :: Bool: determines if the list should be rendered tight or loose.
Constructors
List(type :: Symbol, tight :: Bool)
MarkdownAST.Paragraph — Type
struct Paragraph <: AbstractBlock
Singleton container representing a paragraph, containing only inline nodes.
MarkdownAST.SoftBreak — Type
struct SoftBreak <: AbstractInline
Represents a soft break which can be rendered as a space instead.
MarkdownAST.Strong — Type
struct Strong <: AbstractInline
Inline singleton element for strong (e.g. bold) styling.
MarkdownAST.Text — Type
mutable struct Text <: AbstractInline
Inline leaf element representing simply a span of text.
MarkdownAST.ThematicBreak — Type
struct ThematicBreak <: AbstractBlock
A singleton leaf element representing a thematic break (often rendered as a horizontal rule).
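To show how the CommonMark elements fit together, here is a small sketch that builds a tree by hand. It assumes that nodes are created with Node(element) and attached with push!, and that MarkdownAST.Text accepts a string; the name is written fully qualified because Base also defines a Text type.

```julia
import MarkdownAST
using MarkdownAST: Node, Document, Heading, Paragraph, Strong

# Root of the tree.
doc = Node(Document())

# A level-1 heading with plain text as its inline child.
heading = Node(Heading(1))
push!(heading, Node(MarkdownAST.Text("Title")))
push!(doc, heading)

# A paragraph containing plain text followed by bold text.
para = Node(Paragraph())
push!(para, Node(MarkdownAST.Text("Some ")))
bold = Node(Strong())
push!(bold, Node(MarkdownAST.Text("bold text")))
push!(para, bold)
push!(doc, para)

# Elements are reachable again through the .element field of each node.
top_level = [child.element for child in doc.children]
```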
Julia extension elements
The Julia version of Markdown contains additional elements that do not exist in the CommonMark specification (such as tables or math). However, as MarkdownAST is meant to be interoperable with the Markdown standard library parser, it also supports additional elements to accurately represent the Julia Flavored Markdown documents.
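As a short sketch of how such extension elements combine with the CommonMark ones, the following builds an admonition that wraps a paragraph with inline math. The category, title and TeX strings are arbitrary example values, and the constructors used are the ones listed in the entries of this section.

```julia
using MarkdownAST: Node, Admonition, Paragraph, InlineMath

# An admonition is a block container, so it can hold a paragraph.
note = Node(Admonition("note", "Remember"))

# Inline math is an inline leaf, so it lives inside the paragraph.
para = Node(Paragraph())
push!(para, Node(InlineMath("x^2 + y^2 = 1")))
push!(note, para)
```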
MarkdownAST.Admonition — Type
mutable struct Admonition <: AbstractBlock
A container block representing an admonition. Can contain other block elements as children.
Fields
- .category :: String: admonition category
- .title :: String: admonition title
Constructors
Admonition(category :: AbstractString, title :: AbstractString)
MarkdownAST.DisplayMath — Type
mutable struct DisplayMath <: AbstractBlock
Leaf block representing a mathematical display equation.
Fields
- .math :: String: TeX code of the display equation
Constructors
DisplayMath(math :: AbstractString)
MarkdownAST.FootnoteDefinition — Type
mutable struct FootnoteDefinition <: AbstractBlock
Container block representing the definition of a footnote, containing the definitions of the footnote as children.
Fields
- .id :: String: label of the footnote
Constructors
FootnoteDefinition(id :: AbstractString)
MarkdownAST.FootnoteLink — Type
mutable struct FootnoteLink <: AbstractInline
Inline leaf element representing a link to a footnote.
Fields
- .id :: String: label of the footnote
Constructors
FootnoteLink(id :: AbstractString)
MarkdownAST.InlineMath — Type
mutable struct InlineMath <: AbstractInline
Leaf inline element representing an inline mathematical expression.
Fields
- .math :: String: TeX code for the inline equation
Constructors
InlineMath(math::String)
MarkdownAST.JuliaValue — Type
struct JuliaValue <: AbstractInline
Inline leaf element for interpolation of Julia expressions and their evaluated values. Two JuliaValue objects are considered equal if the Julia objects they refer to are equal (even if they originate from different expressions).
Fields
- .ex :: Any: contains the original Julia expression (e.g. Expr, Symbol, or some literal value)
- .ref :: Any: stores the Julia object the expression evaluates to
Constructors
JuliaValue(ex, ref = nothing)
Tables
Tables are built up from the following elements: Table, TableBody, TableCell, TableHeader, TableRow.
MarkdownAST.Table — Type
mutable struct Table <: TableComponent
Container block representing a table, an extension of the CommonMark spec, and should be interpreted as a rectangular grid of cells with a fixed number of rows and columns.
- A Table node can only contain either TableHeader or TableBody nodes as children.
- TableHeader and TableBody can only contain TableRows as children. A TableHeader should contain only a single TableRow, and any additional ones should be ignored.
- Each TableRow contains only TableCells as children. The row with the largest number of cells determines the width (number of columns) of the table.
Since we can not constrain e.g. the number of children or in what order child nodes can appear in a Markdown tree, it is possible to construct tables that can be difficult to interpret. The following rules should be followed when interpreting tables:
- The descendants of a Table node should be exactly a single TableHeader followed by a TableBody.
  - If the first child is a TableBody, the header row is assumed to be a list of empty cells.
  - The rows from any nodes following the first TableBody are interpreted as additional table body rows, even if they are contained in a TableHeader.
  - A Table with no children is interpreted as a table with a single empty header cell.
- A TableHeader that is the first child of a Table should only contain one TableRow.
  - If a TableHeader that is the first child of a Table contains additional rows, the additional rows are interpreted to be body rows.
  - If a TableHeader that is the first child of a Table is empty, it is assumed to represent a header row with empty cells.
- Each TableRow of a table should contain the same number of TableCells.
  - Any row that has fewer cells than the longest row should be interpreted as if it is padded with additional empty cells.
MarkdownAST.TableBody — Type
struct TableBody <: TableComponent
Represents the body of a Markdown table and should only occur as the second child of a Table node. See Table for information on how to handle other circumstances.
It can only contain TableRow elements.
MarkdownAST.TableCell — Type
mutable struct TableCell <: TableComponent
Represents a single cell in a Markdown table. Can contain inline nodes.
- align :: Symbol: declares the alignment of a cell (can be :left, :right, or :center), and should match the .spec field of the ancestor Table
- header :: Bool: true if the cell is part of a header row, and should only be true if the cell belongs to a row that is the first
- column :: Int: the column index of the cell, which should match its position in the Markdown tree
It is possible that the fields of TableCell are inconsistent with the real structure of the Markdown tree, in which case the structure or the .spec field should take precedence when interpreting the elements.
MarkdownAST.TableHeader — Type
struct TableHeader <: TableComponent
Represents the header portion of a Markdown table and should only occur as the first child of a Table node and should only contain a single TableRow as a child. See Table for information on how to handle other circumstances.
It can only contain TableRow elements.
MarkdownAST.TableRow — Type
struct TableRow <: TableComponent
Represents a row of a Markdown table. Can only contain TableCells as children.
In addition, to help with the complexity of the table structure, the following helper functions can be used when working with Table elements.
MarkdownAST.tablerows — Function
tablerows(node::Node)
Returns an iterable object containing all the TableRow elements of a table, bypassing the intermediate TableHeader and TableBody nodes. Requires node to be a Table element.
The first element of the iterator should be interpreted to be the header of the table.
MarkdownAST.tablesize — Function
tablesize(node::Node, [dim])
Similar to size, returns the number of rows and/or columns of a Table element. The optional dim argument can be passed to return just either the number of rows or columns, and must be 1 to obtain the number of rows, and 2 to obtain the number of columns.
Determining the number of columns is an $O(n \times m)$ operation in the number of rows and columns, due to the required traversal of the linked nodes. Determining only the number of rows with tablesize(node, 1) is an $O(n)$ operation.
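The helper functions can be tried out on a small hand-built table, as in the sketch below. Two things here are assumptions rather than documented facts: that Table is constructed from a vector of column alignments (its .spec field), and that TableCell takes its fields positionally as (align, header, column). The make_row helper is a hypothetical name used only in this example.

```julia
import MarkdownAST
using MarkdownAST: Node, Table, TableHeader, TableBody, TableRow, TableCell
using MarkdownAST: tablerows, tablesize

# Hypothetical helper: build a TableRow node with one text cell per string.
function make_row(texts, header::Bool)
    row = Node(TableRow())
    for (column, text) in enumerate(texts)
        # Assumed positional constructor: TableCell(align, header, column).
        cell = Node(TableCell(:left, header, column))
        push!(cell, Node(MarkdownAST.Text(text)))
        push!(row, cell)
    end
    return row
end

# Assumed constructor: Table(spec), where spec lists the column alignments.
table = Node(Table([:left, :left]))

header = Node(TableHeader())
push!(header, make_row(["Name", "Value"], true))
push!(table, header)

body = Node(TableBody())
push!(body, make_row(["a", "1"], false))
push!(body, make_row(["b", "2"], false))
push!(table, body)

# Iterate over all rows; the header row comes first, then the body rows.
rows = collect(tablerows(table))
nrows = tablesize(table, 1)   # walks the rows only
ncols = tablesize(table, 2)   # walks the cells of every row
```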
Other elements
Document is the root element of a Markdown document.
MarkdownAST.Document — Type
struct Document <: AbstractBlock
Singleton top-level element of a Markdown document.
Index
- MarkdownAST.AbstractBlock
- MarkdownAST.AbstractElement
- MarkdownAST.AbstractInline
- MarkdownAST.Admonition
- MarkdownAST.Backslash
- MarkdownAST.BlockQuote
- MarkdownAST.Code
- MarkdownAST.CodeBlock
- MarkdownAST.DisplayMath
- MarkdownAST.Document
- MarkdownAST.Emph
- MarkdownAST.FootnoteDefinition
- MarkdownAST.FootnoteLink
- MarkdownAST.HTMLBlock
- MarkdownAST.HTMLInline
- MarkdownAST.Heading
- MarkdownAST.Image
- MarkdownAST.InlineMath
- MarkdownAST.Item
- MarkdownAST.JuliaValue
- MarkdownAST.LineBreak
- MarkdownAST.Link
- MarkdownAST.List
- MarkdownAST.Paragraph
- MarkdownAST.SoftBreak
- MarkdownAST.Strong
- MarkdownAST.Table
- MarkdownAST.TableBody
- MarkdownAST.TableCell
- MarkdownAST.TableHeader
- MarkdownAST.TableRow
- MarkdownAST.Text
- MarkdownAST.ThematicBreak
- MarkdownAST.can_contain
- MarkdownAST.isblock
- MarkdownAST.iscontainer
- MarkdownAST.isinline
- MarkdownAST.tablerows
- MarkdownAST.tablesize
[1] This terminology mirrors how each node of the HTML DOM tree is some HTML element.