Markdown AST elements
Every node in the Markdown abstract syntax tree (AST) is associated with an element[1], providing semantic information to the node (e.g. that the node is a paragraph, or an inline code snippet). In MarkdownAST, each element is an instance of some subtype of AbstractElement, and may (but does not have to) have fields that contain additional information about how to interpret the element (e.g. the language tag of a code block).
MarkdownAST.AbstractElement — Type
abstract type AbstractElement
A supertype of all Markdown AST element types.
User-defined elements must not directly inherit from this type, but from either AbstractBlock or AbstractInline instead.
Interface
- By default, each element is assumed to be a leaf element that can not contain other elements as children. An iscontainer method can be defined to override this.
- can_contain can be overridden to constrain what elements can be the direct children of another node. By default, inline container elements can contain any inline element and block container elements can contain any block element.
- Elements that are implemented as mutable structs should probably implement the equality operator (==), to make sure that two different instances that are semantically the same would be considered equal.
If an element does contain some fields, it is usually a mutable type so that it would be possible to update it.
When the Markdown AST is represented using Nodes, the corresponding elements can be accessed via the .element field.
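The interface points above can be sketched with a small user-defined element. This is only an illustration: the Highlight type and its color field are hypothetical and not part of MarkdownAST, and the sketch assumes that children are added with push! on the .children property of a Node.

```julia
using MarkdownAST

# Hypothetical user-defined inline element; `Highlight` is not part of MarkdownAST.
mutable struct Highlight <: MarkdownAST.AbstractInline
    color :: String
end

# Elements are leaves by default, so opt in to holding children.
MarkdownAST.iscontainer(::Highlight) = true

# Compare by field value, so that separately constructed instances are equal.
Base.:(==)(a::Highlight, b::Highlight) = a.color == b.color

# Wrap the element in a Node and give it an inline child.
node = MarkdownAST.Node(Highlight("yellow"))
push!(node.children, MarkdownAST.Node(MarkdownAST.Text("important")))

# The element is reachable through the node, and its fields can be updated in place.
node.element.color = "green"
```

Because Highlight is an inline container, the default can_contain rule lets it hold any inline element, so no can_contain method is needed in this sketch.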
Block and inline nodes
In the Markdown AST, the elements can, broadly, be divided into two categories: block and inline elements. The block elements represent the main, top-level structural elements of a document (e.g. paragraphs, headings, block quotes), whereas inline elements represent components of a paragraph (e.g. bold or plain text, inline math or code snippets). In MarkdownAST, every block and inline element is a subtype of AbstractBlock and AbstractInline, respectively.
MarkdownAST.AbstractBlock — Type
abstract type AbstractBlock <: AbstractElement
Supertype of all Markdown AST block types.
MarkdownAST.AbstractInline — Type
abstract type AbstractInline <: AbstractElement
Supertype of all Markdown AST inline types.
MarkdownAST.isblock — Function
isblock(element::AbstractElement) -> Bool
Determines if element is a block element (a subtype of AbstractBlock).
MarkdownAST.isinline — Function
isinline(element::AbstractElement) -> Bool
Determines if element is an inline element (a subtype of AbstractInline).
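As a brief illustration of the two predicates, the following sketch checks one block and one inline element. It uses only types and functions documented on this page, with names written fully qualified so that it does not depend on what the package exports.

```julia
using MarkdownAST

paragraph = MarkdownAST.Paragraph()        # a block element
text = MarkdownAST.Text("Hello, world")    # an inline element

paragraph_is_block = MarkdownAST.isblock(paragraph)    # true
paragraph_is_inline = MarkdownAST.isinline(paragraph)  # false
text_is_inline = MarkdownAST.isinline(text)            # true

# Both categories share the common AbstractElement supertype.
both_elements = paragraph isa MarkdownAST.AbstractElement &&
    text isa MarkdownAST.AbstractElement
```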
Constraints on children
As the AST is a tree, nodes (or elements) can have other nodes or elements as children. However, it does not generally make sense for a node to have arbitrary nodes as children. For this purpose, there are methods that constrain which children a node may have.
First, for some elements it does not make sense for them to have any children at all (i.e. they will always be leaf nodes). Whether or not a node is a container node (i.e. whether or not it can have other elements as children) is determined by the iscontainer function.
MarkdownAST.iscontainer — Function
iscontainer(::T) where {T <: AbstractElement} -> Bool
Determines if the particular Markdown element is a container, meaning that it can contain child nodes. Adding child nodes to non-container (leaf) nodes is prohibited.
By default, each user-defined element is assumed to be a leaf node, and each container node should override this method.
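A short sketch of how iscontainer distinguishes containers from leaves, using elements described further down this page. The PageBreak type is hypothetical and only shows the default behaviour for a user-defined element.

```julia
using MarkdownAST

# Containers documented on this page.
paragraph_container = MarkdownAST.iscontainer(MarkdownAST.Paragraph())     # true
quote_container = MarkdownAST.iscontainer(MarkdownAST.BlockQuote())        # true

# Leaves documented on this page.
break_container = MarkdownAST.iscontainer(MarkdownAST.ThematicBreak())     # false
text_container = MarkdownAST.iscontainer(MarkdownAST.Text("plain text"))   # false

# Hypothetical user-defined block element without an iscontainer method:
# it falls back to the default and is treated as a leaf.
struct PageBreak <: MarkdownAST.AbstractBlock end
custom_container = MarkdownAST.iscontainer(PageBreak())                    # false
```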
However, a more fine-grained control over the allowed child nodes is often necessary. For example, while a paragraph can have child nodes, it does not make sense for a paragraph to have another paragraph as a child node, and in fact it should only have inline nodes as children. Such relationships are defined by the can_contain function (e.g. for a Paragraph it only returns true if the child element is an AbstractInline).
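The Paragraph case described above can be checked directly. The second half of this sketch assumes that push! on the .children of a Node validates the new child with can_contain and throws when the check fails; the exact exception type is not relied upon here.

```julia
using MarkdownAST

paragraph = MarkdownAST.Paragraph()

# A paragraph accepts inline children, but not another paragraph.
accepts_text = MarkdownAST.can_contain(paragraph, MarkdownAST.Text("inline"))
accepts_paragraph = MarkdownAST.can_contain(paragraph, MarkdownAST.Paragraph())

# Assumption: tree construction enforces the same rule, so adding an invalid
# child is expected to be rejected with an exception.
node = MarkdownAST.Node(paragraph)
push!(node.children, MarkdownAST.Node(MarkdownAST.Text("inline")))
rejected = try
    push!(node.children, MarkdownAST.Node(MarkdownAST.Paragraph()))
    false
catch
    true
end
```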
MarkdownAST.can_contain — Function
can_contain(parent::AbstractElement, child::AbstractElement) -> Bool
Determines if the child element can be a direct child of the parent element.
This is used to constrain the types of valid children for some elements, such as for the elements that are only allowed to have inline child elements or to make sure that Lists only contain Items.
If the parent element is a leaf node (iscontainer(parent) === false), it can not have any children.
Extended help
When extending can_contain for custom abstract Markdown element types, similar to AbstractBlock or AbstractInline, the second argument to can_contain should always be constrained exactly to ::AbstractElement, in order to avoid method ambiguities. I.e. for some abstract type AbstractT <: AbstractElement, the method should be defined as
can_contain(parent::AbstractT, child::AbstractElement) = ...
For concrete parent types T, where the first argument is constrained as parent::T, it should then be fine to take advantage of multiple dispatch when implementing can_contain.
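A minimal sketch of both patterns described above. All of the types here (AbstractCallout, Note, Aside) are hypothetical and exist only for illustration; the rule that they accept only paragraphs is likewise an arbitrary choice.

```julia
using MarkdownAST

# Hypothetical abstract family of custom block elements.
abstract type AbstractCallout <: MarkdownAST.AbstractBlock end

# For an abstract parent type, the second argument is constrained
# to exactly ::AbstractElement and the check is done in the method body.
MarkdownAST.can_contain(parent::AbstractCallout, child::MarkdownAST.AbstractElement) =
    child isa MarkdownAST.Paragraph

struct Note <: AbstractCallout end
MarkdownAST.iscontainer(::Note) = true

# For a concrete parent type, dispatching on the child type is an option.
struct Aside <: MarkdownAST.AbstractBlock end
MarkdownAST.iscontainer(::Aside) = true
MarkdownAST.can_contain(::Aside, ::MarkdownAST.AbstractElement) = false
MarkdownAST.can_contain(::Aside, ::MarkdownAST.Paragraph) = true

note_ok = MarkdownAST.can_contain(Note(), MarkdownAST.Paragraph())
note_bad = MarkdownAST.can_contain(Note(), MarkdownAST.ThematicBreak())
aside_ok = MarkdownAST.can_contain(Aside(), MarkdownAST.Paragraph())
aside_bad = MarkdownAST.can_contain(Aside(), MarkdownAST.Text("inline"))
```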
Usually, the constraint is whether a container node can contain only block elements or only inline elements.
Sometimes it might be desirable to have even more sophisticated constraints on the elements (e.g. perhaps two elements are not allowed to directly follow each other as children of another node). However, it is not practical to over-complicate the APIs here, and simply restricting the child elements of another element seems to strike a good balance.
Instead, in cases where it becomes possible to construct trees that have questionable semantics due to a weird structure that can not be restricted with can_contain, the elements should carefully document how to interpret such problematic trees (e.g. how to interpret a table that has no rows and columns).
CommonMark elements
The CommonMark specification specifies a set of block and inline nodes that can be used to represent Markdown documents.
MarkdownAST.Backslash — Type
struct Backslash <: AbstractInline
Represents a backslash character \.
MarkdownAST.BlockQuote — Type
struct BlockQuote <: AbstractBlock
A singleton container element representing a block quote. It must contain other block elements as children.
MarkdownAST.Code — Type
mutable struct Code <: AbstractInline
Inline element representing an inline code span.
Fields
- .code :: String: raw code
Constructors
Code(code::AbstractString)
MarkdownAST.CodeBlock — Type
mutable struct CodeBlock <: AbstractBlock
A leaf block representing a code block.
Fields
- .info :: String: code block info string (e.g. the programming language label)
- .code :: String: code content of the block
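To illustrate the fields of the two code elements, here is a small sketch. The Code constructor is the one listed above; for CodeBlock no constructor is listed, so the sketch assumes a positional constructor that takes the fields in their declared order (info, then code).

```julia
using MarkdownAST

inline_code = MarkdownAST.Code("x + 1")

# Assumption: CodeBlock(info, code) follows the field order listed above.
block = MarkdownAST.CodeBlock("julia", "println(\"hello\")")

language = block.info    # info string, here the language label
source = block.code      # raw contents of the block

# Both types are mutable, so a field can be updated in place.
block.info = "julia-repl"
```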
MarkdownAST.Emph — Type
struct Emph <: AbstractInline
Inline singleton element for emphasis (e.g. italic) styling.
MarkdownAST.HTMLBlock — Type
mutable struct HTMLBlock <: AbstractBlock
A leaf block representing raw HTML.
MarkdownAST.HTMLInline — Type
mutable struct HTMLInline <: AbstractInline
Inline leaf element representing raw inline HTML.
Fields
- .html :: String: inline raw HTML
Constructors
HTMLInline(html::AbstractString)
MarkdownAST.Heading — Type
mutable struct Heading <: AbstractBlock
Represents a heading of a specific level. Can only contain inline elements as children.
Fields
- .level :: Int: the level of the heading, must be between 1 and 6.
Constructors
Heading(level :: Integer)
MarkdownAST.Image — Type
mutable struct Image <: AbstractInline
Inline element representing a link to an image. Can contain other inline nodes that will represent the image description.
Fields
- .destination :: String: destination URL
- .title :: String: title attribute of the link
Constructors
Image(destination::AbstractString, title::AbstractString)
MarkdownAST.Item — Type
struct Item <: AbstractBlock
Singleton container representing the items of a List.
MarkdownAST.LineBreak — Type
struct LineBreak <: AbstractInline
Represents a hard line break in a sequence of inline nodes that should lead to a newline when rendered.
MarkdownAST.Link — Type
mutable struct Link <: AbstractInline
Inline element representing a link. Can contain other inline nodes, but should not contain other Links.
Fields
- .destination :: String: destination URL
- .title :: String: title attribute of the link
Constructors
Link(destination::AbstractString, title::AbstractString)
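As a sketch of how a link and its label fit together, the following builds a Link node whose visible text is a child Text node. The URL is a placeholder, and the sketch assumes that children are added with push! on the .children property of a Node.

```julia
using MarkdownAST

# The destination and title live on the element itself ...
link = MarkdownAST.Node(MarkdownAST.Link("https://example.com", "Example title"))

# ... while the link text is represented by inline child nodes.
push!(link.children, MarkdownAST.Node(MarkdownAST.Text("an example")))

destination = link.element.destination
label = first(link.children).element
```

Keeping the label in child nodes rather than in a field is what allows a link to carry styled content, such as emphasised text.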
MarkdownAST.List — Type
mutable struct List <: AbstractBlock
Represents a Markdown list. The children of a List should only be Items, representing individual list items.
Fields
- .type :: Symbol: determines if this is an ordered (:ordered) or an unordered (:bullet) list.
- .tight :: Bool: determines if the list should be rendered tight or loose.
Constructors
List(type :: Symbol, tight :: Bool)
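The List and Item relationship can be sketched as follows. The bullet_list helper is hypothetical, and the sketch assumes that children are added with push! on the .children property of a Node; each Item wraps a Paragraph because an item holds block elements rather than text directly.

```julia
using MarkdownAST

# Hypothetical helper: build a tight bullet list with one item per label.
function bullet_list(labels)
    list = MarkdownAST.Node(MarkdownAST.List(:bullet, true))
    for label in labels
        paragraph = MarkdownAST.Node(MarkdownAST.Paragraph())
        push!(paragraph.children, MarkdownAST.Node(MarkdownAST.Text(label)))
        item = MarkdownAST.Node(MarkdownAST.Item())
        push!(item.children, paragraph)
        push!(list.children, item)
    end
    return list
end

list = bullet_list(["first", "second"])

# Count the direct children that are Items.
item_count = count(child -> child.element isa MarkdownAST.Item, list.children)
```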
MarkdownAST.Paragraph — Type
struct Paragraph <: AbstractBlock
Singleton container representing a paragraph, containing only inline nodes.
MarkdownAST.SoftBreak — Type
struct SoftBreak <: AbstractInline
Represents a soft break which can be rendered as a space instead.
MarkdownAST.Strong — Type
struct Strong <: AbstractInline
Inline singleton element for strong (e.g. bold) styling.
MarkdownAST.Text — Type
mutable struct Text <: AbstractInline
Inline leaf element representing simply a span of text.
MarkdownAST.ThematicBreak — Type
struct ThematicBreak <: AbstractBlock
A singleton leaf element representing a thematic break (often rendered as a horizontal rule).
Julia extension elements
The Julia version of Markdown contains additional elements that do not exist in the CommonMark specification (such as tables or math). However, as MarkdownAST is meant to be interoperable with the Markdown standard library parser, it also supports additional elements to accurately represent the Julia Flavored Markdown documents.
MarkdownAST.Admonition — Type
mutable struct Admonition <: AbstractBlock
A container block representing an admonition. Can contain other block elements as children.
Fields
- .category :: String: admonition category
- .title :: String: admonition title
Constructors
Admonition(category :: AbstractString, title :: AbstractString)
MarkdownAST.DisplayMath — Type
mutable struct DisplayMath <: AbstractBlock
Leaf block representing a mathematical display equation.
Fields
- .math :: String: TeX code of the display equation
Constructors
DisplayMath(math :: AbstractString)
MarkdownAST.FootnoteDefinition — Type
mutable struct FootnoteDefinition <: AbstractBlock
Container block representing the definition of a footnote, containing the definitions of the footnote as children.
Fields
- .id :: String: label of the footnote
Constructors
FootnoteDefinition(id :: AbstractString)
MarkdownAST.FootnoteLink — Type
mutable struct FootnoteLink <: AbstractInline
Inline leaf element representing a link to a footnote.
Fields
- .id :: String: label of the footnote
Constructors
FootnoteLink(id :: AbstractString)
MarkdownAST.InlineMath — Type
mutable struct InlineMath <: AbstractInline
Leaf inline element representing an inline mathematical expression.
Fields
- .math :: String: TeX code for the inline equation
Constructors
InlineMath(math::String)
MarkdownAST.JuliaValue — Type
struct JuliaValue <: AbstractInline
Inline leaf element for interpolation of Julia expressions and their evaluated values. Two JuliaValue objects are considered equal if the Julia objects they refer to are equal (even if they originate from different expressions).
Fields
- .ex :: Any: contains the original Julia expression (e.g. Expr, Symbol, or some literal value)
- .ref :: Any: stores the Julia object the expression evaluates to
Constructors
JuliaValue(ex, ref = nothing)
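The equality rule for JuliaValue can be illustrated with two values that come from different expressions but refer to equal objects. This is a sketch based on the description above; the expressions themselves are arbitrary.

```julia
using MarkdownAST

# Two different source expressions that both evaluate to the integer 2.
from_sum = MarkdownAST.JuliaValue(:(1 + 1), 2)
from_variable = MarkdownAST.JuliaValue(:x, 2)

# Equality looks at the referenced objects, not at the expressions.
same_value = from_sum == from_variable

# When the second argument is omitted, the reference defaults to nothing.
unevaluated = MarkdownAST.JuliaValue(:(1 + 1))
```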
Tables
Tables are built up from the following elements: Table, TableBody, TableCell, TableHeader, TableRow.
MarkdownAST.Table — Type
mutable struct Table <: TableComponent
Container block representing a table, an extension of the CommonMark spec, and should be interpreted as a rectangular grid of cells with a fixed number of rows and columns.
- A Table node can only contain either TableHeader or TableBody nodes as children.
- TableHeader and TableBody can only contain TableRows as children. A TableHeader should contain only a single TableRow, and any additional ones should be ignored.
- Each TableRow contains only TableCells as children. The row with the largest number of cells determines the width (number of columns) of the table.
Since we can not constrain e.g. the number of children or in what order child nodes can appear in a Markdown tree, it is possible to construct tables that can be difficult to interpret. The following rules should be followed when interpreting tables:
- The children of a Table node should be exactly a single TableHeader followed by a TableBody.
  - If the first child is a TableBody, the header row is assumed to be a list of empty cells.
  - The rows from any nodes following the first TableBody are interpreted as additional table body rows, even if they are contained in a TableHeader.
  - A Table with no children is interpreted as a table with a single empty header cell.
- A TableHeader that is the first child of a Table should only contain one TableRow.
  - If a TableHeader that is the first child of a Table contains additional rows, the additional rows are interpreted to be body rows.
  - If a TableHeader that is the first child of a Table is empty, it is assumed to represent a header row with empty cells.
- Each TableRow of a table should contain the same number of TableCells.
  - Any row that has fewer cells than the longest row should be interpreted as if it is padded with additional empty cells.
MarkdownAST.TableBody — Type
struct TableBody <: TableComponent
Represents the body of a Markdown table and should only occur as the second child of a Table node. See Table for information on how to handle other circumstances.
It can only contain TableRow elements.
MarkdownAST.TableCell — Type
mutable struct TableCell <: TableComponent
Represents a single cell in a Markdown table. Can contain inline nodes.
- align :: Symbol: declares the alignment of a cell (can be :left, :right, or :center), and should match the .spec field of the ancestor Table
- header :: Bool: true if the cell is part of a header row, and should only be true if the cell belongs to a row that is the first
- column :: Int: the column index of the cell, which should match its position in the Markdown tree
It is possible that the fields of TableCell are inconsistent with the real structure of the Markdown tree, in which case the structure or the .spec field should take precedence when interpreting the elements.
MarkdownAST.TableHeader — Type
struct TableHeader <: TableComponent
Represents the header portion of a Markdown table and should only occur as the first child of a Table node and should only contain a single TableRow as a child. See Table for information on how to handle other circumstances.
It can only contain TableRow elements.
MarkdownAST.TableRow — Type
struct TableRow <: TableComponent
Represents a row of a Markdown table. Can only contain TableCells as children.
In addition, to help with the complexity of the table structure, the following helper functions can be used when working with Table elements.
MarkdownAST.tablerows — Function
tablerows(node::Node)
Returns an iterable object containing all the TableRow elements of a table, bypassing the intermediate TableHeader and TableBody nodes. Requires node to be a Table element.
The first element of the iterator should be interpreted to be the header of the table.
MarkdownAST.tablesize — Function
tablesize(node::Node, [dim])
Similar to size, returns the number of rows and/or columns of a Table element. The optional dim argument can be passed to return just either the number of rows or columns, and must be 1 to obtain the number of rows, and 2 to obtain the number of columns.
Determining the number of columns is an $O(n \times m)$ operation in the number of rows and columns, due to the required traversal of the linked nodes. Determining only the number of rows with tablesize(node, 1) is an $O(n)$ operation.
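The table structure and the two helpers can be sketched together. Several details here are assumptions rather than documented facts: that Table is constructed from a vector of column alignments (the .spec field mentioned above), that TableCell takes its fields positionally in the listed order, and that children are added with push! on the .children property of a Node. The make_row helper is hypothetical.

```julia
using MarkdownAST

# Hypothetical helper: build one TableRow from a list of cell labels.
function make_row(labels, header::Bool)
    row = MarkdownAST.Node(MarkdownAST.TableRow())
    for (column, label) in enumerate(labels)
        # Assumption: positional arguments follow the field order align, header, column.
        cell = MarkdownAST.Node(MarkdownAST.TableCell(:left, header, column))
        push!(cell.children, MarkdownAST.Node(MarkdownAST.Text(label)))
        push!(row.children, cell)
    end
    return row
end

# Assumption: Table takes the per-column alignment spec as its only argument.
table = MarkdownAST.Node(MarkdownAST.Table([:left, :left]))
header = MarkdownAST.Node(MarkdownAST.TableHeader())
body = MarkdownAST.Node(MarkdownAST.TableBody())
push!(header.children, make_row(["Name", "Kind"], true))
push!(body.children, make_row(["Paragraph", "block"], false))
push!(table.children, header)   # header first ...
push!(table.children, body)     # ... then the body

# tablerows flattens header and body rows into a single iterable.
row_count = count(row -> row.element isa MarkdownAST.TableRow, MarkdownAST.tablerows(table))

nrows = MarkdownAST.tablesize(table, 1)   # number of rows
ncols = MarkdownAST.tablesize(table, 2)   # number of columns
```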
Other elements
Document is the root element of a Markdown document.
MarkdownAST.Document — Type
struct Document <: AbstractBlock
Singleton top-level element of a Markdown document.
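To show how Document ties the other elements together, the following sketch assembles a small tree by hand. It assumes that children are added with push! on the .children property of a Node and that CodeBlock takes its fields positionally (info, then code); the content of the document is arbitrary.

```julia
using MarkdownAST

document = MarkdownAST.Node(MarkdownAST.Document())

# A level-1 heading with plain text.
heading = MarkdownAST.Node(MarkdownAST.Heading(1))
push!(heading.children, MarkdownAST.Node(MarkdownAST.Text("Title")))
push!(document.children, heading)

# A paragraph mixing plain and strong text.
paragraph = MarkdownAST.Node(MarkdownAST.Paragraph())
push!(paragraph.children, MarkdownAST.Node(MarkdownAST.Text("Some ")))
strong = MarkdownAST.Node(MarkdownAST.Strong())
push!(strong.children, MarkdownAST.Node(MarkdownAST.Text("bold")))
push!(paragraph.children, strong)
push!(document.children, paragraph)

# A leaf code block (assumed positional constructor: info, code).
push!(document.children, MarkdownAST.Node(MarkdownAST.CodeBlock("julia", "x = 1")))

# Count the top-level blocks of the document.
block_count = count(child -> MarkdownAST.isblock(child.element), document.children)
```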
Index
MarkdownAST.AbstractBlock
MarkdownAST.AbstractElement
MarkdownAST.AbstractInline
MarkdownAST.Admonition
MarkdownAST.Backslash
MarkdownAST.BlockQuote
MarkdownAST.Code
MarkdownAST.CodeBlock
MarkdownAST.DisplayMath
MarkdownAST.Document
MarkdownAST.Emph
MarkdownAST.FootnoteDefinition
MarkdownAST.FootnoteLink
MarkdownAST.HTMLBlock
MarkdownAST.HTMLInline
MarkdownAST.Heading
MarkdownAST.Image
MarkdownAST.InlineMath
MarkdownAST.Item
MarkdownAST.JuliaValue
MarkdownAST.LineBreak
MarkdownAST.Link
MarkdownAST.List
MarkdownAST.Paragraph
MarkdownAST.SoftBreak
MarkdownAST.Strong
MarkdownAST.Table
MarkdownAST.TableBody
MarkdownAST.TableCell
MarkdownAST.TableHeader
MarkdownAST.TableRow
MarkdownAST.Text
MarkdownAST.ThematicBreak
MarkdownAST.can_contain
MarkdownAST.isblock
MarkdownAST.iscontainer
MarkdownAST.isinline
MarkdownAST.tablerows
MarkdownAST.tablesize
[1] This terminology mirrors how each node of the HTML DOM tree is some HTML element.