Markdown AST elements

Every node in the Markdown abstract syntax tree (AST) is associated with an element[1], providing semantic information to the node (e.g. that the node is a paragraph, or an inline code snippet). In MarkdownAST, each element is an instance of some subtype of AbstractElement, and may (but does not have to) have fields that contain additional information about how to interpret the element (e.g. the language tag of a code block).

MarkdownAST.AbstractElement (Type)
abstract type AbstractElement

A supertype of all Markdown AST element types.

User-defined elements must not inherit from this type directly; they should subtype either AbstractBlock or AbstractInline instead.

Interface

  • By default, each element is assumed to be a leaf element that cannot contain other elements as children. An iscontainer method can be defined to override this.

  • can_contain can be overridden to constrain which elements can be the direct children of another node. By default, inline container elements can contain any inline element and block container elements can contain any block element.

  • Elements that are implemented as mutable structs should probably implement the equality operator (==), so that two different instances that are semantically the same are considered equal.
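
A minimal sketch of this interface, using a hypothetical Abbreviation element that is not part of MarkdownAST: it subtypes AbstractInline, relies on the default leaf behaviour of iscontainer, and defines == (plus a matching hash) so that two separately constructed but identical instances compare equal.

```julia
import MarkdownAST

# Hypothetical user-defined inline element, not part of MarkdownAST.
# It subtypes AbstractInline rather than AbstractElement directly.
mutable struct Abbreviation <: MarkdownAST.AbstractInline
    text::String   # abbreviated form, e.g. "AST"
    title::String  # expansion, e.g. "abstract syntax tree"
end

# Without a dedicated method, == on a mutable struct may fall back to
# identity (===), so define structural equality explicitly.
Base.:(==)(a::Abbreviation, b::Abbreviation) =
    a.text == b.text && a.title == b.title

# Keep hash consistent with == (needed when used as Dict/Set keys).
Base.hash(a::Abbreviation, h::UInt) = hash((a.text, a.title), h)

a1 = Abbreviation("AST", "abstract syntax tree")
a2 = Abbreviation("AST", "abstract syntax tree")

a1 == a2                      # true, although a1 !== a2
MarkdownAST.iscontainer(a1)   # false: no iscontainer method was defined
```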


If an element has fields, it is usually a mutable type, so that the fields can be updated in place.

When the Markdown AST is represented using Nodes, the corresponding elements can be accessed via the .element field.
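
For example, the sketch below builds a small tree and reads back the elements of the top-level nodes. It assumes MarkdownAST's @ast macro (in which bare strings become Text elements) and the .children property of a Node, neither of which is described in this section. Since Heading is mutable, its level can also be changed in place.

```julia
import MarkdownAST
using MarkdownAST: @ast

# Assumes the @ast macro: nested do-blocks become child nodes and
# bare strings become Text elements.
md = @ast MarkdownAST.Document() do
    MarkdownAST.Heading(1) do
        "Title"
    end
    MarkdownAST.Paragraph() do
        "Some "
        MarkdownAST.Code("inline code")
    end
end

# Each Node wraps an element, reachable via the .element field.
elements = MarkdownAST.AbstractElement[]
for child in md.children
    push!(elements, child.element)
end

heading = elements[1]   # the MarkdownAST.Heading element of the first node
heading.level = 2       # Heading is mutable, so the node's element changes in place
```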

Block and inline nodes

In the Markdown AST, the elements can, broadly, be divided into two categories: block and inline elements. The block elements represent the main, top-level structural elements of a document (e.g. paragraphs, headings, block quotes), whereas inline elements represent components of a paragraph (e.g. bold or plain text, inline math or code snippets). In MarkdownAST, every block and inline element is a subtype of AbstractBlock and AbstractInline, respectively.
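
Because the two categories are abstract types, code can dispatch on them directly. A minimal sketch, where category is a hypothetical helper name:

```julia
import MarkdownAST

# Hypothetical helper: classify an element by its abstract supertype.
category(::MarkdownAST.AbstractBlock) = :block
category(::MarkdownAST.AbstractInline) = :inline

category(MarkdownAST.Paragraph())       # :block
category(MarkdownAST.ThematicBreak())   # :block
category(MarkdownAST.Emph())            # :inline
category(MarkdownAST.Code("x = 1"))     # :inline
```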

Constraints on children

As the AST is a tree, nodes (or elements) can have other nodes or elements as children. However, it does not generally make sense for a node to have arbitrary nodes as children. For this purpose, there are functions that determine which children a node is allowed to have.

First, for some elements it does not make sense to have any children at all (i.e. they will always be leaf nodes). Whether or not a node is a container node (i.e. whether or not it can have other elements as children) is determined by the iscontainer function.

MarkdownAST.iscontainer (Function)
iscontainer(::T) where {T <: AbstractElement} -> Bool

Determines if the particular Markdown element is a container, meaning that it can contain child nodes. Adding child nodes to non-container (leaf) nodes is prohibited.

By default, each user-defined element is assumed to be a leaf node, and each container node should override this method.
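
A short sketch of the defaults and of an override, using a hypothetical Spoiler block that is not part of MarkdownAST. Once iscontainer returns true, the default can_contain lets a block container hold any block element.

```julia
import MarkdownAST

MarkdownAST.iscontainer(MarkdownAST.Paragraph())      # true: paragraphs hold inline nodes
MarkdownAST.iscontainer(MarkdownAST.ThematicBreak())  # false: always a leaf

# Hypothetical container block, not part of MarkdownAST. Without the
# method below it would be treated as a leaf node.
struct Spoiler <: MarkdownAST.AbstractBlock end
MarkdownAST.iscontainer(::Spoiler) = true

# With the default can_contain, a block container accepts block children.
MarkdownAST.can_contain(Spoiler(), MarkdownAST.Paragraph())   # true
```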


However, more fine-grained control over the allowed child nodes is often necessary. For example, while a paragraph can have child nodes, it does not make sense for a paragraph to have another paragraph as a child node; in fact, it should only have inline nodes as children. Such relationships are defined by the can_contain function (e.g. for a Paragraph it only returns true if the child element is an AbstractInline).

MarkdownAST.can_contain (Function)
can_contain(parent::AbstractElement, child::AbstractElement) -> Bool

Determines if the child element can be a direct child of the parent element.

This is used to constrain the types of valid children for some elements, such as for the elements that are only allowed to have inline child elements or to make sure that Lists only contain Items.

If the parent element is a leaf node (iscontainer(parent) === false), it cannot have any children, and can_contain should return false.

Extended help

When extending can_contain for custom abstract classes of Markdown elements, similar to AbstractBlock or AbstractInline, the second argument to can_contain should always be constrained exactly to ::AbstractElement, in order to avoid method ambiguities. I.e. for some abstract type AbstractT <: AbstractElement, the method should be defined as

can_contain(parent::AbstractT, child::AbstractElement) = ...

For a concrete parent type T, where the first argument is constrained as parent::T, it is then fine to take advantage of multiple dispatch when implementing can_contain.
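
The pattern can be sketched with a hypothetical family of callout blocks (AbstractCallout, Tip and Warning are illustrative names, not part of MarkdownAST). The method for the abstract parent type keeps the child argument as ::AbstractElement and branches inside the body, while the concrete Warning type adds a more specific method through dispatch on the child.

```julia
import MarkdownAST

# Hypothetical family of callout blocks, not part of MarkdownAST.
abstract type AbstractCallout <: MarkdownAST.AbstractBlock end
struct Tip <: AbstractCallout end
struct Warning <: AbstractCallout end

MarkdownAST.iscontainer(::AbstractCallout) = true

# Abstract parent type: constrain the child exactly to ::AbstractElement
# to avoid method ambiguities, and check the child type in the body.
MarkdownAST.can_contain(::AbstractCallout, child::MarkdownAST.AbstractElement) =
    child isa MarkdownAST.Paragraph

# Concrete parent type: dispatching on the child type is fine here.
MarkdownAST.can_contain(::Warning, ::MarkdownAST.List) = true

MarkdownAST.can_contain(Tip(), MarkdownAST.Paragraph())           # true
MarkdownAST.can_contain(Tip(), MarkdownAST.List(:bullet, true))   # false
MarkdownAST.can_contain(Warning(), MarkdownAST.List(:bullet, true)) # true
```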


Usually, the constraint is whether a container node can contain only block elements or only inline elements.
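
For the built-in elements, some of these constraints can be queried directly; the comments state the results implied by the element descriptions on this page. Item is the list item element that List is described as containing.

```julia
import MarkdownAST
const M = MarkdownAST

M.can_contain(M.Paragraph(), M.Text("hello"))        # true: inline child
M.can_contain(M.Paragraph(), M.Paragraph())          # false: no nested paragraphs
M.can_contain(M.BlockQuote(), M.Paragraph())         # true: block child
M.can_contain(M.List(:bullet, true), M.Item())       # true: lists hold items
M.can_contain(M.List(:bullet, true), M.Paragraph())  # false: only items allowed
```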

Note

Sometimes it might be desirable to have even more sophisticated constraints on the elements (e.g. perhaps two elements are not allowed to directly follow each other as children of another node). However, it is not practical to over-complicate the APIs here, and simply restricting the child elements of another element seems to strike a good balance.

Instead, in cases where it becomes possible to construct trees that have questionable semantics due to a weird structure that cannot be restricted with can_contain, the elements should carefully document how to interpret such problematic trees (e.g. how to interpret a table that has no rows or columns).

CommonMark elements

The CommonMark specification specifies a set of block and inline nodes that can be used to represent Markdown documents.

MarkdownAST.BlockQuote (Type)
struct BlockQuote <: AbstractBlock

A singleton container element representing a block quote. It must contain other block elements as children.

MarkdownAST.Code (Type)
mutable struct Code <: AbstractInline

Inline element representing an inline code span.

Fields

  • .code :: String: raw code

Constructors

Code(code::AbstractString)

MarkdownAST.CodeBlock (Type)
mutable struct CodeBlock <: AbstractBlock

A leaf block representing a code block.

Fields

  • .info :: String: code block info string (e.g. the programming language label)
  • .code :: String: code content of the block

MarkdownAST.Emph (Type)
struct Emph <: AbstractInline

Inline singleton element for emphasis (e.g. italic) styling.

MarkdownAST.HTMLInline (Type)
mutable struct HTMLInline <: AbstractInline

Inline leaf element representing raw inline HTML.

Fields

  • .html :: String: inline raw HTML

Constructors

HTMLInline(html::AbstractString)

MarkdownAST.Heading (Type)
mutable struct Heading <: AbstractBlock

Represents a heading of a specific level. Can only contain inline elements as children.

Fields

  • .level :: Int: the level of the heading, must be between 1 and 6.

Constructors

Heading(level :: Integer)

MarkdownAST.Image (Type)
mutable struct Image <: AbstractInline

Inline element representing a link to an image. Can contain other inline nodes that will represent the image description.

Fields

  • .destination :: String: destination URL
  • .title :: String: title attribute of the link

Constructors

Image(destination::AbstractString, title::AbstractString)

MarkdownAST.LineBreak (Type)
struct LineBreak <: AbstractInline

Represents a hard line break in a sequence of inline nodes that should lead to a newline when rendered.

MarkdownAST.Link (Type)
mutable struct Link <: AbstractInline

Inline element representing a link. Can contain other inline nodes, but should not contain other Links.

Fields

  • .destination :: String: destination URL
  • .title :: String: title attribute of the link

Constructors

Link(destination::AbstractString, title::AbstractString)

MarkdownAST.List (Type)
mutable struct List <: AbstractBlock

Represents a Markdown list. The children of a List should only be Items, representing individual list items.

Fields

  • .type :: Symbol: determines if this is an ordered (:ordered) or an unordered (:bullet) list.
  • .tight :: Bool: determines if the list should be rendered tight or loose.

Constructors

List(type :: Symbol, tight :: Bool)
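
A sketch of a tight bullet list with two items, assuming MarkdownAST's @ast macro for building trees (not described in this section). Each Item wraps block content, here a Paragraph, and the List itself only has Item children.

```julia
import MarkdownAST
using MarkdownAST: @ast

# Assumes the @ast macro; bare strings become Text elements.
list = @ast MarkdownAST.List(:bullet, true) do
    MarkdownAST.Item() do
        MarkdownAST.Paragraph() do
            "First item"
        end
    end
    MarkdownAST.Item() do
        MarkdownAST.Paragraph() do
            "Second item"
        end
    end
end

list.element.type    # :bullet (:ordered would give a numbered list)
list.element.tight   # true

# Collect the elements of the direct children of the list node.
items = MarkdownAST.AbstractElement[]
for child in list.children
    push!(items, child.element)
end
```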

MarkdownAST.Paragraph (Type)
struct Paragraph <: AbstractBlock

Singleton container representing a paragraph, containing only inline nodes.

MarkdownAST.SoftBreak (Type)
struct SoftBreak <: AbstractInline

Represents a soft break which can be rendered as a space instead.

MarkdownAST.Strong (Type)
struct Strong <: AbstractInline

Inline singleton element for strong (e.g. bold) styling.

MarkdownAST.Text (Type)
mutable struct Text <: AbstractInline

Inline leaf element representing a simple span of text.

MarkdownAST.ThematicBreak (Type)
struct ThematicBreak <: AbstractBlock

A singleton leaf element representing a thematic break (often rendered as a horizontal rule).


Julia extension elements

The Julia version of Markdown contains additional elements that do not exist in the CommonMark specification (such as tables or math). As MarkdownAST is meant to be interoperable with the Markdown standard library parser, it also supports these additional elements, so that Julia Flavored Markdown documents can be represented accurately.

MarkdownAST.Admonition (Type)
mutable struct Admonition <: AbstractBlock

A container block representing an admonition. Can contain other block elements as children.

Fields

  • .category :: String: admonition category
  • .title :: String: admonition title

Constructors

Admonition(category :: AbstractString, title :: AbstractString)
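
A sketch of an admonition node, assuming MarkdownAST's @ast macro for tree construction (not described in this section); the category and title strings are arbitrary examples.

```julia
import MarkdownAST
using MarkdownAST: @ast

# Assumes the @ast macro; the bare string becomes a Text element.
adm = @ast MarkdownAST.Admonition("warning", "Be careful") do
    MarkdownAST.Paragraph() do
        "This operation cannot be undone."
    end
end

adm.element.category   # "warning"
adm.element.title      # "Be careful"
```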

MarkdownAST.DisplayMath (Type)
mutable struct DisplayMath <: AbstractBlock

Leaf block representing a mathematical display equation.

Fields

  • .math :: String: TeX code of the display equation

Constructors

DisplayMath(math :: AbstractString)

MarkdownAST.FootnoteDefinition (Type)
mutable struct FootnoteDefinition <: AbstractBlock

Container block representing the definition of a footnote, with the content of the footnote as its children.

Fields

  • .id :: String: label of the footnote

Constructors

FootnoteDefinition(id :: AbstractString)

MarkdownAST.FootnoteLink (Type)
mutable struct FootnoteLink <: AbstractInline

Inline leaf element representing a link to a footnote.

Fields

  • .id :: String: label of the footnote

Constructors

FootnoteLink(id :: AbstractString)
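
A FootnoteLink and its FootnoteDefinition are connected through matching .id labels. The sketch below assumes MarkdownAST's @ast macro and the .children property of Node (neither described in this section), and uses a hypothetical footnote_ids! helper to walk the tree and find link labels that have no definition.

```julia
import MarkdownAST
using MarkdownAST: @ast

doc = @ast MarkdownAST.Document() do
    MarkdownAST.Paragraph() do
        "A claim"
        MarkdownAST.FootnoteLink("1")
    end
    MarkdownAST.FootnoteDefinition("1") do
        MarkdownAST.Paragraph() do
            "The supporting note."
        end
    end
end

# Hypothetical helper: recursively record the labels of all footnote
# links and footnote definitions found under `node`.
function footnote_ids!(links::Vector{String}, defs::Vector{String}, node::MarkdownAST.Node)
    el = node.element
    el isa MarkdownAST.FootnoteLink && push!(links, el.id)
    el isa MarkdownAST.FootnoteDefinition && push!(defs, el.id)
    for child in node.children
        footnote_ids!(links, defs, child)
    end
    return links, defs
end

links, defs = footnote_ids!(String[], String[], doc)
undefined = setdiff(links, defs)   # labels used without a definition
```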

MarkdownAST.InlineMath (Type)
mutable struct InlineMath <: AbstractInline

Leaf inline element representing an inline mathematical expression.

Fields

  • .math :: String: TeX code for the inline equation

Constructors

InlineMath(math::String)

MarkdownAST.JuliaValue (Type)
struct JuliaValue <: AbstractInline

Inline leaf element for interpolation of Julia expressions and their evaluated values. Two JuliaValue objects are considered equal if the Julia objects they refer to are equal (even if they originate from different expressions).

Fields

  • .ex :: Any: contains the original Julia expression (e.g. Expr, Symbol, or some literal value)
  • .ref :: Any: stores the Julia object the expression evaluates to

Constructors

JuliaValue(ex, ref = nothing)
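
A small sketch of the equality rule. The expressions and values are arbitrary, and the ref values are supplied by hand here rather than obtained by evaluating the expressions.

```julia
import MarkdownAST

v1 = MarkdownAST.JuliaValue(:x, 2)        # expression `x` that evaluated to 2
v2 = MarkdownAST.JuliaValue(:(1 + 1), 2)  # different expression, same value

v1 == v2                          # true: the referenced objects are equal
MarkdownAST.JuliaValue(:y).ref    # nothing, the default value of ref
```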

Tables

Tables are built up from the following elements: Table, TableBody, TableCell, TableHeader, TableRow.

MarkdownAST.Table (Type)
mutable struct Table <: TableComponent

Container block representing a table, which is an extension of the CommonMark spec. It should be interpreted as a rectangular grid of cells with a fixed number of rows and columns.

Since it is not possible to constrain, e.g., the number of children or the order in which child nodes appear in a Markdown tree, it is possible to construct tables that are difficult to interpret. The following rules should be followed when interpreting tables:

  • The children of a Table node should be exactly a single TableHeader followed by a single TableBody.
    • If the first child is a TableBody, the header row is assumed to be a list of empty cells.
    • The rows from any nodes following the first TableBody are interpreted as additional table body rows, even if they are contained in a TableHeader.
    • A Table with no children is interpreted as a table with a single empty header cell.
  • A TableHeader that is the first child of a Table should only contain one TableRow.
    • If a TableHeader that is the first child of a Table contains additional rows, the additional rows are interpreted to be body rows.
    • If a TableHeader that is the first child of a Table is empty, it is assumed to represent a header row with empty cells.
  • Each TableRow of a table should contain the same number of TableCell elements.
    • Any row that has fewer cells than the longest row should be interpreted as if it is padded with additional empty cells.
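
The padding and empty-table rules can be illustrated with a standalone sketch that works on plain vectors of cell strings rather than on MarkdownAST nodes. normalize_rows is a hypothetical helper, and the first row plays the role of the header row.

```julia
# Standalone sketch: rows are plain vectors of cell strings, not MarkdownAST
# nodes, and the first row is treated as the header row.
function normalize_rows(rows::Vector{Vector{String}})
    # A table with no rows at all: a single empty header cell.
    isempty(rows) && return [[""]]
    # Pad every row with empty cells up to the length of the longest row.
    ncols = maximum(length, rows)
    return [vcat(row, fill("", ncols - length(row))) for row in rows]
end

rows = [["Name", "Value"], ["a"], ["b", "2", "extra"]]
normalize_rows(rows)
# [["Name", "Value", ""], ["a", "", ""], ["b", "2", "extra"]]
```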

MarkdownAST.TableBody (Type)
struct TableBody <: TableComponent

Represents the body of a Markdown table and should only occur as the second child of a Table node. See Table for information on how to handle other circumstances.

It can only contain TableRow elements.

MarkdownAST.TableCell (Type)
mutable struct TableCell <: TableComponent

Represents a single cell in a Markdown table. Can contain inline nodes.

Fields

  • .align :: Symbol: declares the alignment of a cell (can be :left, :right, or :center), and should match the .spec field of the ancestor Table
  • .header :: Bool: true if the cell is part of a header row; it should only be true if the cell belongs to the first row of the table
  • .column :: Int: the column index of the cell, which should match its position in the Markdown tree

It is possible that the fields of TableCell are inconsistent with the real structure of the Markdown tree, in which case the structure or the .spec field should take precedence when interpreting the elements.

MarkdownAST.TableHeader (Type)
struct TableHeader <: TableComponent

Represents the header portion of a Markdown table and should only occur as the first child of a Table node and should only contain a single TableRow as a child. See Table for information on how to handle other circumstances.

It can only contain TableRow elements.


In addition, to help with the complexity of the table structure, the following helper functions can be used when working with Table elements.

MarkdownAST.tablerows (Function)
tablerows(node::Node)

Returns an iterable object containing all the TableRow elements of a table, bypassing the intermediate TableHeader and TableBody nodes. Requires node to be a Table element.

The first element of the iterator should be interpreted to be the header of the table.

MarkdownAST.tablesize (Function)
tablesize(node::Node, [dim])

Similar to size, returns the number of rows and/or columns of a Table element. The optional dim argument can be passed to return only the number of rows or only the number of columns: it must be 1 to obtain the number of rows, and 2 to obtain the number of columns.

Complexity

Determining the number of columns is an $O(n \times m)$ operation, where $n$ is the number of rows and $m$ the number of columns, due to the required traversal of the linked nodes. Determining only the number of rows with tablesize(node, 1) is an $O(n)$ operation.


Other elements

Document is the root element of a Markdown document.


[1] This terminology mirrors how each node of the HTML DOM tree is some HTML element.