CommonMark Specification

Check out the Markdown Dingus to experiment with the CommonMark processor.

What is CommonMark?

CommonMark is a strongly specified, highly compatible specification of Markdown. It was created to address the ambiguities and inconsistencies in John Gruber’s original Markdown syntax description, which led to divergent implementations across different platforms and tools.

Why CommonMark Exists

The original Markdown syntax description by John Gruber left many cases unspecified, so implementations interpreted them differently. As a result, the same Markdown document could render differently on different platforms (GitHub, Stack Overflow, Reddit, etc.).

CommonMark provides:

  • Unambiguous specifications for all Markdown syntax
  • Comprehensive test suite to ensure consistent behavior
  • Clear precedence rules for conflicting syntax
  • Detailed parsing algorithm that can be implemented consistently

Key Differences from Standard Markdown

1. Stricter Parsing Rules

CommonMark enforces more consistent parsing behavior:

Blank Lines Before Block Elements

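  • CommonMark does not require a blank line before ATX headings, block quotes, thematic breaks, or fenced code blocks; all of these can interrupt a paragraph
  • A list can interrupt a paragraph only if its first item is non-empty and, for an ordered list, numbered 1
  • Other Markdown implementations vary; some require a blank line in these places
Text
# Heading

CommonMark: Paragraph followed by a heading; no blank line is needed

Other implementations: Some treat "# Heading" as part of the paragraph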
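
As a rough illustration of the interruption rules in the CommonMark spec (ATX headings, block quotes, fenced code, and non-empty list items may follow paragraph text directly, but an ordered list only if it starts at 1, and indented code never), here is a minimal Python sketch. The function name `can_interrupt_paragraph` and the regular expressions are assumptions made for this example, not part of any CommonMark library. The sketch only classifies the constructs listed in its patterns; it treats everything else as continuation text, which is not accurate for thematic breaks, setext underlines, or HTML blocks.

```python
import re

# Simplified patterns; each allows up to 3 spaces of leading indentation.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
BLOCK_QUOTE = re.compile(r"^ {0,3}>")
CODE_FENCE = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")  # info-string checks omitted
BULLET_ITEM = re.compile(r"^ {0,3}[-+*][ \t]+\S")
ORDERED_ITEM = re.compile(r"^ {0,3}(\d{1,9})[.)][ \t]+\S")


def can_interrupt_paragraph(line: str) -> bool:
    """Return True if `line` starts a new block directly after paragraph text."""
    line = line.rstrip("\n")
    if ATX_HEADING.match(line) or BLOCK_QUOTE.match(line) or CODE_FENCE.match(line):
        return True
    if BULLET_ITEM.match(line):
        return True
    ordered = ORDERED_ITEM.match(line)
    if ordered:
        # Only an ordered list that starts at 1 may interrupt a paragraph.
        return int(ordered.group(1)) == 1
    # Anything else this sketch knows about, including 4+ spaces of
    # indentation, is treated as a continuation of the paragraph.
    return False
```

With these patterns, "# Heading" and "1. item" are reported as interrupting a paragraph, while "2. item" and a line indented by four spaces are not.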

2. List Item Parsing

Indentation Requirements

  • CommonMark defines list item indentation relative to the list marker
  • A sublist must be indented at least to the column where the parent item's content begins (the marker width plus the spaces after it), not by a fixed 4 spaces
  • Standard Markdown implementations vary on this
1. First item
   - Sublist item (3 spaces here, matching the width of "1. ")
2. Second item
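
In the CommonMark spec, the indentation a sublist needs is the column where the parent item's content starts: the marker width plus the spaces that follow it. The following Python sketch computes that column for a single line. The name `content_column` and the regular expression are hypothetical, chosen for this example; tabs and empty list items are not handled.

```python
import re
from typing import Optional

# Up to 3 spaces of indentation, a bullet or ordered marker, then spaces and content.
LIST_MARKER = re.compile(r"^( {0,3})([-+*]|\d{1,9}[.)])( +)\S")


def content_column(line: str) -> Optional[int]:
    """Return the 0-based column where a list item's content starts, or None."""
    m = LIST_MARKER.match(line)
    if m is None:
        return None
    indent, marker, spaces = m.groups()
    # With 5 or more spaces after the marker, the item begins with indented
    # code, so only one of those spaces counts toward the content offset.
    gap = len(spaces) if len(spaces) <= 4 else 1
    return len(indent) + len(marker) + gap
```

For "1. First item" the function returns 3, so a sublist under that item needs at least three spaces of indentation; under "10. item" it would need four.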

List Continuation

  • CommonMark has clear rules for when list items are “loose” vs “tight”
  • Loose lists wrap items in <p> tags, tight lists don’t
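
The spec calls a list loose when any of its items are separated by blank lines, or when an item directly contains two block-level elements with a blank line between them. A minimal Python sketch of that test follows, under the assumption that the input is a flat list with no nested lists or code blocks; `is_loose` is a hypothetical helper name.

```python
def is_loose(list_lines: list[str]) -> bool:
    """Return True if a flat list (no nesting, no code blocks) is loose."""
    # Blank lines after the last item do not count, so drop them first.
    while list_lines and not list_lines[-1].strip():
        list_lines = list_lines[:-1]
    # Any remaining blank line separates two items or two blocks in one item.
    return any(not line.strip() for line in list_lines)
```

A renderer built on this check would wrap item text in <p> tags only when `is_loose` returns True.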

3. Code Block Handling

Fenced Code Blocks

  • CommonMark standardizes fenced code block syntax with backticks or tildes
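  • The closing fence must use the same character as the opening fence and be at least as long; indentation on the opening fence (up to 3 spaces) is removed from each content line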
code here
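
The fence-matching rule can be sketched in Python as follows: a closing fence uses the same character as the opening fence, is at least as long, is indented by at most 3 spaces, and carries no info string. The function names `parse_opening_fence` and `is_closing_fence` are assumptions for this example rather than the API of any real parser.

```python
import re

OPENING_FENCE = re.compile(r"^( {0,3})(`{3,}|~{3,})(.*)$")


def parse_opening_fence(line: str):
    """Return (indent, fence_char, fence_length) for an opening fence, else None."""
    m = OPENING_FENCE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    indent, fence, info = m.groups()
    # A backtick fence may not contain backticks in its info string.
    if fence[0] == "`" and "`" in info:
        return None
    return len(indent), fence[0], len(fence)


def is_closing_fence(line: str, fence_char: str, fence_length: int) -> bool:
    """Return True if `line` closes a block opened with the given fence."""
    stripped = line.rstrip(" \t\r\n")
    indent = len(stripped) - len(stripped.lstrip(" "))
    body = stripped[indent:]
    # Same character only, long enough, and not indented as code.
    return indent <= 3 and len(body) >= fence_length and set(body) == {fence_char}
```

If no closing fence is found, the spec has the code block run to the end of its container; a caller of these helpers would need to handle that case itself.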

Indented Code Blocks

  • CommonMark requires a blank line between a paragraph and a following indented code block, because an indented code block cannot interrupt a paragraph
  • Standard Markdown often allows them without blank lines

4. Link and Image Processing

Reference Link Precedence

  • CommonMark has clear rules for which reference definition takes precedence
  • When several definitions match the same label, the first one is used
[link1]: /url1
[link1]: /url2
[link1]  <!-- Uses /url1 in CommonMark (first definition wins) -->
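
The spec states that when several link reference definitions match a label, the first one takes precedence, and that labels are compared after case folding and whitespace collapsing. A small Python sketch of that behavior follows. The names `normalize_label` and `collect_definitions` are hypothetical, and the pattern only covers single-line definitions with a bare destination and no title.

```python
import re

# Simplified: single-line definitions with a bare destination and no title.
DEFINITION = re.compile(r"^ {0,3}\[([^\[\]]+)\]:[ \t]*(\S+)[ \t]*$")


def normalize_label(label: str) -> str:
    """Collapse internal whitespace and case-fold the label for matching."""
    return " ".join(label.split()).casefold()


def collect_definitions(lines):
    """Map each normalized label to the destination of its first definition."""
    definitions = {}
    for line in lines:
        m = DEFINITION.match(line)
        if m:
            # setdefault keeps the earliest entry, so later duplicates are ignored.
            definitions.setdefault(normalize_label(m.group(1)), m.group(2))
    return definitions
```

Given the two definitions of link1 shown above, this sketch maps "link1" to /url1.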

Link Parsing Order

  • In CommonMark, links, images, code spans, and raw HTML bind more tightly than emphasis
  • For example, in *[foo*](/url) the brackets form a link and both asterisks stay literal

5. Emphasis and Strong Emphasis

Nested Emphasis Rules

  • CommonMark has specific algorithms for handling nested * and _ markers
  • Prevents ambiguous parsing of complex emphasis patterns
*foo *bar* baz*  <!-- Clear precedence rules in CommonMark -->

Delimiter Processing

  • CommonMark uses a “delimiter stack” algorithm for consistent emphasis parsing
  • Standard Markdown implementations vary in their approach
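
Before the delimiter stack is processed, each run of * or _ is classified as able to open emphasis, close it, or both, based on the characters on either side (the spec's left-flanking and right-flanking rules). The Python sketch below covers only that classification step, and only for *; it is not the full delimiter stack algorithm, and it ignores backslash escapes and code spans. The function name `classify_star_runs` is an assumption for this example.

```python
import unicodedata


def _is_whitespace(ch: str) -> bool:
    # The start and end of the line (passed in as "") count as whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def _is_punctuation(ch: str) -> bool:
    # Since spec 0.31, the P (punctuation) and S (symbol) categories both count.
    return ch != "" and unicodedata.category(ch)[0] in "PS"


def classify_star_runs(text: str):
    """Yield (index, length, can_open, can_close) for each run of '*'."""
    i = 0
    while i < len(text):
        if text[i] != "*":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == "*":
            j += 1
        before = text[i - 1] if i > 0 else ""
        after = text[j] if j < len(text) else ""
        left = not _is_whitespace(after) and (
            not _is_punctuation(after)
            or _is_whitespace(before) or _is_punctuation(before)
        )
        right = not _is_whitespace(before) and (
            not _is_punctuation(before)
            or _is_whitespace(after) or _is_punctuation(after)
        )
        # For '*', left-flanking runs can open and right-flanking runs can close.
        yield i, j - i, left, right
        i = j
```

For the input `*foo *bar* baz*`, the first two runs are reported as openers and the last two as closers, which is the information the delimiter stack then uses to pair them.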

6. HTML Block Processing

HTML Block Detection

  • CommonMark has 7 different types of HTML blocks with specific rules
  • Each type has different requirements for start/end conditions
<div>
This is an HTML block in CommonMark
</div>
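
To make the start conditions more concrete, here is a hedged Python sketch that recognizes start conditions 1 through 6 from the spec. It makes two loud simplifications: `BLOCK_TAGS` is only a small subset of the tag names listed for type 6, and type 7 (any other complete open or closing tag alone on a line) is not implemented at all. The name `html_block_type` is hypothetical.

```python
import re

# Subset of the type 6 block-level tag names, for illustration only.
BLOCK_TAGS = ("address", "blockquote", "div", "h1", "h2", "h3",
              "li", "ol", "p", "table", "ul")

START_CONDITIONS = [
    (1, re.compile(r"<(?:pre|script|style|textarea)(?:[ \t>]|$)", re.I)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Za-z]")),
    (5, re.compile(r"<!\[CDATA\[")),
    (6, re.compile(r"</?(?:%s)(?:[ \t>]|/>|$)" % "|".join(BLOCK_TAGS), re.I)),
]


def html_block_type(line: str):
    """Return the start-condition number (1-6) matched by `line`, or None."""
    stripped = line.rstrip("\r\n")
    indent = len(stripped) - len(stripped.lstrip(" "))
    if indent > 3:
        return None  # four or more spaces means indented code, not HTML
    for number, pattern in START_CONDITIONS:
        # Match at the first non-space column of the line.
        if pattern.match(stripped, indent):
            return number
    return None
```

Each type also has its own end condition (for example, type 6 ends at a blank line, while type 2 ends at a line containing the closing comment marker); those are outside the scope of this sketch.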

7. Line Break Handling

Hard Line Breaks

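  • CommonMark creates a hard break when a line ends with two or more spaces or with a backslash
  • A single line ending is a soft break, rendered in HTML as a line ending or space rather than <br />
Line one\
Line two

The backslash at the end of "Line one" (or two or more trailing spaces there) creates the hard break.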
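
A per-line check for the two hard break markers (two or more trailing spaces, or a trailing backslash) might look like the Python sketch below. The helper name `ends_with_hard_break` is an assumption. The sketch looks at one line in isolation, so it does not account for the rule that a marker on the last line of a paragraph produces no break, nor for code spans.

```python
def ends_with_hard_break(line: str) -> bool:
    """Return True if `line` ends with a hard line break marker."""
    text = line.rstrip("\r\n")
    if not text.strip():
        return False  # a blank line is a paragraph separator, not a break
    if text.endswith("  "):
        return True  # two or more trailing spaces
    # A trailing backslash counts only if it is not itself escaped,
    # so an odd number of trailing backslashes is required.
    trailing_backslashes = len(text) - len(text.rstrip("\\"))
    return trailing_backslashes % 2 == 1
```

Because trailing spaces are invisible and often stripped by editors, the backslash form is usually the easier one to verify with a check like this.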

8. Entity and Character References

Numeric Character References

  • CommonMark supports both decimal and hexadecimal numeric references
  • Standard Markdown support varies
&#8212;  <!-- Decimal -->
&#x2014; <!-- Hexadecimal -->
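
As a sketch of how numeric references resolve, the Python function below handles the two forms the spec defines: `&#` followed by 1 to 7 decimal digits, and `&#x` or `&#X` followed by 1 to 6 hexadecimal digits, each ending in a semicolon. Invalid code points become U+FFFD; treating surrogates as invalid is an assumption of this sketch. The name `decode_numeric_refs` is hypothetical, and named entities such as `&amp;` are deliberately left alone.

```python
import re

NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));")


def decode_numeric_refs(text: str) -> str:
    """Replace decimal and hexadecimal numeric character references."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # NUL, surrogates, and out-of-range values become U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)
    return NUMERIC_REF.sub(replace, text)
```

Both example references above decode to the same character, U+2014.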

CommonMark Parsing Algorithm

CommonMark uses a two-phase parsing approach:

Phase 1: Block Structure

  1. Line Processing: Each line is analyzed for block-level markers
  2. Container Blocks: Blockquotes, lists, and other containers are identified
  3. Leaf Blocks: Headings, code blocks, paragraphs are processed
  4. Reference Links: Link definitions are collected for later use

Phase 2: Inline Structure

  1. Inline Processing: Text within blocks is parsed for inline elements
  2. Emphasis Parsing: Uses delimiter stack algorithm for consistent emphasis
  3. Link Resolution: Reference links are resolved using collected definitions
  4. Entity Processing: Character references are converted to actual characters
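
One reason for the two phases is that a reference link can appear before its definition, so all definitions must be known before inline content is parsed. The toy Python pipeline below illustrates that ordering only. It is a sketch under heavy assumptions: blocks are just paragraphs separated by blank lines, only shortcut references like `[label]` are resolved, definition lines are accepted anywhere (the spec does not let them interrupt a paragraph), and no HTML escaping is done. The names `parse`, `DEFINITION`, and `SHORTCUT_REF` are hypothetical.

```python
import re

DEFINITION = re.compile(r"^\[([^\[\]]+)\]:\s*(\S+)\s*$")
SHORTCUT_REF = re.compile(r"\[([^\[\]]+)\]")


def parse(source: str) -> list[str]:
    # Phase 1: split into blocks and collect link reference definitions.
    definitions, paragraphs = {}, []
    for block in re.split(r"\n\s*\n", source.strip()):
        kept = []
        for line in block.splitlines():
            m = DEFINITION.match(line)
            if m:
                # First definition for a label wins.
                definitions.setdefault(m.group(1).casefold(), m.group(2))
            else:
                kept.append(line)
        if kept:
            paragraphs.append(" ".join(kept))

    # Phase 2: resolve shortcut references inside each paragraph.
    def resolve(match):
        url = definitions.get(match.group(1).casefold())
        if url is None:
            return match.group(0)  # no definition: leave the text untouched
        return f'<a href="{url}">{match.group(1)}</a>'

    return [f"<p>{SHORTCUT_REF.sub(resolve, text)}</p>" for text in paragraphs]
```

In this sketch, a paragraph that uses `[spec]` resolves correctly even when the `[spec]: /url1` definition appears later in the document, because phase 2 starts only after phase 1 has seen every line.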

Benefits of CommonMark

  1. Predictable Behavior: Same input always produces same output
  2. Cross-Platform Compatibility: Works consistently across different tools
  3. Comprehensive Testing: Extensive test suite ensures reliability
  4. Clear Documentation: Detailed specification eliminates guesswork
  5. Future-Proof: A versioned core syntax that supersets such as GitHub Flavored Markdown build on

Implementation Notes

The CommonMark processor is designed to be:

  • Specification-compliant: Follows the official CommonMark spec exactly
  • Test-driven: Passes the official CommonMark test suite
  • Extensible: Can be extended with additional features while maintaining compatibility
  • Fast: Optimized parsing algorithms for performance

This documentation covers CommonMark 0.31.2 (2024-01-28). For the most current information, always refer to the official specification.
