Why the heck are we still using Markdown??

There are few things in life that bring me this much joy and hate at the same time. Like chocolate that hurts when eaten, and Markdown. Seriously, why?? Half of the time we aren’t even using the full language!

HTML is the best Programming Language!

I know you’ve heard people say the only programming language they know is HTML. And I know, we both rolled our eyes in discontent trying to get our PL papers out of our assembled decks of papers on how HTML is only a markup language and not a programming language.

I mean, yes, we’re in the right, but that guy probably has something we don’t have.

A life.

[Note] When I’m talking about Markdown, I am specifically talking about CommonMark unless stated otherwise, because it is the unambiguous syntax specification. I love the project, and I really appreciate their efforts to make this language a bit more grounded. It’s not the specification that’s broken, it’s the language itself.

The Good

Markdown is a minimal language used for typesetting trivial documents. It needs to do one simple job: take a Markdown file and output an HTML file. Its syntax is as legible as it gets and is easy to write even with no assists. Like with the C language, you can see the output that will be created. Bold always ends up as <strong></strong>, and italic as <em></em>.

Learning curve is simply nonexistent if you’re just a casual user. Just one look at the cheat sheet and you’re ready.

The Bad

We don’t know what we want.

Do we want UI? Do we want a programming language? We don’t know. The only reason feature creep exists is because of unclear specifications.

You want a MINIMAL easily legible markup language, you have markdown. Simple as that right?

well…

(output taken from dingus)

# Hello

*I am an*
__Unambiguous__
> Grammar
<h1>Hello</h1>
<p>
    <em>I am an</em>
    <strong>Unambiguous</strong>
</p>
<blockquote>
    <p>Grammar</p>
</blockquote>
Hello
=====

_I am an_
**Unambiguous**
> Grammar
<h1>Hello</h1>
<p>
    <em>I am an</em>
    <strong>Unambiguous</strong>
</p>
<blockquote>
    <p>Grammar</p>
</blockquote>

I hope your 2 eyeballs are enough to see that Markdown is NOT what you asked for. These 2 produce IDENTICAL output. And this is just the tip of the iceberg.

It has so many poor decisions baked in that if you try to use it it will actively fight against you the moment you think you know what you’re doing.

Exhibit A: bold, italic, bold-italic, ???

In Markdown you can write bold in different ways. **bold**, __bold__, <b>bold</b> are some of the ways a valid bold can be written. And these are just for CommonMark. If you’re using something which isn’t marketing itself as “CommonMark™ Compliant®©”, you can very well encounter valid stuff that produces the same output. Like:

  • _*bold*_
  • *_bold*_
  • _*bold_*
  • *_bold_*

Truly magnificent.

And please don’t let me get started on layered ones like:

***Peter* Piper** _Picked___a___Pack_ *of** Pick_led_* Peppers

or this:

*****\\*a*

This thing is actually so peak that we have a class of parser vulnerabilities called ReDoS (Regular Expression Denial of Service) affecting it. Like this 6.9 (nice) severity level CVE for markdown-it.

“markdown-it” is one of the most worked out, clean and easy to understand libraries for Markdown. I simply love markdown-it. The fact that even this library is affected by it shows the absolute state of how bad the situation is.
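To make the ReDoS idea concrete, here is a toy sketch of catastrophic backtracking. It is NOT the markdown-it code or the pattern from that CVE; the pattern `(a+)+b` and the names `naive_match` and `match_after_group` are made up for illustration. It hand-rolls the backtracking a naive regex engine would do and counts the attempts.

```python
def match_after_group(text: str, pos: int, stats: dict) -> bool:
    """One group of a's has just been matched; try to finish the pattern (a+)+b."""
    stats["steps"] += 1
    if pos < len(text) and text[pos] == "b":
        return True
    # Otherwise try another group a+ of every possible length and recurse.
    end = pos
    while end < len(text) and text[end] == "a":
        end += 1
        if match_after_group(text, end, stats):
            return True
    return False


def naive_match(text: str) -> tuple[bool, int]:
    """Match (a+)+b at the start of text; return (matched, attempts made)."""
    stats = {"steps": 0}
    end = 0
    while end < len(text) and text[end] == "a":
        end += 1
        if match_after_group(text, end, stats):
            return True, stats["steps"]
    return False, stats["steps"]


print(naive_match("aab"))       # (True, 2)
print(naive_match("a" * 10))    # (False, 1023)
print(naive_match("a" * 11))    # (False, 2047)
```

Every extra `a` in a non-matching input doubles the work (2^n - 1 attempts for n characters), which is why a short, nasty string is enough to pin a CPU. Nested, overlapping delimiters like Markdown emphasis are exactly the kind of input that invites this shape of pattern.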

Exhibit B: __asm__ Was a Good Idea, but This?

In the old days, when compilers produced optimal code about as often as you find a river in a desert, inline assembly let people write performance-critical code with ease, at the cost of the compiler engineers’ blood, sweat, tears, and the birth of their firstborn son.

It allowed stuff like SIMD operations before the compiler added support for them. If you want an overview of early SIMD generation failures, you can take a look here.

Now let’s take this wonderful idea and bolt it directly into the most bloated, single threaded, sandboxed environment expecting a simple and easy way to write documents. And this is how inline HTML inside Markdown was born!

Inline HTML allows you to do stuff like:

# Hi

I am a <ins> simple </ins> _programmer_ doing
<span class="fancy-text"> elegant </span> programming.

<div class="animation">

And here is my portfolio

</div>

Isn’t this just simple! Isn’t this just neat! The main reason why correct Markdown parsing is exceptionally hard isn’t that Markdown syntax is so hard to comprehend. That’s only 1/10th of the issue. The real issue is that to ship a Markdown parser you also need to ship a friendly HTML parser. And if you’re using HTML inside Markdown, why not use HTML from the start!

Said the person writing this using all the bells and whistles known to man, which are NOT in the standard.

Markdown in and of itself isn’t powerful enough to satisfy a simple monkey-brained developer like me, who is only satisfied when the site looks good enough™.

Good enough™ in this case means it needs to have at least basic $\LaTeX$ with TikZ support and the ability to install packages, PlantUML, Mermaid, custom styling, custom shortcodes, tagging and taxonomy, proper footnotes, BibTeX support…

I don’t want just a simple job from this simple tool, either. To nail a painting to the wall I need a hammer. In this case the hammer is Markdown. But if I want to paint it too, I will break the canvas the moment I try to paint with the hammer.

Breaking the canvas also means a whole lot of CVEs, primarily around XSS vulnerabilities.

Inline HTML Related CVEs
  • CVE‑2025‑24981 (XSS vulnerability)
  • CVE‑2025‑46734 (XSS vulnerability)
  • CVE‑2025‑7969 (XSS vulnerability)
  • CVE‑2025‑60312 (XSS vulnerability)
Every time we allow inline HTML, plugin hooks, or embedded execution engines, we expand the attack surface.

The result is predictable: recurring XSS vulnerabilities across major Markdown implementations. And much, much more. And this will continue to rise with the vibe sh*tted parsers hitting the market.
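A minimal sketch of why the attack surface grows, assuming a made-up one-function renderer (`render_paragraph` is hypothetical, not any real library's API, and this is not how any of the CVEs above actually work). The only difference between the safe and unsafe path is whether raw HTML is passed through.

```python
import html


def render_paragraph(text: str, allow_inline_html: bool) -> str:
    """Wrap text in <p>, either passing raw HTML through or escaping it."""
    body = text if allow_inline_html else html.escape(text)
    return f"<p>{body}</p>"


payload = "<img src=x onerror=alert(1)>"

# Raw passthrough: the attacker's tag survives and the browser will run it.
print(render_paragraph(payload, allow_inline_html=True))
# Escaped: the same input is now inert text.
print(render_paragraph(payload, allow_inline_html=False))
# <p>&lt;img src=x onerror=alert(1)&gt;</p>
```

Real parsers that allow inline HTML have to bolt a sanitizer on top of the passthrough branch, and every gap in that sanitizer is a potential XSS report.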

Exhibit C: Obscure and Old Syntax

Markdown, like all the good tech we have around the World Wide Web, was made in the good old 00’s! Markdown was inspired by preexisting conventions for marking up plaintext in email and Usenet posts (ref).

Mind you, before the 2000’s most serious mails looked like this:

From: k...@rational.com (Kent Mitchell)
Subject: Re: Does memory leak?
Date: 1995/03/31

Norman H. Cohen (nco...@watson.ibm.com) wrote:
: The only programs I know of with deliberate memory leaks are those whose
: executions are short enough, and whose target machines have enough
: virtual memory space, that running out of memory is not a concern.
: (This class of programs includes many student programming exercises and
: some simple applets and utilities; it includes few if any embedded or
: safety-critical programs.)

This sparked an interesting memory for me.  I was once working with a
customer who was producing on-board software for a missile.  In my analysis
of the code, I pointed out that they had a number of problems with storage
leaks.  Imagine my surprise when the customers chief software engineer said
"Of course it leaks".  He went on to point out that they had calculated the
amount of memory the application would leak in the total possible flight time
for the missile and then doubled that number.  They added this much
additional memory to the hardware to "support" the leaks.  Since the missile
will explode when it hits its target or at the end of its flight, the
ultimate in garbage collection is performed without programmer intervention.

--
Kent Mitchell                   | One possible reason that things aren't
Technical Consultant            | going according to plan is .....
Rational Software Corporation   | that there never *was* a plan!

Do you see the beauty of it? The quote syntax, the vertical separation with pipe (|) characters, the 80-column line length limit. This is what Markdown was needed for (and it would fail horribly at it, because in Markdown you can’t divide the screen into N parts without inline HTML black magic).

But because of this legacy, you have 2 different ways of writing headings: the ATX way (# prefixes) and the setext way (underlines). You have at least 2 different ways, and more than 2 CVEs, of writing bold and italic. You have 3 different ways of writing a horizontal rule, one of which collides with the setext heading syntax. You have 3 different bullet characters for unordered lists. You have ordered lists which ignore every number you wrote except the first. And you have a footnote syntax which moves this entire grammar to a context-sensitive one.

This language is literally the C++ of markup languages. Nearly everything can be done in at least 2 different ways, some of which might allow XSS and somehow leak memory in HTML.

It’s used everywhere but it also fails everywhere, equally.

Exhibit D: You Can’t Simply Just Parse It, You Also Have to Use It.

Parsing Is the Easy Thing

If you remember, I told you that the footnote syntax moves this grammar up to a context-sensitive one. Let me elaborate.

test [^1]
<p>test [^1]</p>
test [^1]



[^1]: hello
<p>test <a href=″hello″>^1</a></p>

*formatted & I had to change " to ″ because technical reasons

Actually, footnotes are not supported in CommonMark; what it has are link reference definitions. The only notable difference is that the link syntax lets you put only a single space-free destination (plus an optional title) after the definition, as opposed to a shameful explanation of why bananas on pizza is a good idea.

Reference-style links and footnotes require global definition resolution. A token’s meaning depends on declarations elsewhere in the document. That breaks purely context-free parsing assumptions. Ergo the update to a CSG from a CFG.
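The global resolution described above can be sketched as a two-pass function. This is a deliberately tiny illustration, not a CommonMark implementation: `resolve_references` and both regexes are made up for this post, titles and `<p>` wrapping are ignored, and definitions are assumed to sit on their own line.

```python
import re

# "[label]: destination" on a line of its own (simplified, no titles).
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
# Any "[label]" inside running text.
REFERENCE = re.compile(r"\[([^\]]+)\]")


def resolve_references(source: str) -> list[str]:
    # Pass 1: collect every definition, wherever it sits in the document.
    definitions: dict[str, str] = {}
    body: list[str] = []
    for line in source.splitlines():
        found = DEFINITION.match(line)
        if found:
            # First definition wins; labels are matched case-insensitively.
            definitions.setdefault(found.group(1).lower(), found.group(2))
        else:
            body.append(line)

    # Pass 2: only now can "[label]" be classified as a link or as plain text.
    def substitute(found: re.Match) -> str:
        label = found.group(1)
        target = definitions.get(label.lower())
        if target is None:
            return found.group(0)
        return f'<a href="{target}">{label}</a>'

    return [REFERENCE.sub(substitute, line) for line in body if line.strip()]


print(resolve_references("test [^1]"))
# ['test [^1]']
print(resolve_references("test [^1]\n\n[^1]: hello"))
# ['test <a href="hello">^1</a>']
```

The same `test [^1]` line renders two different ways depending on a line that may be hundreds of lines away, so no single left-to-right pass over the text can decide what it means.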

Which is a lot of words for saying:

If you want a simple language, stay simple.

Rendering Is the Hard Thing

I think this is the most controversial part about this rant. Because nobody wants to admit that what we use and need are two very different things.

Vanilla Markdown needs a transliterator, which is literally a 1:1 mapping function. You see **bold**, <strong>bold</strong> comes out.
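As a rough sketch of what such a 1:1 mapping looks like, here is a two-rule transliterator. The `RULES` table and `transliterate` are invented for illustration, it emits `<strong>`/`<em>` the way CommonMark does, and it handles nothing beyond non-nested asterisk emphasis.

```python
import re

# Each rule maps one surface pattern straight to one HTML template.
# Order matters: ** must be tried before *.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
]


def transliterate(line: str) -> str:
    """Apply every rule once, left to right; no state, no lookups, no passes."""
    for pattern, template in RULES:
        line = pattern.sub(template, line)
    return line


print(transliterate("**bold**"))
# <strong>bold</strong>
print(transliterate("*it* and **bold**"))
# <em>it</em> and <strong>bold</strong>
```

Nothing here needs to know about the rest of the document, which is exactly the property that footnotes and friends take away.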

But modern Markdown has to support stuff like footnotes, which turns this simple transliterator into a full-blown compiler. And after you’ve taken this step, here’s the ladder you’ll climb.

I need a personal knowledge management system

| Requirement | Technicality | Result |
|---|---|---|
| I’ll need footnotes | CFG to CSG grammar | Basic compiler |
| I’ll need custom callouts | *Hooks that bind HTML with md | Dependency graphs |
| I’ll need math | Need typesetting library integration | More complex dependency graphs with execution engines needed |
| I’ll need custom styling | Custom CSS | CSS injection with file based scoping rules |

(* like tera templating / shortcodes)

I swear, what I have to do to make Obsidian work for my needs using Markdown makes it seem like Notion made a sane decision copying Word / Excel internally.

Solution?

If you ask the guy with a majestic beard with 30+ years of experience in 8086 assembly, the answer is plain text.

If you ask the guy with a MacBook and openclaw running on 3 different Mac minis, making a startup a day, the answer is MDX.

If you ask the guy who codes in C for a living, the answer is reStructuredText (.rst).

If you ask me, the answer is none. All are broken in their own ways. Plain text is beautiful, but I can’t show it to somebody that doesn’t know what a null pointer dereference is. reStructuredText is wonderful if you only read it and never write it. And MDX is so busy trying to be HTML, it forgets it needs to be legible.

And the biggest problem in all of them is that they don’t have a build system. We cut corners in the name of speed. If they had a build system and a sane, unambiguous, legible syntax purpose-built for what it needs to be, I think all of the problems would be fixed.

Don’t allow inline HTML, allow for well defined shortcodes and functions to manipulate the text.

Allow for custom hooks to be executed before, during and after the compilation.
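To show roughly what I mean, here is a sketch of such a build step. Everything in it is hypothetical: the `{{ name argument }}` shortcode syntax, the `Compiler` class, and the hook lists are invented for this post and do not belong to any existing tool.

```python
import html
import re
from typing import Callable

# Hypothetical shortcode syntax: {{ name argument text }}
SHORTCODE = re.compile(r"\{\{\s*(\w+)\s+([^}]*?)\s*\}\}")


class Compiler:
    def __init__(self) -> None:
        self.shortcodes: dict[str, Callable[[str], str]] = {}
        self.before: list[Callable[[str], str]] = []  # run on the raw source
        self.after: list[Callable[[str], str]] = []   # run on the final output

    def compile(self, source: str) -> str:
        for hook in self.before:
            source = hook(source)
        # No inline HTML: anything tag-shaped in the source becomes inert text.
        escaped = html.escape(source, quote=False)

        def expand(found: re.Match) -> str:
            name, argument = found.group(1), found.group(2)
            if name not in self.shortcodes:
                # Everything is defined: an unknown shortcode is a build error.
                raise KeyError(f"undefined shortcode: {name}")
            return self.shortcodes[name](argument)

        output = SHORTCODE.sub(expand, escaped)
        for hook in self.after:
            output = hook(output)
        return output


compiler = Compiler()
compiler.shortcodes["note"] = lambda text: f'<aside class="note">{text}</aside>'
compiler.before.append(str.strip)
print(compiler.compile("  <b>x</b> {{ note hi }}  "))
# &lt;b&gt;x&lt;/b&gt; <aside class="note">hi</aside>
```

The point of the design is that the only way to get markup into the output is through a registered, named function, and a typo fails the build instead of silently rendering as text.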

And most importantly define everything. What we need is a custom built tool, not a Frankenstein’s monster that we dare to call a language.

I’m not saying Markdown / Markup should be a programming language; I’m saying we are trying to use it like one, and because it lacks a formal foundation, it’s failing at both.

I genuinely think a proper build system around a saner markup language with compile time hook support can fix a lot of the problems around this with the right constraints. At this point we should just let go of Markdown for good and look for our answers elsewhere. Preferably with a trivially parsable grammar.

What Do I Mean by All This

You can skip this if you understood what I’ve talked about.

From a formal language theory perspective:

Markup Language (Encyclopedia Britannica)

Markup language, standard text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts…

Programming Language (Encyclopedia Britannica)

Computer programming language, any of various languages for expressing a set of detailed instructions for a digital computer…

As you can CLEARLY see, Britannica is not the way to go. So I’ll use the other thing that I always trust in this life, which is common sense, and define a programming language as:

A Language is a Programming Language if it is Turing Complete

For the definition of a language I’m taking the only source of truth which is “Chomsky’s framework” from 1956.

basically

Language: A set of strings (finite sequences of symbols) over a finite alphabet. $$L \subseteq \Sigma^* $$

To recap, an alphabet ($\Sigma$) is basically a set with all the letters that you can imagine. Or, realistically, anything that you can put into Unicode without breaking your preferred OS’ parser.

$$ \begin{align} \Sigma &= \lbrace a, b \rbrace \newline \Sigma_{\text{Unicode}} &= \lbrace c \mid c \in \text{Unicode Standard} \rbrace \end{align} $$

  1. is a minimal alphabet that I made up.
  2. is the alphabet that we use from now on when I mean an alphabet.

And any language is $L$, where $L$ is any set of strings made from the symbols of $\Sigma$

$$ L = \lbrace klasfdushsda, ksadhf, \ldots \rbrace $$

is a valid language go figure.

| Type | Name | Formal Definition |
|---|---|---|
| 0 | Recursively Enumerable | Recognized by Turing machines |
| 1 | Context-Sensitive | Recognized by linear-bounded automata |
| 2 | Context-Free | Recognized by pushdown automata (e.g., programming language syntax) |
| 3 | Regular | Recognized by finite automata (e.g., simple patterns, regex) |

In the food pyramid of languages, Context-Free languages are our wheat and regular languages are our protein. A regular language is what your basic regex can parse. A Context-Free language is what BNF/EBNF can describe and context-sensitive is what we call C++ syntax. I hope you don’t have to deal with the latter.
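A small sketch of the gap between Type 3 and Type 2, using two textbook languages I picked for illustration (they do not come from Markdown): `a*b*` is regular, while "n a's followed by exactly n b's" needs the recursive grammar S -> 'a' S 'b' | ''. The name `is_anbn` is made up.

```python
import re

# Type 3: any number of a's, then any number of b's. No counting required.
REGULAR = re.compile(r"a*b*")


def is_anbn(text: str) -> bool:
    """Type 2: a^n b^n, following the grammar S -> 'a' S 'b' | ''."""
    if text == "":
        return True
    # Peel one 'a' off the front and one 'b' off the back, then recurse.
    return text[0] == "a" and text[-1] == "b" and is_anbn(text[1:-1])


print(REGULAR.fullmatch("aaabb") is not None)  # True
print(is_anbn("aabb"))                         # True
print(is_anbn("aaabb"))                        # False
```

The regular pattern happily accepts `aaabb` because a finite automaton cannot remember how many a's it saw; matching the counts is the smallest "want" that pushes you one level up the hierarchy.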

Recursively Enumerable languages are weird and are outside the topic of this coffee-fueled rant of a blog. But just to pique your interest, this is technically a valid Type-0 language:

$$ HALT = \lbrace (P, x) | P \texttt{ halts on input } x \rbrace $$

Which brings us to our second point.

Turing Completeness: A computational system that can compute every Turing-computable function is called Turing-complete (or Turing-powerful). Alternatively, such a system is one that can simulate a universal Turing machine. (Wikipedia)

A Turing-computable function is, in layman’s terms, a function for which some program returns the right value in finite time for every input it is defined on. On inputs where it is undefined, the program is allowed to die trying (i.e. never halt).

Turing Computable

def good_function(inp: int) -> int:
	# Sums 0 + 1 + ... + (inp - 1); the loop always terminates.
	ret: int = 0
	for i in range(inp):
		ret += i
	return ret

Not Turing Computable

def busy_beaver(n: int):
	... # good luck

Why this compsci-101 lecture about programming language / language taxonomy? Because I want you to understand that simple “wants” could cause a big change.

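The thing separating a set of axioms where every statement can be decided from one where it can’t is the want of recursion (self-reference), which sneaks in through the common multiplication operation: arithmetic with only addition (Presburger arithmetic) is decidable, while adding multiplication (Peano arithmetic) invites Gödel to the party. A trivial operation can have nontrivial consequences.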


More chaos?

| Event | Link |
|---|---|
| Jeff Atwood calls Gruber “negligent” (2009) | https://blog.codinghorror.com/the-future-of-markdown/ |
| Standard Markdown → CommonMark rename (2014) | https://www.metafilter.com/142475/Standard-flavored-Markdown |
| Ars Technica: Markdown throwdown | https://arstechnica.com/information-technology/2014/10/markdown-throwdown-what-happens-when-foss-software-gets-corporate-backing/ |
| HN: “worst small program” quote | https://news.ycombinator.com/item?id=4700160 |
| Karl Voit: “Markdown is a Disaster” (2025) | https://www.osnews.com/story/143128/markdown-is-a-disaster-why-and-what-to-do-instead/ |

If you liked my banter here, you may also like these posts! I hadn’t read them before actually writing this. My history with Markdown is as self-contained as it gets. I have used it in forums and predominantly in Obsidian. My thoughts here are my own. I don’t support anybody, but I do think some people are more right than others.