Clojure and Markdown (and JavaScript and Java and...)

Writing up a blog replacement for WordPress (in Clojure) is coming along nicely. Clojure + Compojure are awesome. Most fun I've had making a website in a long while.

One problem I've run across is that I want to use Markdown for both post content and visitor comments. I like Stack Overflow's live JavaScript previews, which let you type text in Markdown and see what it'll look like as HTML as you type it.

As I mentioned before, Markdown is very nice because it (partly) solves one longstanding issue I've had with programming blogs (including my own), namely the proper escaping of HTML and the proper formatting of source code. In Markdown you just put code in backquotes or indent it four spaces and there you go, properly escaped. Markdown is also easy to type and read, which is a plus. I hate hate hate writing HTML by hand.

Anyways, there is no Markdown parser for Clojure, so I was going to write one. (There is MarkdownJ but it has unresolved issues.) The problem with writing my own Markdown parser in Clojure is that Markdown is not a well-specified language. There is no "official" grammar, just an informal "Here's how it works" description and a really ugly reference implementation in Perl. Most implementations (with the exception of peg-markdown and pandoc and friends) are written as a bunch of global regex replacements passed repeatedly over some text.
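
To make "a bunch of global regex replacements" concrete, here is a minimal sketch in Java. It is not taken from any real Markdown implementation: the class name RegexPassSketch and its three rules (code spans, strong, emphasis) are made up for illustration, and it ignores almost all of Markdown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy converter, NOT a real Markdown implementation.
// It only knows `code`, **strong** and *em*, each handled by one global pass.
class RegexPassSketch {
    private static final Pattern CODE = Pattern.compile("`([^`]+)`");
    private static final Pattern STRONG = Pattern.compile("\\*\\*([^*]+)\\*\\*");
    private static final Pattern EM = Pattern.compile("\\*([^*]+)\\*");

    // Escape the characters that matter inside HTML text.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String render(String markdown) {
        // Pass 1: wrap backquoted spans in <code> and escape their contents.
        Matcher m = CODE.matcher(markdown);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String code = "<code>" + escapeHtml(m.group(1)) + "</code>";
            m.appendReplacement(out, Matcher.quoteReplacement(code));
        }
        m.appendTail(out);

        // Passes 2 and 3 run over the OUTPUT of the previous pass,
        // so they also see whatever ended up inside <code>...</code>.
        String html = STRONG.matcher(out.toString()).replaceAll("<strong>$1</strong>");
        return EM.matcher(html).replaceAll("<em>$1</em>");
    }

    public static void main(String[] args) {
        System.out.println(render("Use `x < y`, **bold** and *em*."));
        // prints: Use <code>x &lt; y</code>, <strong>bold</strong> and <em>em</em>.
        System.out.println(render("`a*b*c`"));
        // prints: <code>a<em>b</em>c</code>
    }
}
```

The second call shows the catch with this style: the emphasis pass happily rewrites asterisks that were meant to be literal code. A real implementation has to guard against interactions like this somehow, and the order and details of the passes decide what comes out.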

The result is that there are a lot of Markdown parsers in a lot of languages, and they all give slightly different results in a lot of corner cases (and a lot of not-so-corner cases). The best I could do in Clojure is pick one implementation and try my best to match it.

Now, for each blog post, the server needs to store both the Markdown text and the HTML text. It needs the Markdown because if someone wants to edit content later, they need to edit the raw Markdown. It needs the HTML so that it can be cached and served to people viewing the website, obviously.
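
As a rough sketch of that storage rule (in Java, with made-up names; StoredPost and the renderer parameter are hypothetical and not from my actual blog code), the idea is that the HTML is never supplied separately but always derived from the Markdown on the server:

```java
import java.util.function.UnaryOperator;

// Hypothetical storage record: names are illustrative only.
final class StoredPost {
    final String markdown; // what the author edits later
    final String html;     // cached rendering that is served to readers

    private StoredPost(String markdown, String html) {
        this.markdown = markdown;
        this.html = html;
    }

    // The HTML is always derived here, on the server, from the Markdown.
    static StoredPost fromMarkdown(String markdown, UnaryOperator<String> renderer) {
        return new StoredPost(markdown, renderer.apply(markdown));
    }

    // Editing replaces the Markdown and re-renders, so the two never drift apart.
    StoredPost withMarkdown(String newMarkdown, UnaryOperator<String> renderer) {
        return fromMarkdown(newMarkdown, renderer);
    }

    public static void main(String[] args) {
        // Stand-in renderer; a real one would call an actual Markdown converter.
        UnaryOperator<String> fake = s -> "<p>" + s + "</p>";
        StoredPost post = StoredPost.fromMarkdown("hello", fake);
        StoredPost edited = post.withMarkdown("hello again", fake);
        System.out.println(edited.markdown + " -> " + edited.html);
    }
}
```

Because the constructor is private, the only way to get a StoredPost is to hand over Markdown and let the server-side renderer produce the HTML.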

But a consequence of the above mess is that if you use a JavaScript Markdown library (e.g. Showdown) to show a live preview, and then use a different Markdown library (my own or any other) to do server-side parsing of the text after it's POSTed, there's a good chance that the preview isn't going to match the real output.

One non-solution to this is to do all the parsing client-side, and POST both the Markdown and the post-Markdown HTML to the server so both can be stored, so no server-side parsing is necessary. Aside from being a horrid idea, it's a huge security risk because it leaves open the possibility of someone POSTing some clean Markdown along with some evil, un-matching HTML.

Another non-solution is to do all the parsing server-side and use AJAX to send the preview back to the client. That wouldn't be nearly as smooth or responsive as I want; on Stack Overflow for example the preview updates instantly after every keyup event in the textarea.

The ideal solution is to use the same JavaScript library client-side for previews and server-side for parsing the text. Then the preview and content have a very high chance of matching. This requires some way to run JavaScript on the server. Thanks to Clojure and Java and Rhino, this turns out to be trivial.

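(ns bcc.markdown
  (:import (org.mozilla.javascript Context ScriptableObject)))

;; Convert Markdown text to HTML by running Showdown inside Rhino.
(defn markdown-to-html [txt]
  (let [cx (Context/enter)                  ; enter a Rhino context on this thread
        scope (.initStandardObjects cx)     ; fresh global scope with the standard JS objects
        input (Context/javaToJS txt scope)  ; wrap the Java string as a JS value
        ;; Showdown's source followed by the one expression we want evaluated
        script (str (slurp "showdown.js")
                    "new Showdown.converter().makeHtml(input);")]
    (try
     ;; expose the Markdown text to the script as the global `input`
     (ScriptableObject/putProperty scope "input" input)
     (let [result (.evaluateString cx scope script "<cmd>" 1 nil)]
       (Context/toString result))
     ;; always leave the context, even if evaluation throws
     (finally (Context/exit)))))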

This also saves me from having to write a Markdown parser in Clojure, for which I am thankful.

Once again I'm also thankful we live in times when CPU cycles are cheap and abundant. I'm running a Markdown parser, in JavaScript, in Java, via Clojure, and this still runs essentially instantly even for very large input strings. If I had any chance of my blog becoming famous and getting a million hits a day, it might matter, but in real life I'm set.

February 22, 2009 @ 6:12 AM PST
Category: Programming

12 Comments

Dirkgen2ly
Quoth Dirkgen2ly on February 22, 2009 @ 5:27 PM PST

Yeah, I looked at Markdown a long time ago and I really like the thought. After writing websites by hand it looks like a dream. When I began to do a few wiki pages I thought, "That'd be nice if blogs did that." Nowadays I've just settled on a good HTML editor but I'm still thinking I might like your idea better.

david
Quoth david on April 11, 2009 @ 12:17 PM PDT

thank you! you made my day :)

and for those who wonder how to do this in pure java (I had to),

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine jsEngine = manager.getEngineByName("js");
try
{
    jsEngine.eval(new InputStreamReader(getClass().getResourceAsStream("showdown.js")));
    showdownConverter = jsEngine.eval("new Showdown.converter()");
}
catch (Exception e)
{
    log.error("could not create showdown converter", e);
}

try
{
    return ((Invocable) jsEngine).invokeMethod(
        showdownConverter, 
        "makeHtml", 
        markdownString
    ) + "";
}
catch (Exception e)
{
    log.error("error while converting markdown to html", e);
    return "[could not convert input]";
}

joy!

Eric
Quoth Eric on September 24, 2009 @ 12:26 PM PDT

Thanks a lot for the article, that saved me from a fatal bug in markdownj!

And for the record, here is my implementation in Scala (without any exception management here):

import java.io._
import javax.script._

trait MarkdownJS {
  val jsEngine = new ScriptEngineManager().getEngineByName("js")
  val showdown = getClass.getClassLoader.getResourceAsStream("showdown.js")
  jsEngine.eval(new InputStreamReader(showdown))

  def parseToHtml(text: String) = {
    jsEngine.asInstanceOf[Invocable].
        invokeMethod(jsEngine.eval("new Showdown.converter()"),
                                   "makeHtml", text).toString
  }
}

Brian
Quoth Brian on September 24, 2009 @ 3:08 PM PDT

Cool, it's nice to see Scala and Java and Clojure in comparison.

Brian Clapper
Quoth Brian Clapper on February 10, 2010 @ 5:40 AM PST

This approach works well. I have adapted it for Scala, and I've referred to this article in a blog entry on my Scala solution. Thanks for writing this article, Brian.

Mathias
Quoth Mathias on April 30, 2010 @ 7:39 AM PDT

Interesting approach. Just as a side note: There now is another (server-side) alternative to processing Markdown in Java. "pegdown" implements an open-source, pure-Java PEG parser (based on the peg-markdown grammar) and should not suffer from the issues of the old MarkdownJ. Of course, as you write, it still wouldn't solve the compatibility problem with the browser-side processor... even though I'm not sure an AJAX solution really would suffer from a lag that would make it unusable.

Anyway, thanks for making your solution available!

Cheers, Mathias

Benjamin van der Veen
Quoth Benjamin van der Veen on November 16, 2010 @ 6:10 AM PST

In order to get this to work with Rhino 1.7R2 I had to prepend a newline to the JavaScript which instantiates Showdown and calls makeHtml.

Thanks for this example!

Mohamad
Quoth Mohamad on January 04, 2011 @ 9:13 AM PST

Is anyone aware of any implementations for ColdFusion? I know that ColdFusion integrates really well with Java, so it might not be a long shot to write a custom UDF that does this. But I have no knowledge of Java and I'm not really a programmer!

Dmitri
Quoth Dmitri on October 12, 2011 @ 10:35 AM PDT

Hi Brian,

I just ran across this blog post while looking for a Markdown parser in Clojure. Being a completely unreasonable person, I decided to give writing one a shot. It turned out to be not as crazy an exercise as I expected, and I actually came up with a reasonable one in about 200 lines. :)

I've parsed a whole bunch of different Markdown examples (mostly from GitHub docs :), and they all come out looking fine.

I thought maybe you'll find it useful. It's available on GitHub and on Clojars.

It might actually run with ClojureScript out of the box (or with minimal change), which would solve the problem of having it do JS previews. :)

Brian
Quoth Brian on October 13, 2011 @ 7:01 PM PDT

That's pretty awesome. I was hoping someone would give that a shot. I'm going to have to check it out. Thanks for your work on it.

Paul
Quoth Paul on May 19, 2012 @ 9:23 PM PDT

This is very helpful, thanks. But how do you avoid XSS attacks?

I saw the src for showdown.js in the cow-blog repo has a SAFE global. Is this something you added yourself? (It's certainly not in the clone on github.)

Brian
Quoth Brian on May 21, 2012 @ 5:39 AM PDT

It is something I added. Real Markdown allows arbitrary HTML tags, but my version shouldn't.

Test: <script>///'/'/'/'</script> <> <!-- asdfsadf --> <>
