Let's parse

Is there anything more fun than parsing strings? I submit to you that there is not. I'm currently reading my way through Parsing Techniques - A Practical Guide, whose first edition is free online. (I'm hoping Santa brings me a copy of the 2nd edition this year.)

This is a good book, with enough math to be rigorous but not so much that it's completely unreadable. It starts from the absolute basics ("What's a grammar?") and goes through the Chomsky hierarchy and then dives into parsing techniques in great detail, in a language-agnostic way.

Languages and grammars are fascinating. In high school I studied Spanish, French, Latin and German, largely in my spare time. When I was 16, if people asked what I wanted to do for a living, I said "translator".

The plan to become a translator failed partly because the quality of my early education was horrendous and partly because mastering a language is extremely difficult and at 16 I wasn't motivated enough. And then computers showed up in my life, which gave me a never-ending supply of languages to play with, while being fun (and profitable) in so many other ways. But I still took two years of Japanese classes in college for no reason other than enjoyment, and I'm still trying (and failing) to learn Japanese in my spare time 8 years later.

Perl was my first favorite language probably for no reason other than regular expressions. I can understand how people call PCRE syntax line-noise, but to me it's beautiful line noise. I live and breathe regular expressions nowadays. My favorite CS class in college was one where we went through and laboriously built finite-state automata and pushdown automata and Turing machines. Seeing the equivalence of these simple machines with the different classes of grammars was a huge epiphany. Such a simple concept with such huge consequences.
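
For the curious, here is a minimal sketch in Python of the kind of equivalence I mean. Everything in it is made up for illustration: the example language (strings of a's and b's with an even number of b's), the state names, and the function names are my own assumptions, not anything from the class or the book. The hand-built finite-state machine and the regular expression accept the same strings, while the bracket matcher needs a stack, which is the extra power a pushdown automaton has over a finite-state one.

```python
import itertools
import re

# Hypothetical regular language: strings over {a, b} with an even number of b's.
# One regular grammar for it:  S -> aS | bT | (empty) ;  T -> aT | bS

# Transition table of a two-state deterministic finite automaton.
DFA = {
    ("even", "a"): "even",
    ("even", "b"): "odd",
    ("odd", "a"): "odd",
    ("odd", "b"): "even",
}


def dfa_accepts(s):
    """Run the automaton; accept if we finish in the 'even' state."""
    state = "even"
    for ch in s:
        state = DFA.get((state, ch))
        if state is None:  # symbol outside the alphabet
            return False
    return state == "even"


# The same language as a regular expression: b's always come in pairs.
EVEN_BS = re.compile(r"(a*ba*b)*a*")


def regex_accepts(s):
    return EVEN_BS.fullmatch(s) is not None


def agree_up_to(n):
    """Check that both recognizers agree on every string of length <= n."""
    for length in range(n + 1):
        for chars in itertools.product("ab", repeat=length):
            s = "".join(chars)
            if dfa_accepts(s) != regex_accepts(s):
                return False
    return True


def balanced(s):
    """Pushdown-style recognizer for nested () and [] (context-free, not regular)."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
        else:
            return False
    return not stack
```

Calling agree_up_to(6) compares the two regular recognizers on every string of up to six characters. That is a spot check, not a proof, but it gives the flavor of the thing.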

Dijkstra said:

Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer.

I strongly agree with that sentiment. People tell me at times that I'm good at written communication. I have my doubts, and anyways I find it funny because I'm so terrible at verbal communication. I think if I have any success at writing, it's because I view writing as a mechanical process.

I told a prof in college once that I felt like my papers wrote themselves once I had an idea in mind. There are rules of grammar and style, and you learn them and follow them, or break them deliberately if you have a good reason to. You write some prose, then you debug it until it "works" mentally. I don't care about typos and I split infinitives and comma-splice on purpose, but ambiguous or awkward phrases usually stand out to me like compiler bugs in my brain.

What's more important than language? Few things. Language is important enough to be nearly hard-wired into our brains. Children learn it instinctively. Human beings can still easily and effortlessly out-perform the best supercomputer at the task of parsing and interpreting speech. We think in words. The programming languages computers understand are dirt-simple by comparison, but writing code still feels like writing "thoughts for the computer" sometimes.

There are very few times you'll hear me say "What a wonderful world we live in". But one of those times is when I have the opportunity to explore an area of study like language. It's such an enjoyable experience to struggle and try to master such a thing. It's an amazing universe where we have these weird little rules and they work and we can understand them and manipulate them and produce things with them.

December 15, 2009 @ 12:46 PM PST
Category: Programming


Quoth Chris Kleeschulte on December 15, 2009 @ 11:59 PM PST

You do, indeed, have a knack for the written word. This blog post could of and should of been written by me. I feel exactly the same. Your description of your background in languages is the same as mine. While learning theorethical Computer Science at University, I was completely facinated that there existed a man, Alan Turing, who concocted this machine that still stands as the pinnacle electronic computing at this time. So from Finite State Machines, to Push Down Automata, to Turing Machines we reach the limits of modern Computer Science. We leverage our human languages to create a clumsy approach to asking machines to do our bidding. The total expressive power of the Turing Machine is not yet realized, but I think that a new machine will be needed to solve NP hard problems. It would be grand if a machine could "think" in complete words without having that expensive tokenizing step that recursive decent parsers must do.

Quoth Pas B on December 16, 2009 @ 1:48 AM PST

I quite enjoy and appreciate the linguistics background that Larry Wall brings to Perl. Reading "Programming Perl" was an adventure not just in programming, but also to some degree in linguistics.

The first post-modern programming language, indeed! ;-)

P.S. I'm also reminded of Knuth's book on TeX, where he didn't just author a typesetting and layout system, he enjoyed and worked hard to really grok the discipline, and then shared his insight alongside, and as context for, the particulars of TeX.

Quoth Steev on December 16, 2009 @ 2:07 AM PST

Could/Should -have-, Chris.

Quoth Brian on December 16, 2009 @ 10:16 AM PST

That reminds me, my next post should be about programmers and their proclivity to being grammar Nazis. I kind of fall into that category myself.

Quoth Svante on December 16, 2009 @ 6:23 PM PST

I strongly object to the term "grammar Nazi". It gives a way too positive spin on the word "Nazi". Call it "grammar police", "grammar fanatic" or similar. I wouldn't want anyone to feel compelled to identify himself with the term "Nazi".

Correcting "should of" to "should have" is not something anyone should be annoyed of. One should rather be horrified by the frequency of this error, because it indicates a complete failure of grasping the very basics of english grammar, no offense intended. Look at this simple transformation: "I have written this article." --> "I should have written this article."

Looking at public fora and blogs, I have the impression that a large proportion of native english speakers have similar problems. The inability to discern "their", "they're/they are", and "there", or "its" and "it's/it is" seems to be the rule rather than the exception.

The correct spelling of "theoretical", "fascinate", and "descent" can be seen as an advanced topic, but shouldn't at least the most common building words be internalized?

Finally, why are the people who point out these embarrassing mistakes so often portrayed as abnormal, instead of those making them?

Quoth David A. P. on January 07, 2010 @ 5:58 AM PST

Ahem. I am virtuously restraining myself from correcting the mistakes in Svante's comments. It is delicious fun to point out mistakes in others' corrections, but -- of course -- it nearly guarantees that the new corrector will make mistakes of his or her own. This comment thread is a decent exemplar of this principle :).