Vim joy

Problem: There are ~40 sections of a pen-and-paper survey. The sections were written by many, many different people, none of them programmers, and most of them with no idea their work would ever be the basis for a program. They are all in Word format, and they're all laid out completely differently. Some questions are numbered, some are labeled with a name, and some are unlabeled. Some sections have 4 questions; some have hundreds. Most are multiple choice, but some are fill-in-the-blank, and some are a mixture. Among the multiple-choice questions there are some repeated response options, but not many. Some options are capitalized, some aren't. Some things are laid out in flat lists, some in tables, some horizontally, some vertically, some tab-separated, some space-separated, and some are a huge mess of everything. Even the things that should be consistent are full of typos, because everything was typed in by hand and was never necessarily intended to be computerized.

All of that needs to be translated into a particular survey/data-collection programming language.

Solution: Vim.

It took me about a day and a half to get that done at work using Vim. My first instinct was probably to write a parser in Perl or Ruby. But the material is so inconsistent that writing a program to handle it would have been more work than doing it by hand, and it probably still wouldn't have worked right, because certain things needed to be cleaned up manually anyway.

There's a set of problems of this sort, where you're dealing with messy data that nonetheless has some small level of consistency to it. What you want is to be able to let the computer deal with the things that can be handled consistently, but deal with the other things manually. You need to be able to steer the computer at all times to catch the quirks that result from human error in your data. That's what something like Vim lets you do. It's an interactive state engine / text parser / text generator / language-to-language compiler / regular expression engine, where you can step through a mental "program" to solve a problem, executing one command at a time, writing it as you go, with the invaluable ability of undoing your mistakes at any point. It would be hard to write a script in another programming language to do the same thing, unless the script you wrote ended up being Vim.

It's amazing what you can do with very few Vim commands. I used :g + norm a lot, along with :s, visual block mode, a few q-recorded macros and a few mappings, to do most of the heavy lifting. Not much more was necessary, other than elbow grease and careful attention.
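As a rough illustration of that kind of heavy lifting, here are a few independent one-liners of the sort described. The input patterns (option labels like `a)`, questions numbered like `12.`) and the quoted, comma-terminated output are hypothetical; the post doesn't show the real survey language, so treat this as a sketch rather than what was actually run.

```vim
" Hypothetical input/output formats, for illustration only.

" :s -- strip option labels such as 'a) ' or 'B. ' from the start of lines
" (the 'e' flag suppresses the error if nothing matches).
:%s/^\s*\a[.)]\s\+//e

" :g + :s -- wrap every non-blank line in double quotes and add a comma.
:g/\S/s/.*/"&",/

" :g + :normal -- append a semicolon to every line that starts with a
" question number like '12.'.
:g/^\d\+\./normal! A;

" :g + a recorded macro -- after recording a fix into register a
" (qa ... q), replay it on every matching line.
:g/^\d\+\./normal @a
```

The `!` in `normal!` tells Vim to ignore user mappings for the keys given, which keeps the keystrokes predictable no matter what is in your vimrc.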

About halfway through this task, on a whim, I wrote a Vim syntax-highlighting file for this survey language, which took maybe 15 minutes. It's very easy to write a syntax file for a new language in Vim if you can get a good, complete list of keywords for the language from somewhere and aren't too concerned about being thorough or fancy. Even properly highlighting strings via a simple

syn region SomeString start=+"+ end=+"+ skip=+\\"+
hi link SomeString String

can do wonders for helping you quickly notice strings that have embedded quotes or aren't properly closed; that was invaluable when I was cutting and pasting mounds of hand-typed text from Word.
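A minimal syntax file along those lines might look like the sketch below. The language name `survey`, the keywords, the `#` comment syntax, and the `*.svy` extension are all placeholders, not details of the actual survey language.

```vim
" Sketch of ~/.vim/syntax/survey.vim for a hypothetical 'survey' language.
" To detect the filetype, put this in ~/.vim/ftdetect/survey.vim:
"   au BufRead,BufNewFile *.svy setfiletype survey

if exists("b:current_syntax")
  finish
endif

" Placeholder keyword list; substitute the real language's keywords.
syn keyword surveyKeyword question response skip goto end
syn match   surveyNumber  "\<\d\+\>"
syn match   surveyComment "#.*$"
syn region  surveyString  start=+"+ skip=+\\"+ end=+"+

" 'hi def link' only sets a default, so a colorscheme can still override it.
hi def link surveyKeyword Keyword
hi def link surveyNumber  Number
hi def link surveyComment Comment
hi def link surveyString  String

let b:current_syntax = "survey"
```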

Speaking of Word, the bane of my existence: copying and pasting from Word into plain text results in a huge mess, as you can well imagine. You end up not only with tabs but with Word's stupid curly "smart" quote marks, funky single-character ellipses, and God knows what else. Vim is amazingly good at handling that kind of thing. A quick :set list shows you stray tabs, and :retab (with 'expandtab' set) turns them into spaces. Vim will also make its best attempt at displaying any other unusual characters you throw at it. If you need to tell Vim how to deal with such things, you can enter control characters and other normally untypeable characters yourself with Ctrl-V in Insert and Command-line modes.
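The cleanup described above can be done with a handful of commands, sketched below. This assumes Vim is running with 'encoding' set to utf-8 so the pasted Word characters come through as Unicode; the characters replaced here are just the common offenders (curly quotes and the ellipsis), not an exhaustive list.

```vim
" Make tabs visible (shown as ^I unless 'listchars' has a tab: entry).
:set list

" Convert every tab to spaces, using the current 'tabstop'.
:set expandtab
:retab

" Replace Word's curly double and single quotes with straight ASCII ones.
" Inside [], \uXXXX matches the character with that hex code.
:%s/[\u201C\u201D]/"/ge
:%s/[\u2018\u2019]/'/ge

" Replace the single-character ellipsis with three periods.
" Outside [], the pattern atom is \%uXXXX instead.
:%s/\%u2026/.../ge
```

To type one of these characters directly, press Ctrl-V, then u and its hex code (for example Ctrl-V u2026) in Insert or Command-line mode.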

To help with some of the coding, I found this script, which lets you highlight a column of numbers (or letters, roman numerals, or hexadecimal) and turn it into an incrementing list. I can't count how many times I've wanted to do that and have done it with quick little Vim mappings; that script works very well in comparison. It was very helpful for producing lists of numbered questions and response options without introducing typos of my own.
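The script itself isn't identified here, but a common plain-Vim idiom covers the simple decimal case. The pattern `^\d\+` (a number at the start of a line) is an assumption about how the list is laid out, and unlike the script this handles only decimal numbers, not letters or roman numerals.

```vim
" Renumber lines that start with a number as 1, 2, 3, ... in file order.
" :g sets the last search pattern, so s// reuses ^\d\+ on each matching
" line, and \=i substitutes the current value of the counter i.
:let i = 1 | g/^\d\+/s//\=i/ | let i += 1

" In Vim 8.0 and later there is also a built-in alternative: select a
" column of zeros with Ctrl-V and press g Ctrl-A to turn it into 1, 2, 3, ...
```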

And I still often feel like I've just scratched the surface of what Vim can do. I'll probably be learning it until the day I die. It's an amazing piece of software.

October 18, 2007 @ 4:47 PM PDT
Category: Programming
Tags: Vim