I hate the internet.

I use Wordpress for my blog. A database is used to store the posts themselves. Wordpress uses PHP to move data into and out of the database. Obviously the blog itself is displayed as HTML.

This means there are potentially three levels of escaping and un-escaping done to my text after I type text into this lovely TEXTAREA and before it's actually displayed. It's HTML-escaped in certain cases, like turning < into &lt; (I just typed a raw < character there, note, and it was HTML-escaped automatically); it's SQL-escaped so it doesn't break whatever INSERT command is putting it into the tables; it's PHP-escaped along the way I'm sure. All my newlines are magically turned into <br> tags or <p> tags somewhere along the line too. Etc. etc.

And yet, if I type a backslash in my post, by the time it's fed through PHP, into the SQL table, fetched back out, and displayed in my blog, it will have vanished entirely. This is a problem when I post source code in my blog, for example, where backslashes are pretty common. Where along this long line of escaping and un-escaping of text is my poor backslash lost? I don't know.
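One plausible way for a backslash to vanish is an un-escaping step that runs without a matching escaping step before it. The sketch below is in Python rather than PHP, and it is only an illustration of that failure mode, not a diagnosis of what Wordpress actually does. The `add_slashes` and `strip_slashes` helpers are hypothetical stand-ins for whatever slash-escaping the real pipeline performs.

```python
import html


def add_slashes(text: str) -> str:
    """Hypothetical SQL-style escaping: put a backslash before backslashes and quotes."""
    text = text.replace("\\", "\\\\")  # escape backslashes first
    return text.replace("'", "\\'").replace('"', '\\"')


def strip_slashes(text: str) -> str:
    """Hypothetical inverse: drop each escaping backslash, keep the character after it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if i + 1 < len(text):
                out.append(text[i + 1])
            i += 2  # skip the backslash and the character it escaped
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# What gets typed into the TEXTAREA: a backslash followed by the letter t.
typed = r'printf("a\tb < c");'

# Escaping and un-escaping in matched pairs is lossless.
balanced = strip_slashes(add_slashes(typed))

# One un-escape with no matching escape silently eats the backslash.
unbalanced = strip_slashes(typed)

# The HTML layer is independent: it only rewrites <, > and &.
displayed = html.escape(unbalanced, quote=False)

print(balanced)    # printf("a\tb < c");
print(unbalanced)  # printf("atb < c");
print(displayed)   # printf("atb &lt; c");
```

Each layer is harmless on its own; the damage comes from a pair that is not matched. That also makes the bug hard to locate, because the text looks fine at every step up to the one extra un-escape.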

Only recently have I discovered that putting code into <code> tags will cause Wordpress to do even more escaping of special characters than usual, so that my backslashes DO survive the round trip through the system. Little did I know that, unlike in <pre> tags, newlines embedded in <code> tags are happily translated into <br> and <p> tags just like everywhere else. This messes things up quite a bit if you set white-space: pre in your stylesheet for <code> tags, for example; you'll end up with lots of extraneous HTML tags everywhere. So it turns out I had to write a Wordpress filter as a very last step in this huge mess, to undo the replacement of newlines with HTML that occurs inside <code> tags.
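The idea behind such a filter can be sketched in a few lines. This is a Python illustration of the technique, not the actual filter; a real one would be a PHP function registered with `add_filter` on a hook such as `the_content`. The function name `strip_injected_tags_in_code` is made up for the example, and the sketch assumes the injected markup is limited to `<p>`, `</p>` and `<br>` variants.

```python
import re

# The paragraph and line-break tags that auto-formatting is assumed to inject.
INJECTED_TAGS = re.compile(r"</?p>|<br\s*/?>", re.IGNORECASE)

# One <code>...</code> element, attributes allowed, possibly spanning several lines.
CODE_ELEMENT = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.IGNORECASE | re.DOTALL)


def strip_injected_tags_in_code(rendered: str) -> str:
    """Remove injected <p> and <br> tags, but only inside <code> elements."""

    def clean(match: re.Match) -> str:
        opening, body, closing = match.groups()
        return opening + INJECTED_TAGS.sub("", body) + closing

    return CODE_ELEMENT.sub(clean, rendered)


# Hypothetical output of the auto-formatting step for a three-line snippet.
rendered = (
    "<p>Intro<br />\n"
    "<code>int x = 1;<br />\n"
    "int y = 2;</p>\n"
    "<p>return x;</code></p>"
)

cleaned = strip_injected_tags_in_code(rendered)
print(cleaned)
```

The newlines themselves are left alone, so white-space: pre still lays the code out line by line, and markup outside <code> is untouched. Literal tags in posted code are not at risk as long as they are already HTML-escaped (as &lt;p&gt;, say) by the time the filter runs.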

What does all this mean? It means that I hate the internet. Why does this have to be so complicated? Why can't we type some text in a box, and have it appear on a web page more or less as we typed it? It's a rhetorical question, I do know why. It's because the internet is layer upon layer of hack after hack. None of this crap was designed to be compatible with any of the rest of this crap. We have all these systems speaking different languages; it's no wonder things are lost in the translation. And these problems are all just the very beginning. Then comes cross-browser compatibility. And maybe throw a templating engine or two in there, for good measure. And God help you if you mix Javascript into the whole thing. I shudder to imagine being a web designer for a living. I can think of few worse tortures. I think few people realize how much of a miracle it is that any of this works at all.

October 31, 2006 @ 4:23 PM PST
Category: Programming

2 Comments

Zeth
Quoth Zeth on November 02, 2006 @ 5:43 AM PST

I sympathise with your wordpress woes. On my install, for some random reason any html tags in the comments are escaped too many times so that you get a load of backslashes instead of the link or image that you wanted, very annoying! The only way to fix them is to change the item in the database itself.

>shudder to imagine being a web designer for a living

Well I am a web designer, it is not that bad, I would not do it forever but it is better than mining coal. There is always plenty to keep you occupied (blogs, slashdot, music etc).

taquito
Quoth taquito on January 16, 2008 @ 3:27 PM PST

lol I'm sorry because I know how much you dislike Java/Javascript but in that last paragraph it sounded like you were about to say "And that's why you should use Java!" (supposed total interoperability)(I myself am a C++ guy)