Tuesday, January 9, 2007

Last month I was struggling to import all my old posts from Blogware into a local WordPress install. Well, I finally managed to get it sorted, sort of.
The Blogware import script, written by Shayne Sweeney, that I had found via Chris Pirillo didn't seem to work. Every time I tried to use it, it just… did nothing. Kevin Marks had a look at it and my Blogware XML file, and suggested that I try importing it in smaller chunks. The Blogware XML indicates individual blog posts with <item></item> tags, so as long as you don't split the file inside an <item> tag you're ok. I ended up splitting my archive into nine smaller files and the importer worked fine.
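For anyone wanting to try the same chunking trick, here is a minimal sketch in Python. It is not the import script itself, just one way to cut an export into smaller files without splitting inside an <item>. The function name split_export and the items_per_chunk parameter are made up for illustration, and it assumes that items are not nested and that no post body contains a literal </item>.

```python
import re

# Matches one whole <item>...</item> element (assumes items are not nested).
ITEM_RE = re.compile(r"<item\b.*?</item>", re.DOTALL)


def split_export(xml_text, items_per_chunk):
    """Split an export into smaller documents, never cutting inside an <item>.

    Hypothetical helper: everything before the first item and after the
    last item is copied into every chunk so each file is still complete.
    """
    matches = list(ITEM_RE.finditer(xml_text))
    if not matches:
        return [xml_text]

    header = xml_text[:matches[0].start()]
    footer = xml_text[matches[-1].end():]
    items = [m.group(0) for m in matches]

    chunks = []
    for start in range(0, len(items), items_per_chunk):
        body = "\n".join(items[start:start + items_per_chunk])
        chunks.append(header + body + footer)
    return chunks
```

Each returned string can be written to its own file and fed to the importer one at a time.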
Unfortunately, Blogware's export script doesn't distinguish between a line break and a paragraph break, so posts are run together without paragraph breaks. Every one will have to be edited by hand to add the paragraph breaks in.
I have no idea why Blogware have done this. In fact, it's an ongoing bugbear of mine that developers the world over seem to have some weird bias against the paragraph break, even though it's important. And no, a paragraph break is not simply two line breaks one after the other – it is a different beast and it should be respected. It's in situations like this where you see why – by not respecting my paragraph breaks you turn my blog posts into one great big ugly blob of text. Cheers.
The next challenge was importing stuff from Blog-City. Kevin Marks took Shayne's original Blogware import script and tweaked it a bit, giving us this Blog-City import script. There are no categories in Blog-City posts, so they all get dumped in a 'Blogcity' category, which is fine for my purposes.
The first problem was that Blog-City had created two archives of photograph upload data, so I had to delete 150 pointless 'posts'. We then discovered that Blog-City had two dates in its XML archive: a 'date updated' and a 'date opened'.
Kevin assumed that the 'date updated' date was the one to use for determining when a post was posted, as it has a time stamp, but unfortunately it has absolutely no relevance to anything. I suspect it may have been the date stamp for when the post was exported. The 'date opened' date, however, is the one you want. But it has no time stamp. It seems that I published all my blog posts spot on midnight. Wow. How OC is that?
This means that on days when I posted more than once, I have no way of telling which post came first (and I can't go back and check because the lovely people at Blog-City deleted my blog). For my purposes this is not a problem, but it is rather shoddy work if you ask me.
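To show why the missing time stamp matters, here is a small Python sketch. The date format and the sample posts are invented for illustration, since the real Blog-City export layout may well differ. A date with no time parses to midnight, so two posts from the same day get identical keys and sorting can only leave them in whatever order the export happened to list them.

```python
from datetime import datetime


def parse_opened(date_text):
    # Hypothetical format: the real export may write its dates differently.
    # With no time component, strptime fills in 00:00:00.
    return datetime.strptime(date_text, "%Y-%m-%d")


# Made-up sample data: two posts share a day, so their keys are equal.
posts = [
    ("second post?", "2006-03-14"),
    ("first post?", "2006-03-14"),
    ("earlier", "2006-03-13"),
]

# sorted() is stable, so same-day posts simply keep their input order.
ordered = sorted(posts, key=lambda post: parse_opened(post[1]))
```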
Initially, the import script was ignoring the differences between line breaks and paragraph breaks too, but whereas we could find no cure for this in Blogware's export, Kevin was able to cure it for Blog-City, so I have my paragraphs. However, some of the encoding's screwed, so Kevin had to make a special case for apostrophes in order to turn them from a mush into proper apostrophes. Some of the encoding remains screwed, but I'll have to fix that by hand.
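I don't know exactly what Kevin's fix looked like, so the following is only a sketch of the two ideas in Python. It assumes the apostrophe mush is the usual "â€™", which is what a UTF-8 right single quote turns into when it is read as Windows-1252. It also assumes, purely for illustration, that the raw text marks a paragraph break with a blank line and a line break with a single newline. The function names are hypothetical.

```python
import re

# A UTF-8 right single quote (U+2019) misread as Windows-1252 shows up
# as these three characters: "â€™".
MANGLED_APOSTROPHE = "\u00e2\u20ac\u2122"
PROPER_APOSTROPHE = "\u2019"


def fix_apostrophes(text):
    """Special-case repair for mangled apostrophes only."""
    return text.replace(MANGLED_APOSTROPHE, PROPER_APOSTROPHE)


def to_paragraphs(text):
    """Keep paragraph breaks and line breaks distinct in the output.

    Assumption: a blank line means a paragraph break and a single
    newline means a line break. The real export may mark them otherwise.
    """
    text = text.replace("\r\n", "\n").strip()
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    return "\n".join(
        "<p>" + block.replace("\n", "<br />\n") + "</p>" for block in blocks
    )
```

Anything else that is mangled would pass through untouched, which matches the "fix the rest by hand" situation described above.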
So now I do have all my posts in one database. Any new comments will be missing, but in general I have everything up to 6 December 2006. That's enough for me for now.
My next decision is, do I stay with Blogware or leave? But that's a topic for a different post.
