WordPress and importing/exporting content Part 2

by Suw on January 9, 2007

Last month I was struggling with trying to import all my old posts from Blogware into a local WordPress install. Well, I finally managed to get it sorted, sort of.
The Blogware import script, written by Shayne Sweeney, that I had found via Chris Pirillo didn't seem to work. Every time I tried to use it, it just… did nothing. Kevin Marks had a look at it and my Blogware XML file, and suggested that I try importing it in smaller chunks. The Blogware XML indicates individual blog posts with <item></item> tags, so as long as you don't split the file inside an <item> tag you're ok. I ended up splitting my archive into nine smaller files and the importer worked fine.
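For anyone wanting to do the same split without hand-editing, here is a minimal sketch in Python. It assumes the export is one big file where everything before the first <item> is a header and everything after the last </item> is a footer, both of which get copied into every chunk. The function name `split_items` is made up for illustration and is not part of Shayne's script.

```python
import re

def split_items(xml_text, chunks):
    """Split an export into roughly `chunks` pieces, never cutting inside an <item>."""
    matches = list(re.finditer(r"<item\b.*?</item>", xml_text, flags=re.DOTALL))
    if not matches:
        return [xml_text]  # nothing to split on
    head = xml_text[:matches[0].start()]   # assumed header, repeated in each piece
    tail = xml_text[matches[-1].end():]    # assumed footer, repeated in each piece
    items = [m.group(0) for m in matches]
    size = -(-len(items) // chunks)        # ceiling division: items per piece
    return [head + "\n".join(items[i:i + size]) + tail
            for i in range(0, len(items), size)]
```

Each returned string can be written to its own file and fed to the importer one at a time. If there are fewer items than requested pieces, you simply get fewer pieces.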
Unfortunately, Blogware's export script makes no differentiation between a line break and a paragraph break, so posts are run together without paragraph breaks. Every one will have to be edited by hand to add the paragraph breaks in.
I have no idea why Blogware have done this. In fact, it's an ongoing bugbear of mine that developers the world over seem to have some weird bias against the paragraph break, even though it's important. And no, a paragraph break is not simply two line breaks one after the other – it is a different beast and it should be respected. It's in situations like this where you see why – by not respecting my paragraph breaks you turn my blog posts into one great big ugly blob of text. Cheers.
The next challenge was importing stuff from Blog-City. Kevin Marks took Shayne's original Blogware import script and tweaked it a bit, giving us this Blog-City import script. There are no categories in Blog-City posts, so they all get dumped in a 'Blogcity' category, which is fine for my purposes.
The first problem was that Blog-City had created two archives of photograph upload data, so I had to delete 150 pointless 'posts'. We then discovered that Blog-City had two dates in its XML archive:
<dateopened>2004-04-04T00:00+00:00</dateopened>
<dateupdated>2004-04-04T15:30+00:00</dateupdated>
Kevin assumed that the 'date updated' date was the date to take for determining when a post was posted, as it has a time stamp, but unfortunately it has absolutely no relevance to anything. I suspect it may have been the date stamp for when the post was exported. The date opened date, however, is the one you want. But it has no time stamp. It seems that I published all my blog posts spot on midnight. Wow. How OC is that?
This means that on days when I posted more than once, I have no way of telling which post came first (and I can't go back and check because the lovely people at Blog-City deleted my blog). For my purposes, however, this is not a problem, but it is rather shoddy work if you ask me.
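As a rough illustration of picking the right date, the sketch below reads only <dateopened> and ignores <dateupdated>. The helper names `post_date` and `sort_posts` are hypothetical, and falling back to export order for posts on the same day is an assumption: nothing in the archive says that order matches the order of posting.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"<dateopened>(.*?)</dateopened>")

def post_date(item_xml):
    """Parse <dateopened> (e.g. 2004-04-04T00:00+00:00) into an aware datetime."""
    raw = DATE_RE.search(item_xml).group(1).strip()
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M%z")

def sort_posts(items):
    # sorted() is stable, so posts sharing a date keep the order
    # they had in the export (an assumed tie-breaker, not a known one).
    return sorted(items, key=post_date)
```

Because every post lands on midnight, same-day posts compare as equal, and the stable sort is the only thing deciding between them.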
Initially, the import script was ignoring the differences between line breaks and paragraph breaks too, but whereas we could find no cure for this in Blogware's export, Kevin was able to cure it for Blog-City, so I have my paragraphs. However, some of the encoding's screwed, so Kevin had to make a special case for apostrophes in order to turn them from a mush into proper apostrophes. Some of the encoding remains screwed, but I'll have to fix that by hand.
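I don't have Kevin's exact changes to hand, so what follows is only a sketch of the general idea, not his code. It assumes paragraph breaks show up in the export as blank lines and single line breaks as single newlines, and that the mangled apostrophes are UTF-8 curly quotes that were read as Windows-1252. Both function names are made up.

```python
import re

# Build the mangled forms from the good ones: UTF-8 bytes misread as cp1252.
QUOTES = ["\u2018", "\u2019"]  # left and right single curly quotes
MOJIBAKE = {q.encode("utf-8").decode("cp1252"): q for q in QUOTES}

def fix_apostrophes(text):
    """Replace the assumed mangled sequences with proper apostrophes."""
    for bad, good in MOJIBAKE.items():
        text = text.replace(bad, good)
    return text

def to_paragraphs(text):
    """Blank line -> new <p>; single newline -> <br /> inside the paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        "<p>" + block.strip().replace("\n", "<br />\n") + "</p>"
        for block in blocks if block.strip()
    )
```

Anything mangled in a different way will pass through untouched, which matches my experience: some of it still has to be fixed by hand.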
So now I do have all my posts in one database. Any new comments will be missing, but in general I have everything up to 6 December 2006. That's enough for me for now.
My next decision is, do I stay with Blogware or leave? But that's a topic for a different post.

Anonymous January 11, 2007 at 8:29 am

Just a test to see if comments are working.

Anonymous January 11, 2007 at 9:22 am

Suw, it's good to know your data are coming back. Not having them would be like losing numerous episodes of The Truman Show. It will be quite interesting to discover whether you decide to leave Blogware, and what reasons prevail in the end.
Best to you.
huw
p.s. I see that comments are working again. A short time ago, clicking on 'Post a Comment' netted nothing, but now it's normal again. Blogware bug I suppose.

Anonymous January 11, 2007 at 10:06 am

Hi Suw — I think the Comments problem is still persisting. I'm trying it again right now and sending you screenshots.

Anonymous January 11, 2007 at 10:10 am

Well, maybe I don't need to send the screen shots, as it appears now to be working fine. (Seems like taking your auto to the mechanic and it stops making the noise, eh??)
Anyway, it's good to know your data are coming back, and it will be interesting to see whether you migrate from Blogware. And why.

Rodney January 12, 2008 at 12:58 am

I’m getting ready to finally migrate my blog from blogware to wordpress.

Thanks for posting the link to the blogcity import script zip file.

How were you able to get your posts back in their right date order?

Comments on this entry are closed.
