I have downloaded a number of my posterous blogs and they’ve been sitting there looking at me. I also made a resolution recently to get as much of my stuff together under lloyddavis.co.uk. That meant some faffing with domain registrations, hosting and DNS but I finally got that sorted out last week.
I made a couple of false starts with the posterous archives. I foolishly thought that because they said they were wordpress export files, they’d be straightforward to import. Well no. Quelle surprise! The wordpress importer doesn’t like them at all and the posterous import plugins I tried made a mess too.
The most obvious thing first of all was that everything came through in the “Uncategorized” category. I went round in circles with this one, but in the end came up with the following recipe.
This example uses the tuttle consulting blog. Note that I use Ubuntu. It’s probably much easier to load the file into a text editor and do fancy search and replace stuff, but after spending the weekend at barcamp I was in the mood for command line shenanigans…
- Extract the export file and shorten its name for typing convenience. I called mine tce.xml (because it was the tuttle consulting export)
- [please excuse my hacking, there's sure to be a more elegant way of doing the same thing - do let me know if it's obvious to you.]
- run sed 's/>\t/>\n\t/g' tce.xml > tce1.xml to insert a newline after every closing bracket that is followed by a tab. Otherwise the file is all one line and grep can’t filter anything out.
- run grep -v Uncategorized tce1.xml > tce2.xml to strip out the lines tagging posts as Uncategorized
- run cat tce2.xml | tr -d '\n' > tce3.xml to remove the newlines again, because the importer didn’t seem to like them
- if you don’t already have one, sign up for a wordpress.com account. We’re going to use that to import and export in a friendly format.
- create a new wordpress.com blog tcposterous.wordpress.com
- create a new “tuttle-consulting” category and set it as the default for new posts
- realise that not all posts were written by me. I think the fix for this is to note all of the post authors’ names (they’re in a dc:creator tag in the xml file) and then create them as users on the blog.
- ask the wordpress.com importer to do its magic
- wait while wordpress.com does the import (for this file it was a minute or so, but larger files do take longer)
- export from wordpress.com
- import this file to my own wordpress on lloyddavis.co.uk
- stand back to admire your work
Outstanding issues:
- commenters’ names weren’t exported properly in the first place. They are in the html files, but somehow didn’t get converted in the export file… this needs some work before the bigger ones get done.
- there are a few coding glitches – tags as <> etc
- an audioboo which I’d attached to one post didn’t turn up. I said yes to downloading attachments but I’m not sure what that’s actually doing
- one post with a photo from flickr came through alright (ie it links back to flickr), but one didn’t. There’s a copy in the archive and that’s now been uploaded to my server. I presume this is to do with how they were embedded in the first place.
- No guarantees or warranties, YMMV etc. All suggestions for improvement gratefully accepted.
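For anyone who'd rather not juggle tce1.xml and tce2.xml, here is a rough sketch of the sed, grep and tr steps chained into one pipeline, plus a quick way to list the dc:creator names so you know which users to create. It assumes GNU sed and grep (as on Ubuntu) and that the export separates tags with tabs, as mine did. The function names strip_uncategorized and list_authors are made up for this example, and the CDATA handling is a guess at how author names might be wrapped, so treat it as a starting point rather than a tested tool.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed/grep and a tab-separated, single-line export.

# strip_uncategorized IN OUT  (hypothetical helper name)
# Split the file into lines at every ">" followed by a tab, drop the lines
# that mention Uncategorized, then join everything back into one line.
strip_uncategorized() {
    sed 's/>\t/>\n\t/g' "$1" | grep -v Uncategorized | tr -d '\n' > "$2"
}

# list_authors IN  (hypothetical helper name)
# Print each distinct dc:creator value once. Any CDATA wrappers are removed
# first (an assumption about the export), then the file is split at every "<"
# so each opening dc:creator tag starts its own line.
list_authors() {
    sed -e 's/<!\[CDATA\[//g' -e 's/]]>//g' "$1" \
        | tr '<' '\n' \
        | sed -n 's/^dc:creator>//p' \
        | sort -u
}

# Usage, with the file names from the recipe above:
#   strip_uncategorized tce.xml tce3.xml
#   list_authors tce.xml
```

The output of strip_uncategorized should match tce3.xml from the step-by-step version, just without the intermediate files.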