Importing old posterous blogs to stand-alone wordpress

I have downloaded a number of my posterous blogs and they’ve been sitting there looking at me. I also made a resolution recently to get as much of my stuff together under lloyddavis.co.uk. That meant some faffing with domain registrations, hosting and DNS but I finally got that sorted out last week.

I made a couple of false starts with the posterous archives. I foolishly thought that because they said they were wordpress export files, they’d be straightforward to import. Well no. Quelle surprise! The wordpress importer doesn’t like them at all and the posterous import plugins I tried made a mess too.

The most obvious thing first of all was that everything came through in the “Uncategorized” category. I went round in circles with this one, but in the end came up with the following recipe.

This example uses the tuttle consulting blog. Note that I use Ubuntu. It’s probably much easier to load the file into a text editor and do fancy search and replace stuff, but especially after spending the weekend at barcamp, I was in the mood for command line shenanigans…

  • Extract the export file and shorten its name for typing convenience. I called mine tce.xml (because it was the tuttle consulting export)
  • [please excuse my hacking, there’s sure to be a more elegant way of doing the same thing – do let me know if it’s obvious to you.]
  • run sed 's/>\t/>\n\t/g' tce.xml > tce1.xml to insert a newline between each closing bracket and the tab that follows it. Otherwise the file is all one line and grep can’t pick out individual lines.
  • run grep -v Uncategorized tce1.xml > tce2.xml to strip out the lines tagging posts as Uncategorized
  • run cat tce2.xml | tr -d '\n' > tce3.xml to remove the newlines because the importer didn’t seem to like them
  • if you don’t already have one, sign up for a wordpress.com account. We’re going to use that to import and export in a friendly format.
  • create a new wordpress.com blog tcposterous.wordpress.com
  • create a new “tuttle-consulting” category and set it as the default for new posts
  • realise that not all posts were written by me. I think the fix for this is to note all of the post authors’ names (they’re in a dc:creator tag in the xml file) and then create them as users on the blog.
  • ask the wordpress.com importer to do its magic
  • wait while wordpress.com does the import (for this file it was a minute or so, but larger files do take longer)
  • export from wordpress.com
  • import this file to my own wordpress on lloyddavis.co.uk
  • stand back to admire your work
  • Outstanding Issues
    • commenters’ names weren’t exported properly in the first place. They are in the html files, but somehow didn’t get converted in the export file… this needs some work before the bigger ones get done.
    • there are a few coding glitches – tags as <> etc
    • an audioboo which I’d attached to one post didn’t turn up. I said yes to downloading attachments but I’m not sure what that’s actually doing
    • one post with a photo from flickr came through alright (ie links back to flickr), but one didn’t. There’s a copy in the archive and that’s now been uploaded to my server. I presume this is to do with how they were embedded in the first place.
  • No guarantees or warranties, YMMV etc. All suggestions for improvement gratefully accepted.
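
For anyone who'd rather not type the sed, grep and tr steps one at a time, here's a rough sketch that chains them into one pipeline, plus a helper for pulling out the author names. The function names strip_uncategorized and list_authors are made up for this example, and it assumes GNU sed (as on Ubuntu) and that the dc:creator values may or may not be wrapped in CDATA. I haven't checked that against every posterous export, so treat it as a starting point rather than a finished tool.

```shell
# Sketch only: the function names are invented for illustration.
# Assumes GNU sed, which understands \t and \n inside the expression.

strip_uncategorized() {
    # $1 = posterous export file, $2 = cleaned output file
    # 1. put each tab-separated tag on its own line
    # 2. drop the lines that mention Uncategorized
    # 3. join everything back into a single line for the importer
    sed 's/>\t/>\n\t/g' "$1" | grep -v Uncategorized | tr -d '\n' > "$2"
}

list_authors() {
    # $1 = export file; prints each distinct dc:creator value once
    # First put every dc:creator element on a line of its own,
    # then peel off any CDATA wrapper before stripping the tags.
    sed 's/<dc:creator>/\n&/g; s/<\/dc:creator>/&\n/g' "$1" \
        | grep '^<dc:creator>' \
        | sed -e 's/<!\[CDATA\[//' -e 's/\]\]>//' -e 's/<[^>]*>//g' \
        | sort -u
}
```

With those defined, strip_uncategorized tce.xml tce3.xml does the same job as the three separate commands without leaving tce1.xml and tce2.xml lying around, and list_authors tce3.xml gives the list of names to create as users before running the import.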

Barcamp Berkshire day 2 thought & link dump #bcb13

I’m always tempted to stay overnight, but I’ve never actually done it. I may not have slept perfectly last night anyway, but I’m sure it was better than lying on a corporate head office meeting room floor.

People did come to my early morning session on “the future of blogging?” despite being up against an eight-year-old girl.

It helped me recognise what it is that I want – a bunch of people who are broadly interested in the same things, but actively eating their own dogfood. Tom Morris helped draw this out further in his session on indieweb. Tom, for example, has stripped back his blogging input box so that it’s almost the original Twitter “What are you doing?” box – except it accepts Markdown and more than 140 characters 🙂 There’s an indiewebcamp in Portland, OR, next weekend (June 22/23) and a UK one in Brighton on Sep 8th.

The main reason I’m interested in a good RSS reader is not that I’m a writer who wants other people to read, it’s that I want to be able to find and read stuff in a non-fragmented way without having to scour FB, Twitter, Tumblr *and* Son of Google Reader.

It got me thinking too about archiving and my current ongoing project to draw all my stuff together under my own domain.

There was a small but lively discussion about bitcoin. Things got most heated during the discussion about forking…

I bumped into Ketan who’d brought in this and, usefully, an early viewmaster that had batteries to backlight the screen.

After a deliciously substantial lunch I did an impromptu (well, I did put it on the grid) meeting room gig, which was much like a house gig, but, y’know with a big table, uncomfortable chairs, a whiteboard’n’shit.

And I rounded off with a geek dive into Rail API data with Paul Freeman.

A really good barcamp, well-balanced for me in terms of giving and receiving. I’m feeling refreshed, encouraged and inspired. Thanks to all who played a part in making it happen.