WP to OPML – help needed getting stuff into #fargo #writing #htb2013

I love writing and editing in Fargo. It really works for me to organise my thinking this way.

I’ve been posting to my wordpress.com blog through Fargo since the gateway was introduced. I like that too: no more messing with all the bells and whistles of the WordPress interface, just the clean writing window and all my current outlines within easy reach.

I’m starting to pull together my writing from the last few years. This was partly prompted by the demise of posterous.com, where I’d been doing a lot – for example, I used it as the blogging platform for two trips coast-to-coast across the USA.

I put all the Posterous stuff away in WordPress because that seemed to be the logical place. But now that I’m looking at making some longer-form works out of the blogs I wrote, I actually want to get large chunks of content out of WordPress and over here into Fargo.

Post by post, I could do that with cut and paste, but I wondered if there was a programmatic solution – could I make WordPress spit out a bunch of posts in OPML format which could then be read by Fargo?

My aim was to create a feed template so that I could supply Fargo’s “Open by URL” with something like http://lloyddavis.co.uk/?feed=outline and I’d have a bunch of headlines (something that could also make sense of the dates would be nice later).

I’ve put up the code I’ve got for this so far as a gist on GitHub. The way it should work is that it builds a string, $outlined, making each post title into a headline and creating a node under that headline for each paragraph in the post. It’s installed on http://lloyddavis.co.uk – you should be able to see it in action by adding “?feed=outline” to, for example, a category URL and then viewing the source.
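As a rough illustration of that structure (one headline per post, one child node per paragraph), here is a minimal sketch in Python rather than the PHP used in the gist. It is not the gist's code: the function name posts_to_opml and the shape of the sample data are assumptions made up for the example. It builds the tree with an XML library instead of concatenating a string, so attribute escaping is handled by the library.

```python
import xml.etree.ElementTree as ET


def posts_to_opml(title, posts):
    """Build an OPML 2.0 string: one headline per post, one child node per paragraph.

    `posts` is a hypothetical list of dicts with "title" and "paragraphs" keys.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for post in posts:
        # The post title becomes a top-level headline.
        headline = ET.SubElement(body, "outline", text=post["title"])
        # Each paragraph becomes a node under that headline.
        for paragraph in post["paragraphs"]:
            ET.SubElement(headline, "outline", text=paragraph)
    # The library escapes &, < and quotes inside attribute values.
    return ET.tostring(opml, encoding="unicode")


sample = [
    {
        "title": 'Day 1: "Leaving" & arriving',
        "paragraphs": ["First paragraph.", "Second <b>bold</b> one."],
    }
]
print(posts_to_opml("Coast to coast", sample))
```

The same idea should carry over to PHP by using an XML writer or DOM class rather than string concatenation, though that is a suggestion and not something tested against the gist.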

However, this is the first time I’ve written more than about 5 lines of code anywhere in the 20 years since my career swerved away from the code face. Last time I wrote anything serious, Think Pascal was the cool IDE to be using…

I’m at the stuck place where I feel like I’ve tried everything and can no longer see the wood for the trees. I’d appreciate some help from anyone with PHP & XML chops who can guide me through. I’ve asked on Twitter for help at Hack The Barbican, but I’m also putting this out to the Fargo development community.

I’d also appreciate anyone saying “No, don’t do it that way, there’s an easier way to get what you want” 🙂

The issues I’m coming up against come mainly, I think, from the inconsistent way the post content was encoded when Posterous was archived. Though of course there could have been some gunk in my HTML when I wrote the posts in the first place!

So typical issues include finding “

” in between paragraphs instead of the proper tags at either end. Also confusion over quotation marks.
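One way to cope with inconsistent paragraph markup is to normalise every kind of break to the same thing before splitting. The sketch below is in Python, not the gist's PHP, and the function name split_paragraphs is hypothetical. It assumes the breaks show up as paragraph tags, doubled line-break tags or blank lines; the archived posts may well contain other variants.

```python
import re


def split_paragraphs(content):
    """Split post content into paragraphs on <p> tags, doubled <br> tags or blank lines."""
    # Turn opening/closing <p> tags and runs of two or more <br> tags into blank lines.
    content = re.sub(
        r"</?p\b[^>]*>|(?:<br\s*/?>\s*){2,}",
        "\n\n",
        content,
        flags=re.IGNORECASE,
    )
    # Split on blank lines (tolerating \r\n line endings) and drop empty pieces.
    parts = re.split(r"\n\s*\n", content)
    return [part.strip() for part in parts if part.strip()]


print(split_paragraphs("<p>One</p><p>Two</p>\n\nThree<br><br>Four"))
```

Each returned string could then become one node under the post's headline.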

I’ve tried converting the content string $cont with htmlentities in line 30 of the gist, but that’s currently taken out. When it’s in, lines 43-46 are needed to decode some entities that Fargo seemed to choke on.
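A possible explanation for the choking: XML predefines only five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot; and &amp;apos;), so HTML-only names such as &amp;nbsp; or &amp;rsquo; are not valid in an OPML file unless they are declared. A common approach is to decode all HTML entities to plain characters first and then escape only what XML requires. The sketch below shows that order of operations in Python (again, not the gist's PHP); the function name outline_node is hypothetical.

```python
import html
from xml.sax.saxutils import quoteattr


def outline_node(text):
    """Return one <outline> element with `text` escaped as an XML attribute value."""
    # Decode HTML entities first (&nbsp;, &rsquo;, &amp; ...) so nothing gets escaped twice.
    decoded = html.unescape(text)
    # quoteattr adds the surrounding quotes and escapes &, <, > and quote characters.
    return "<outline text=%s/>" % quoteattr(decoded)


print(outline_node("Tom &amp; Jerry&rsquo;s &lt;b&gt;big&lt;/b&gt; day"))
```

The output contains only characters and entities that any XML parser understands, with curly quotes and similar characters left as literal Unicode, so the file needs to be served as UTF-8.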

I’m assuming these are standard problems when building XML from other data sources, so I’m looking forward to getting other people’s input.

For reference, the spec for OPML is at http://dev.opml.org/spec2.html