Update 2

Thanks to this great list of open-source apps for Mac OS X there is another option to look at: AbiWord. I’ve not tried it yet, but it looks set to beat my attempts with OpenOffice, which would not even start (the office Mac is a PowerPC without X11). Even better – it appears to claim that it can do the job. TBC.


What a difference a day makes

Despite yesterday’s test proving otherwise – today it seems that I can use Paste Special into GoLive and keep all formatting. Today I can “Paste As” and choose “Cleared HTML (Removes exotic Markup)” over the limited “HTML” option, which was all I could do yesterday. The result has just a few extra non-breaking spaces and p tags for the line breaks – but otherwise perfect Unicode for the web.

I think that I may not have been pasting directly from Word. Not sure. Anyway, this is now the best solution for when I’m about to help update content.
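
The leftover non-breaking spaces and p tags could probably be tidied automatically before the content goes into the CMS. Here is a minimal sketch in Python, assuming the extras show up as paragraphs containing nothing but whitespace or non-breaking spaces. The function name `drop_empty_paragraphs` is hypothetical and not part of GoLive or Drupal, and a regular expression like this is only suitable for simple pasted fragments, not for arbitrary HTML.

```python
import re

# Matches a paragraph whose only content is whitespace, "&nbsp;" entities
# or literal non-breaking space characters, plus any trailing whitespace.
EMPTY_P = re.compile(r"<p[^>]*>(?:\s|&nbsp;|\u00a0)*</p>\s*", re.IGNORECASE)


def drop_empty_paragraphs(html: str) -> str:
    """Remove paragraphs that contain no visible text (hypothetical helper)."""
    return EMPTY_P.sub("", html)


if __name__ == "__main__":
    pasted = "<p>One</p>\n<p>&nbsp;</p>\n<p>Two</p>"
    print(drop_empty_paragraphs(pasted))
```

Paragraphs with real text are left untouched, so the step should be safe to run more than once on the same fragment.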

[Posted to the Microsoft Word forum here]

I’ve been searching for a while now and have found no simple solution for this issue. I’m working to set up a CMS (Drupal in this case) and want to find a way to enable the writers – using Word 2004 – to upload their own content, properly styled in clean XHTML.

I want to avoid any extra steps, as more steps lead to more chances for errors to creep in. The only formatting needed is semantic: just HTML body content, without any extraneous Word round-trip information or presentational formatting, as all design should be defined using CSS stylesheets.

I just want the basic stuff, i.e. h1-h6 headings (defined at the authoring stage, using Word’s standard styles), bold, italics and quality typography (all accents, “curly” quotes and em-dashes) properly encoded into human-readable XHTML entities (i.e. “&” becomes “&amp;”).
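
To illustrate the kind of entity encoding described above, here is a minimal sketch in Python. It is not something Word, GoLive or Drupal provides. The function name `to_named_entities` is hypothetical, and it assumes the input is plain text content rather than markup, since it also escapes `<` and `>`. It uses the standard library's `html.entities.codepoint2name` table for the named entities and falls back to numeric references for characters that have no name in that table.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Encode text content using human-readable named entities (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "&<>":
            # Markup-significant characters: & becomes &amp; and so on.
            out.append(f"&{codepoint2name[cp]};")
        elif cp > 127 and cp in codepoint2name:
            # Accents, curly quotes, dashes: use the named entity.
            out.append(f"&{codepoint2name[cp]};")
        elif cp > 127:
            # No named entity in the table: numeric character reference.
            out.append(f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Curly quotes, an accented e, an em-dash and an ampersand.
    print(to_named_entities("\u201cCaf\u00e9\u201d \u2014 R&D"))
```

Run on already-encoded text, this would double-escape existing entities, so it would need to be applied exactly once, on the raw text.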

I’m worried that I’m going to have to compromise on quality, or turn what to my mind should be basic functionality into a laborious and error-prone process…

Does anybody have a solution? Is this doable by hacking/editing the “Word Conversion Options” or “com.microsoft.Word.prefs.plist” files?

[The basic structure (headings and paragraphs, bold, italics, and accents as Unicode) can be handled by the CMS’s interface thanks to TinyMCE and its Paste From Word function, but this cannot handle typographic features such as proper curly quotes and em-dashes.]