Convert a LaTeX-created .pdf file to MS-Word .doc

Unfortunately, people sometimes ask me to convert a beautiful LaTeX document to MS-Word. I first tried converting the .tex file to HTML and opening the HTML file in MS-Word, but the result was usually a big mess. The best method I have found so far is this:

  • Turn off everything you do not want in the Word document: figures, page numbers, line numbers, etc. Fortunately, this is easy to do in LaTeX.
  • Generate a PDF file from your *.tex sources with pdflatex as usual.
  • Use pdftotext to convert the PDF file to a UTF-8 encoded text file (by default this writes vermeer2008.txt next to the PDF):
     pdftotext -enc UTF-8 vermeer2008.pdf
  • Open the resulting text file in MS-Word.
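The build and extraction steps can be scripted. This is a minimal sketch, assuming pdflatex and pdftotext (from poppler-utils or xpdf) are on the PATH and that figures, page numbers and line numbers have already been switched off in the .tex source. The function name tex_to_txt is hypothetical, and running pdflatex twice is only there so cross-references resolve.

```shell
#!/bin/sh
# Hypothetical helper: tex_to_txt <basename>, e.g. tex_to_txt vermeer2008
# Assumes pdflatex and pdftotext are installed and on the PATH.
set -e

tex_to_txt() {
    base="$1"    # file name without the .tex extension
    # Run pdflatex twice so that \ref/\pageref cross-references are resolved.
    pdflatex -interaction=nonstopmode "$base.tex" >/dev/null
    pdflatex -interaction=nonstopmode "$base.tex" >/dev/null
    # Extract the text as UTF-8 into <basename>.txt.
    pdftotext -enc UTF-8 "$base.pdf" "$base.txt"
}

# Only run when a basename is given on the command line.
if [ "$#" -gt 0 ]; then
    tex_to_txt "$1"
fi
```

Usage: `sh tex_to_txt.sh vermeer2008`, then open vermeer2008.txt in MS-Word.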

Not everything is converted correctly:

  • You will probably want to replace the ligature characters ("ff", "fi", "ffi", etc.) with separate letters, using search & replace.
  • The layout and font settings (subscripts, superscripts, etc.) are all lost.
  • Math will probably have to be rewritten in the MS-Word equation editor. The equations could be added to the document as images, but that is not the best solution if you want a "standard" MS-Word document as the final product.
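If you would rather clean up the text file before opening it in MS-Word, the ligature replacement can also be done with sed. This is a minimal sketch, assuming pdftotext emitted the ligatures as the Unicode characters U+FB00 to U+FB04 (ﬀ, ﬁ, ﬂ, ﬃ, ﬄ). Depending on the fonts, pdftotext may already output plain letters, in which case nothing changes. The function name fix_ligatures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace Unicode ligature characters with plain letters.
# Reads from stdin and writes to stdout. The patterns are literal UTF-8 byte
# sequences, so this works in both UTF-8 and C locales.
fix_ligatures() {
    sed -e 's/ﬀ/ff/g' \
        -e 's/ﬁ/fi/g' \
        -e 's/ﬂ/fl/g' \
        -e 's/ﬃ/ffi/g' \
        -e 's/ﬄ/ffl/g'
}
```

Usage: `fix_ligatures < vermeer2008.txt > vermeer2008-clean.txt`.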

Despite its many drawbacks, I found this to be the most reliable way to convert the document. It is a lot of work, but in my experience conversion programs that try to preserve the formatting only make a bigger mess, so this method ends up being easier. I am always willing to learn, though, so if you know a better method, please let me know!
