How to Migrate from Word to DITA

Migrating Microsoft Office® Word documents to XML formats, and to DITA in particular, is a common requirement. Migration from a proprietary format to XML is never perfect, and the converted content usually needs some manual changes. However, the methods below should help you find the best approach for your particular case:

Oxygen Batch Documents Converter add-on

The Oxygen Batch Documents Converter add-on can be installed in Oxygen and lets you convert one or more documents to various formats.

For more details about the main stages of the Word to DITA migration using the Batch Documents Converter add-on, see: Migrating MS Word to DITA using the Batch Documents Converter

Note: The Batch Documents Converter add-on is the recommended way to convert one or more Word documents to DITA content.

Smart Paste

  1. Open the Word document in MS Office, select all the content, and copy it.
  2. Open Oxygen and create a new DITA topic in the Author visual editing mode.
  3. Paste the selected content. Oxygen's smart paste functionality will attempt to convert the content to DITA.

Word to HTML to DITA

  1. Save your MS Office Word document as HTML.
  2. Once you obtain that HTML, you have two possibilities:
    • In Oxygen, select File > Import > HTML File to import the HTML as XHTML. Then open the XHTML file in Oxygen. The "Transformation Scenarios" view should list four pre-configured transformation scenarios that convert XHTML to DITA topics, tasks, references, or concepts.
    • Open the HTML file in any Web browser, select all of its content, and copy it. Then open Oxygen, create a new DITA topic in the Author visual editing mode, and paste the selected content. Oxygen's smart paste functionality will attempt to convert the HTML to DITA.

Word to DocBook to DITA

  1. Open the Word document in the free LibreOffice application and save it as DocBook.
  2. Open the DocBook document in Oxygen.
  3. Run the predefined transformation scenario called DocBook to DITA.

Word to DITA using DITA For Publishers

  1. If the Word document is in the DOCX format, you can open it in Oxygen's Archive Browser view and then open the document.xml file contained in the archive.
  2. Run the predefined transformation scenario called DOCX DITA. This Ant scenario runs the following build file over the DOCX archive: OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.word2dita/build-word2dita.xml. It should produce a DITA project that contains a DITA map and multiple topics.
  3. You may need to do some reconfiguring to map DOCX styles to DITA content.

Note: This method may also be helpful if you want to automate the conversion with scripts, since it is based on the DITA-OT and the DITA For Publishers plugins.
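
Before adjusting the style mapping mentioned in step 3, it can help to know which paragraph styles a document actually uses. The following is a minimal Python sketch, not part of Oxygen or DITA For Publishers. It relies only on the fact that a DOCX file is a ZIP archive whose main content is stored in word/document.xml. The function name paragraph_style_counts and the "(default)" label are hypothetical, chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_counts(docx):
    """Count the paragraph style IDs referenced in a DOCX file.

    `docx` may be a file path or a binary file-like object.
    Paragraphs with no explicit style are counted under "(default)".
    (Hypothetical helper, for illustration only.)
    """
    # A DOCX file is a ZIP archive; the main body lives in word/document.xml
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    counts = Counter()
    # Visit every <w:p> paragraph, including those nested in tables
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        counts[style_id or "(default)"] += 1
    return counts
```

Calling paragraph_style_counts("manual.docx") (a placeholder file name) returns a Counter keyed by style ID, which gives a rough inventory of the styles that the DOCX-to-DITA mapping would need to cover.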