# DOCX Import Architecture

## Goal

DOCX import should become:

```text
DOCX zip package
  -> OOXML parts
  -> WordprocessingML AST
  -> UnifiedWriter DocumentModel
  -> LayoutEngine
  -> Renderer
```

v5.24.0 adds the first OOXML package parser in `scripts/addons/docx_io.js`.
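A DOCX file is an Open Packaging Conventions (OPC) zip. Each part's relationships live in a sibling `_rels/<part>.rels` part. Internal `Target` values resolve relative to the source part's folder, and `TargetMode="External"` targets (such as web hyperlinks) are used verbatim. The sketch below assumes the `.rels` XML has already been parsed into plain `{ Id, Type, Target, TargetMode }` objects. `relsPartFor`, `resolveTarget` and `buildRelMap` are hypothetical names, not the actual `docx_io.js` API.

```javascript
// Hypothetical helpers; not the actual docx_io.js API.
// Assumes each <Relationship> element is already parsed into
// { Id, Type, Target, TargetMode } by whatever XML parser the addon uses.

// OPC stores a part's relationships at <dir>/_rels/<file>.rels,
// e.g. word/document.xml -> word/_rels/document.xml.rels.
function relsPartFor(partName) {
  const slash = partName.lastIndexOf("/");
  const dir = partName.slice(0, slash + 1);
  const file = partName.slice(slash + 1);
  return `${dir}_rels/${file}.rels`;
}

// Resolve an internal Target against the folder of the source part.
// Targets starting with "/" are already package-absolute.
function resolveTarget(sourcePart, target) {
  const segments = target.startsWith("/")
    ? []
    : sourcePart.split("/").slice(0, -1); // drop the source file name
  for (const seg of target.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") segments.pop();
    else segments.push(seg);
  }
  return segments.join("/");
}

// Build an rId -> { type, target, external } lookup for blips and hyperlinks.
function buildRelMap(sourcePart, relationships) {
  const map = new Map();
  for (const rel of relationships) {
    const external = rel.TargetMode === "External";
    map.set(rel.Id, {
      type: rel.Type,
      // External targets (e.g. web URLs) are kept exactly as written.
      target: external ? rel.Target : resolveTarget(sourcePart, rel.Target),
      external,
    });
  }
  return map;
}
```

With this lookup, a DrawingML blip's `r:embed="rId5"` pointing at `media/image1.png` resolves to the zip entry `word/media/image1.png`.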

## Why not DOCX -> HTML only?

HTML converters are useful fallbacks, but they are intentionally lossy. They do not preserve all Word concepts needed for high-fidelity editing, layout, and round-trip compatibility.

UnifiedWriter therefore uses direct OOXML parsing as the primary path and Mammoth only as a fallback.
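The primary/fallback policy can be sketched as a small wrapper. `importDocx`, `parseOoxml` and `convertWithMammoth` are hypothetical names, and both converters are injected so only the control flow is shown; in a browser, `convertWithMammoth` could wrap Mammoth's `mammoth.convertToHtml({ arrayBuffer })`, which resolves to `{ value, messages }`. The `"mammoth-fallback"` report label is likewise an assumption.

```javascript
// Hypothetical sketch of the OOXML-first import with an HTML fallback.
async function importDocx(bytes, { parseOoxml, convertWithMammoth }) {
  try {
    // Primary path: direct OOXML parsing. `return await` is needed so that a
    // rejected promise is caught here instead of escaping to the caller.
    return await parseOoxml(bytes);
  } catch (err) {
    // Fallback path: HTML conversion is lossy, so the report records that the
    // fallback was used and why the primary path failed.
    const html = await convertWithMammoth(bytes);
    return {
      html,
      report: {
        path: "mammoth-fallback",
        reason: err instanceof Error ? err.message : String(err),
      },
    };
  }
}
```

If the fallback itself fails, its error propagates to the caller, so a document is never silently imported empty.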

## v5.24.0 supported import foundation

- `word/document.xml`
- `word/styles.xml` heading style mapping
- `word/numbering.xml` basic bullet/numbered list mapping
- `word/_rels/document.xml.rels`
- `word/media/*` inline images where referenced by DrawingML blips
- basic runs: text, tab, line break, page break
- run marks: bold, italic, underline, font size, color metadata
- basic tables, flattened to text and flagged as unsupported in the import report
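Basic runs and run marks can be read as sketched below. This assumes WordprocessingML elements were pre-parsed into `{ name, attrs, children }` nodes with the character data of `w:t` in `text`; that node shape and the names `readRun` and `readRunMarks` are hypothetical. The unit conversion follows OOXML: `w:sz` is in half-points, and on/off properties such as `w:b` are on unless `w:val` is `0`, `false` or `off`.

```javascript
// Hypothetical sketch: elements are assumed pre-parsed into
// { name, attrs, children } nodes, with the character data of w:t in `text`.
const attr = (el, name) => (el && el.attrs ? el.attrs[name] : undefined);
const child = (el, name) => (el.children || []).find((c) => c.name === name);

// ST_OnOff: <w:b/> means on; w:val of "0", "false" or "off" switches it off.
function isOn(el) {
  if (!el) return false;
  const v = attr(el, "w:val");
  return v === undefined || !["0", "false", "off"].includes(v);
}

function readRunMarks(rPr) {
  const marks = {};
  if (!rPr) return marks;
  if (isOn(child(rPr, "w:b"))) marks.bold = true;
  if (isOn(child(rPr, "w:i"))) marks.italic = true;
  const u = child(rPr, "w:u");
  // Assumption: a w:u without w:val is treated as a single underline.
  if (u && attr(u, "w:val") !== "none") marks.underline = attr(u, "w:val") || "single";
  const sz = child(rPr, "w:sz");
  if (sz) marks.fontSizePt = Number(attr(sz, "w:val")) / 2; // w:sz is in half-points
  const color = attr(child(rPr, "w:color"), "w:val");
  if (color && color !== "auto") marks.color = `#${color}`; // kept as metadata only
  return marks;
}

function readRun(r) {
  const marks = readRunMarks(child(r, "w:rPr"));
  const pieces = [];
  for (const c of r.children || []) {
    if (c.name === "w:t") pieces.push({ kind: "text", text: c.text || "", marks });
    else if (c.name === "w:tab") pieces.push({ kind: "tab", marks });
    else if (c.name === "w:br") {
      // Only w:type="page" is distinguished; column breaks fall back to line breaks.
      pieces.push({ kind: attr(c, "w:type") === "page" ? "pageBreak" : "lineBreak" });
    }
    // w:rPr was read above; other run children are not imported by this sketch.
  }
  return pieces;
}
```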

## Roadmap

| Level | Scope |
|---|---|
| Level 1 | paragraphs, runs, headings, page breaks |
| Level 2 | numbering, styles, images, relationships |
| Level 3 | tables, sections, page setup, header/footer |
| Level 4 | links, comments, footnotes, bookmarks |
| Level 5 | floating images, shapes, text boxes, equations, charts |
| Level 6 | unknown part preservation and DOCX round-trip |

## Comment policy

Complex importer code must explain its intent, its loss points (where Word information is dropped or approximated), and its future extension routes. Comments should document why a mapping is safe or why it is intentionally temporary.
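As an illustration of that policy, here is a hypothetical numbering lookup whose comments state why each branch is safe, what it loses, and how it could be extended. It assumes `word/numbering.xml` was pre-parsed into `{ nums, abstractNums }` maps and that a paragraph's `w:numPr` is available as `{ numId, ilvl }`; these shapes and the name `mapListItem` are assumptions, not the current importer code. Using `numId` `0` to remove numbering is WordprocessingML behavior.

```javascript
// Hypothetical numbering lookup written to follow the comment policy.
// `numbering` is assumed pre-parsed from word/numbering.xml:
//   { nums: { numId: abstractNumId },
//     abstractNums: { abstractNumId: { ilvl: { numFmt, start } } } }
function mapListItem(numPr, numbering) {
  const numId = numPr && numPr.numId;
  // Safe: numId "0" explicitly removes numbering in WordprocessingML.
  if (!numId || numId === "0") return null;
  const abstractId = numbering.nums[numId];
  const levels = abstractId !== undefined ? numbering.abstractNums[abstractId] : undefined;
  const ilvl = Number(numPr.ilvl || 0);
  const level = levels && levels[ilvl];
  // Loss point: an unresolved definition still becomes a bullet so the text
  // survives; `unresolved` lets the import report flag it.
  if (!level) return { listType: "bullet", level: ilvl, unresolved: true };
  // Temporary: lvlText templates, w:lvlOverride and picture bullets are not
  // modelled; only bullet vs ordered, numFmt and the start value survive.
  // Extension route: honour lvlText and overrides once native list styles exist.
  if (level.numFmt === "bullet") return { listType: "bullet", level: ilvl };
  return {
    listType: "ordered",
    level: ilvl,
    numFmt: level.numFmt || "decimal",
    start: Number(level.start || 1),
  };
}
```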

## dev-5.27.0 Level 1-2 practical import update

The OOXML-first importer now maps additional WordprocessingML metadata into the UnifiedWriter model-facing HTML bridge:

- Paragraph alignment from `w:jc`.
- Paragraph line spacing from `w:spacing/@w:line`.
- Paragraph left indentation from `w:ind/@w:left`.
- Heading detection from both styles and `w:outlineLvl`.
- Hyperlinks from `w:hyperlink` with relationship or anchor targets.
- Import report metadata now identifies the path as `ooxml-level-2-model-import`.
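The paragraph and hyperlink mappings above can be sketched as follows, assuming the relevant `w:pPr` attribute values were already extracted as strings and that relationships are available as an `rId -> URL` `Map`. The names `mapParagraphProps` and `mapHyperlink`, the output keys, and the clamp to six heading levels (for the HTML bridge's `h1` to `h6`) are assumptions. The units are OOXML's: with `w:lineRule="auto"` (the default), `w:line` is in 240ths of a line; with `exact` or `atLeast` it is in twentieths of a point, as is `w:ind/@w:left`. An `w:outlineLvl` of 9 means body text.

```javascript
// Hypothetical sketch; pPr values are assumed pre-extracted as strings:
//   { jc, spacingLine, spacingLineRule, indLeft, outlineLvl }
const JC_TO_ALIGN = {
  left: "left", start: "left",
  center: "center",
  right: "right", end: "right",
  both: "justify", distribute: "justify",
};

function mapParagraphProps(pPr, styleHeadingLevel = null) {
  const out = {};
  if (pPr.jc && JC_TO_ALIGN[pPr.jc]) out.textAlign = JC_TO_ALIGN[pPr.jc];
  if (pPr.spacingLine !== undefined) {
    const line = Number(pPr.spacingLine);
    // "auto": 240ths of a line (240 = single). "exact"/"atLeast": twentieths of a point.
    if ((pPr.spacingLineRule || "auto") === "auto") out.lineHeight = line / 240;
    else out.lineHeightPt = line / 20;
  }
  // w:ind/@w:left is in twentieths of a point (twips).
  if (pPr.indLeft !== undefined) out.marginLeftPt = Number(pPr.indLeft) / 20;
  // Assumption: a direct w:outlineLvl wins over the style-derived level, as direct
  // formatting does elsewhere. 0-8 are outline levels; 9 is body text.
  const lvl = pPr.outlineLvl !== undefined ? Number(pPr.outlineLvl) : null;
  if (lvl !== null && lvl >= 0 && lvl <= 8) out.headingLevel = Math.min(lvl + 1, 6);
  else if (lvl === null && styleHeadingLevel) out.headingLevel = styleHeadingLevel;
  return out;
}

// r:id refers to a relationship (usually an external URL); w:anchor names a bookmark.
function mapHyperlink(attrs, relTargets) {
  const rId = attrs["r:id"];
  if (rId && relTargets.has(rId)) return relTargets.get(rId);
  if (attrs["w:anchor"]) return `#${attrs["w:anchor"]}`;
  return null; // unresolved: keep the text, drop the link
}
```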

The table path is still intentionally conservative. Table content may be represented as text until the native table model and table-cell SelectionController work are implemented.
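A conservative table fallback might look like the sketch below. It assumes a `w:tbl` was pre-parsed into rows (`w:tr`) of cells (`w:tc`), each cell holding its paragraph texts; `flattenTable`, the tab/space joining rule and the report entry shape are all assumptions, not the current importer behavior.

```javascript
// Hypothetical conservative fallback: rows -> cells -> paragraph texts.
function flattenTable(rows, report) {
  const lines = rows.map((cells) =>
    // Cells are joined with tabs; paragraphs inside one cell with a space.
    cells.map((paragraphs) => paragraphs.join(" ")).join("\t")
  );
  // Loss point: merges (w:gridSpan, w:vMerge), borders, widths and nested
  // tables are dropped, so the report records the table for the user.
  report.unsupported.push({
    feature: "table",
    rows: rows.length,
    columns: Math.max(0, ...rows.map((cells) => cells.length)),
    hint: "Table imported as tab-separated text.",
  });
  return lines.map((text) => ({ kind: "paragraph", text }));
}
```

Keeping one paragraph per row preserves reading order, so the text can later be re-parsed into the native table model without re-importing the file.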
