liminfo

PDF to Word

Free web tool: PDF to Word


About PDF to Word

PDF to Word is a free, browser-based converter that extracts text content from PDF documents and outputs it as an editable Microsoft Word DOCX file. It reads the PDF page by page with pdfjs-dist and builds the Word document with the docx JavaScript library (serialized through its Packer). Everything is processed locally in your browser, with no server upload.

The tool is useful for anyone who needs to reuse or edit text from a PDF that cannot be edited directly, such as digitally produced reports, forms, or articles in PDF format. Typical users include students extracting notes from PDF textbooks, editors who need to revise a PDF-format article, and office workers converting received PDF reports into editable drafts.

Technically, the converter reads each page via pdfjs-dist getTextContent() and groups text items into lines based on Y coordinate proximity (a gap greater than 5 units triggers a new line). The resulting lines are filtered and wrapped in docx Paragraph objects with TextRun children. Between pages, a PageBreak Paragraph is inserted to preserve the original pagination. The complete array of paragraphs is assembled into a Document and converted to a Blob using Packer.toBlob(), then downloaded as a .docx file.
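The line-grouping step described above can be sketched as a small pure function. This is a minimal illustration, not the tool's actual source: the names `TextItemLike` and `groupIntoLines` are hypothetical, the item shape mirrors only the `str` and `transform` fields that pdfjs-dist text items expose, and treating "filtered" as "whitespace-only lines are dropped" is an assumption.

```typescript
// Only the fields this sketch needs from a pdfjs-dist text item.
// transform is a 6-element matrix; index 4 is x and index 5 is y.
interface TextItemLike {
  str: string;
  transform: number[];
}

// Threshold taken from the description above: a Y gap above 5 units starts a new line.
const LINE_GAP = 5;

function groupIntoLines(items: TextItemLike[], gap: number = LINE_GAP): string[] {
  const lines: string[] = [];
  let current = "";
  let lastY: number | null = null;

  for (const item of items) {
    const y = item.transform[5];
    // A vertical jump larger than the threshold closes the current line.
    if (lastY !== null && Math.abs(y - lastY) > gap) {
      lines.push(current);
      current = "";
    }
    current += item.str;
    lastY = y;
  }
  if (current.length > 0) lines.push(current);

  // Assumed meaning of "filtered": trim each line and drop empty ones.
  return lines.map((line) => line.trim()).filter((line) => line.length > 0);
}
```

Items are compared in the order they arrive, so the sketch relies on the extraction order already following reading order.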

Key Features

  • Extracts all text content from every PDF page and writes it to a DOCX document
  • Detects line breaks by comparing the Y coordinates of adjacent text items
  • Preserves page breaks from the original PDF as Word page break elements between pages
  • Real-time progress bar shows extraction percentage as each page is processed
  • Output DOCX file is automatically named after the original PDF file
  • 100% client-side processing via pdfjs-dist and the docx library; files never leave your browser
  • Resulting DOCX is fully editable in Microsoft Word, Google Docs, and LibreOffice Writer
  • Supports PDF files up to 50 MB with no account or installation required
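
Three of the features above (the size limit, the progress percentage, and the output name) reduce to small pure functions. The following is a hedged sketch with hypothetical names. Reading 50 MB as 50 × 1024 × 1024 bytes is an assumption, since the page does not say whether the limit is binary or decimal.

```typescript
// Assumption: the 50 MB limit is counted in binary megabytes.
const MAX_BYTES = 50 * 1024 * 1024;

// True when a file of the given size fits under the upload limit.
function isWithinSizeLimit(sizeInBytes: number): boolean {
  return sizeInBytes <= MAX_BYTES;
}

// Percentage of pages processed so far, clamped to 0..100.
function progressPercent(pagesDone: number, totalPages: number): number {
  if (totalPages <= 0) return 0;
  return Math.min(100, Math.round((pagesDone / totalPages) * 100));
}

// Replaces a trailing ".pdf" (any letter case) with ".docx".
function docxNameFor(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".docx";
}
```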

Frequently Asked Questions

How does PDF to Word conversion work in the browser?

The tool uses pdfjs-dist to extract text items from each PDF page, grouping them into lines based on vertical position (Y coordinate). These lines are wrapped in docx Paragraph objects. After all pages are processed with page breaks between them, the docx library assembles a valid DOCX binary that is downloaded to your computer.
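
The assembly order described in this answer can be modelled without any library. The sketch below is illustrative only: `ParagraphPlan` and `planParagraphs` are hypothetical names, and the plan is a plain data structure standing in for docx objects. In the real pipeline, each text entry would presumably become a docx Paragraph holding a TextRun, and each break a Paragraph holding a PageBreak.

```typescript
// A library-free stand-in for the paragraphs that end up in the DOCX.
type ParagraphPlan =
  | { kind: "text"; text: string }
  | { kind: "pageBreak" };

// pages[i] holds the already-grouped lines of page i.
function planParagraphs(pages: string[][]): ParagraphPlan[] {
  const plan: ParagraphPlan[] = [];
  pages.forEach((lines, pageIndex) => {
    // A break goes between pages, never before the first or after the last.
    if (pageIndex > 0) plan.push({ kind: "pageBreak" });
    for (const text of lines) {
      plan.push({ kind: "text", text });
    }
  });
  return plan;
}
```

Placing the break before every page except the first avoids a trailing blank page at the end of the document.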

Will the Word document look exactly like the original PDF?

No. The converter extracts plain text content and preserves line and page breaks, but complex formatting such as columns, tables, text boxes, images, headers, footers, and decorative fonts is not reproduced in the DOCX. The output is a text-focused Word document suitable for editing the written content.

Can the tool convert scanned PDFs to Word?

No. The tool relies on embedded text data in the PDF. Scanned PDFs or image-based PDFs do not contain selectable text, so the converter will produce an empty or nearly empty DOCX. For scanned documents, OCR software must be used to first extract the text.
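
One way to spot a likely scanned PDF before producing an empty DOCX is to count the characters extracted from each page. The page does not state that the tool performs such a check. The function name `looksScanned` and the threshold of 10 characters per page are hypothetical choices for illustration.

```typescript
// Heuristic: a document whose pages yield almost no non-whitespace text
// is probably image-based and needs OCR first.
function looksScanned(pageTexts: string[], minCharsPerPage: number = 10): boolean {
  if (pageTexts.length === 0) return false;
  const totalChars = pageTexts.reduce(
    (sum, text) => sum + text.replace(/\s+/g, "").length,
    0,
  );
  return totalChars < minCharsPerPage * pageTexts.length;
}
```

A result of true could be used to show a warning suggesting OCR instead of silently downloading a blank document.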

Does the converted DOCX support Korean or other non-Latin languages?

Yes. The docx library handles Unicode text, so text in any language encoded in the PDF, including Korean, Japanese, Chinese, Arabic, and other scripts, will be written to the DOCX, provided that pdfjs-dist can extract it from the PDF content stream.

Why does the output DOCX have extra spaces or broken words?

PDF text layout uses absolute positioning for each character or word fragment. When multiple fragments on the same visual line are extracted, they may appear as separate text runs, sometimes with spacing differences. The line-grouping algorithm mitigates this but cannot fully reconstruct the original word spacing.
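
To illustrate why spacing is hard to reconstruct, the sketch below joins fragments on one line and inserts a space only when the horizontal gap between them exceeds a threshold. This is an alternative technique shown for explanation, not the converter's documented behavior. The `Fragment` shape, the name `joinFragments`, and the 1.5-unit threshold are all assumptions, and the sketch handles left-to-right text only.

```typescript
// x is the left edge of the fragment and width its horizontal extent,
// both in the same units as the PDF coordinates.
interface Fragment {
  str: string;
  x: number;
  width: number;
}

function joinFragments(fragments: Fragment[], spaceGap: number = 1.5): string {
  // Sort a copy left to right so the caller's array is not mutated.
  const sorted = [...fragments].sort((a, b) => a.x - b.x);
  let out = "";
  let prevEnd: number | null = null;

  for (const fragment of sorted) {
    const gap = prevEnd === null ? 0 : fragment.x - prevEnd;
    const alreadySpaced = out.endsWith(" ") || fragment.str.startsWith(" ");
    // Insert a space only for a visible gap that is not already spaced.
    if (prevEnd !== null && gap > spaceGap && !alreadySpaced) {
      out += " ";
    }
    out += fragment.str;
    prevEnd = fragment.x + fragment.width;
  }
  return out;
}
```

The right threshold depends on font size, which is why any fixed value produces either stray spaces or merged words in some documents.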

Is my PDF sent to any server during conversion?

No. pdfjs-dist parses the PDF and docx builds the Word file entirely within your browser memory using JavaScript. No data is transmitted over the network. The DOCX is generated locally and downloaded directly to your device.

Can I open the converted file in Google Docs?

Yes. The output is a standard .docx file. You can upload it to Google Drive and open it with Google Docs. Alternatively, it works with Microsoft Word, LibreOffice Writer, Apple Pages, and any application that supports the DOCX format.

What happens to images, tables, or charts in the original PDF?

Images, charts, and graphical tables are not extracted. The tool only processes text content from the PDF content stream. Visual elements that are embedded as images or drawn graphics in the PDF will not appear in the DOCX output.