liminfo

PDF to Excel

Free web tool: PDF to Excel

Maximum 50 MB, PDF files only

About PDF to Excel

PDF to Excel is a free, browser-based converter that extracts structured text data from PDF documents and exports it as an Excel XLSX workbook. It uses pdfjs-dist to read each PDF page and the SheetJS (xlsx) library to build the spreadsheet, running entirely inside your browser so your files are never uploaded to any server.

The tool is particularly valuable for financial analysts, accountants, researchers, and data professionals who receive reports, statements, or datasets in PDF format and need to manipulate the numbers in a spreadsheet. Instead of manually retyping tabular data, you can convert the PDF and get an editable workbook within seconds.

Technically, the converter processes each PDF page by extracting all of its text items with the pdfjs-dist getTextContent() method. Each text item carries a transform matrix whose translation components give the item's X/Y position on the page. The tool groups text items by their rounded Y coordinate (row grouping), sorts each row by X coordinate (column ordering), and passes the resulting 2D array to SheetJS aoa_to_sheet(), so the cells follow the spatial reading order of the original page. Each page becomes a separate named sheet (Page 1, Page 2, etc.) in the final XLSX workbook.
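The row-grouping step described above can be sketched in TypeScript as follows. This is an illustrative sketch under stated assumptions, not the tool's source: the `TextItemLike` interface mirrors only the `str` and `transform` fields of a PDF.js text item, `itemsToRows` is a hypothetical name, and sorting rows by descending Y is an assumption based on PDF coordinates growing upward from the bottom of the page.

```typescript
// Only the two fields this sketch needs from a PDF.js text item.
// transform is [a, b, c, d, e, f]; e (index 4) is X and f (index 5) is Y.
interface TextItemLike {
  str: string;
  transform: number[];
}

// Hypothetical helper: group items into rows by rounded Y, then order
// each row's cells by X. Returns an array of arrays of strings.
function itemsToRows(items: TextItemLike[]): string[][] {
  const byY = new Map<number, { x: number; str: string }[]>();
  for (const item of items) {
    const y = Math.round(item.transform[5]);
    const cells = byY.get(y) ?? [];
    cells.push({ x: item.transform[4], str: item.str });
    byY.set(y, cells);
  }
  // PDF Y values grow upward from the bottom of the page, so sorting
  // rows by descending Y reads the page from top to bottom.
  return Array.from(byY.entries())
    .sort((a, b) => b[0] - a[0])
    .map(([, cells]) => cells.sort((a, b) => a.x - b.x).map((c) => c.str));
}

const rows = itemsToRows([
  { str: "Total", transform: [1, 0, 0, 1, 50, 680] },
  { str: "Item", transform: [1, 0, 0, 1, 50, 700.2] },
  { str: "Price", transform: [1, 0, 0, 1, 200, 699.9] },
  { str: "42", transform: [1, 0, 0, 1, 200, 680] },
]);
// rows → [["Item", "Price"], ["Total", "42"]]
```

In a browser, the items would come from `(await page.getTextContent()).items` (after filtering out any marked-content entries that have no `str` field), and the returned array could be passed to SheetJS as `XLSX.utils.aoa_to_sheet(rows)`.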

Key Features

  • Extracts text from every PDF page and maps it to rows and columns based on spatial position
  • Each PDF page is exported as a separate named worksheet (Page 1, Page 2, etc.) in the XLSX file
  • Text items are sorted by Y coordinate (rows) and X coordinate (columns) to preserve reading order
  • Real-time progress bar tracks conversion page by page for multi-page documents
  • Output file is named automatically after the source PDF, with the .xlsx extension
  • 100% client-side processing via pdfjs-dist and SheetJS, so files never leave your browser
  • Supports PDF files up to 50 MB, with no account registration or software installation required
  • Compatible with Microsoft Excel, Google Sheets, LibreOffice Calc, and other applications that read XLSX
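The automatic output naming listed above could work roughly like this. `toXlsxName` is a hypothetical helper, and the case-insensitive handling of the `.pdf` suffix is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: replace a trailing ".pdf" (any case) with ".xlsx".
// Names without a .pdf suffix simply get ".xlsx" appended.
function toXlsxName(pdfName: string): string {
  return pdfName.replace(/\.pdf$/i, "") + ".xlsx";
}

toXlsxName("statement.pdf"); // "statement.xlsx"
toXlsxName("Q3.Report.PDF"); // "Q3.Report.xlsx"
toXlsxName("scan");          // "scan.xlsx"
```

The resulting name could then be the second argument to SheetJS `XLSX.writeFile(workbook, name)`, which triggers a download when run in a browser.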

Frequently Asked Questions

How does the tool convert PDF tables to Excel?

The tool uses pdfjs-dist to read each PDF page and extract every text item along with its X and Y coordinates. It groups items by Y position to form rows, sorts each row by X position to form columns, and then uses SheetJS to write the resulting 2D array as an Excel worksheet.

Will complex PDF tables with merged cells convert correctly?

The converter infers table structure purely from text positions, so simple tabular layouts convert cleanly. Complex tables with merged cells, nested headers, or irregular column widths may not map perfectly to a spreadsheet grid, because a PDF generally records only where each piece of text is drawn, not which cell it belongs to or how many columns a cell spans.
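To see why nested headers misalign, consider a two-row header in which "Revenue" spans the 2023 and 2024 columns. The sketch below uses a hypothetical `rowFromItems` helper and made-up X positions, and assumes, as described above, that a row's cells are simply its text items sorted by X.

```typescript
interface PlacedText {
  x: number;
  str: string;
}

// Within one row, column position comes only from left-to-right order,
// not from where the text sits on the page.
function rowFromItems(items: PlacedText[]): string[] {
  return items
    .slice()
    .sort((a, b) => a.x - b.x)
    .map((item) => item.str);
}

// "Revenue" is centered over the 2023 and 2024 columns, and the second
// header row has nothing in the "Region" column.
const headerTop = rowFromItems([{ x: 50, str: "Region" }, { x: 250, str: "Revenue" }]);
const headerSub = rowFromItems([{ x: 200, str: "2023" }, { x: 300, str: "2024" }]);
const dataRow = rowFromItems([
  { x: 50, str: "North" },
  { x: 200, str: "120" },
  { x: 300, str: "135" },
]);

// headerTop → ["Region", "Revenue"]    "Revenue" fills only column B
// headerSub → ["2023", "2024"]         one column left of the values it labels
// dataRow   → ["North", "120", "135"]
```

The merged "Revenue" cell collapses into a single column, and the sub-header row starts in column A because nothing occupies its first position.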

Can it convert scanned PDFs or image-based PDFs to Excel?

No. The tool reads the text embedded in PDF content streams using pdfjs-dist. Scanned or image-only PDFs contain no extractable text, so the resulting worksheets will be empty. You would need to run scanned documents through OCR software first to extract their text.
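A converter built on the same libraries could detect this case before writing empty sheets. The sketch below is an illustration only, not a documented feature of the tool: `hasExtractableText` is hypothetical, and treating whitespace-only items as "no text" is an assumption.

```typescript
// Hypothetical pre-check: treat a page as text-less if none of its
// text items contain anything other than whitespace.
function hasExtractableText(items: { str: string }[]): boolean {
  return items.some((item) => item.str.trim().length > 0);
}

hasExtractableText([]);                          // false: e.g. an image-only page
hasExtractableText([{ str: " " }, { str: "" }]); // false
hasExtractableText([{ str: "Invoice #1042" }]);  // true
```

Its input would be the `str`-bearing items returned by `page.getTextContent()`; pages that return `false` are candidates for OCR.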

Why does the output sometimes have extra empty columns or misaligned data?

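A PDF positions each piece of text absolutely, with no formal table grid. Items on the same visual row whose Y coordinates differ slightly can round to different values and land in separate rows. And because columns follow each row's left-to-right order of text items rather than fixed X positions, a row with a missing, extra, or split text item shifts the values after it into the wrong columns. This is an inherent limitation of text-coordinate-based extraction.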
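The rounding problem is easy to reproduce: `Math.round(700.4)` is 700 while `Math.round(700.6)` is 701, so two items drawn on the same visual line end up in different rows. A common mitigation, shown below as a swapped-in technique rather than the tool's documented method, is to cluster Y values within a tolerance. `groupWithTolerance`, the `Positioned` type, and the 2-point default tolerance are all illustrative assumptions.

```typescript
interface Positioned {
  x: number;
  y: number;
  str: string;
}

// Alternative grouping (an illustration, not the tool's documented method):
// walk items from top to bottom and start a new row only when an item sits
// more than `tolerance` points below the first item of the current row.
function groupWithTolerance(items: Positioned[], tolerance = 2): string[][] {
  const sorted = items.slice().sort((a, b) => b.y - a.y);
  const rows: Positioned[][] = [];
  let current: Positioned[] = [];
  let rowTop = 0;
  for (const item of sorted) {
    if (current.length > 0 && rowTop - item.y <= tolerance) {
      current.push(item);
    } else {
      if (current.length > 0) rows.push(current);
      current = [item];
      rowTop = item.y;
    }
  }
  if (current.length > 0) rows.push(current);
  // Order each row's cells left to right.
  return rows.map((row) => row.sort((a, b) => a.x - b.x).map((i) => i.str));
}

const grouped = groupWithTolerance([
  { x: 50, y: 700.4, str: "Date" },
  { x: 200, y: 700.6, str: "Amount" },
  { x: 50, y: 686, str: "2024-01-05" },
  { x: 200, y: 685.8, str: "19.99" },
]);
// grouped → [["Date", "Amount"], ["2024-01-05", "19.99"]]
```

Comparing each item against the first Y of its row, rather than against the previous item, stops a gradual downward drift from chaining several lines into one row. The trade-off is that too large a tolerance merges genuinely separate lines, such as tightly spaced table rows.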

How are multi-page PDFs handled in the Excel output?

Each page of the PDF is converted independently and placed in its own worksheet tab in the XLSX workbook. A 5-page PDF produces 5 sheets named Page 1 through Page 5, keeping the data from each page organized separately.
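The page-to-sheet mapping can be sketched as below. `nameSheets` is a hypothetical helper; the commented SheetJS calls (`XLSX.utils.book_new`, `XLSX.utils.book_append_sheet`, `XLSX.utils.aoa_to_sheet`) are real SheetJS utilities, but how the tool actually assembles its workbook is an assumption.

```typescript
// Hypothetical helper: name each page's 2D array "Page 1", "Page 2", ...
// A Map keeps insertion order, so sheets stay in page order.
function nameSheets(pages: string[][][]): Map<string, string[][]> {
  const sheets = new Map<string, string[][]>();
  pages.forEach((rows, index) => sheets.set(`Page ${index + 1}`, rows));
  return sheets;
}

// With SheetJS loaded as XLSX, the map could be turned into a workbook:
//   const wb = XLSX.utils.book_new();
//   sheets.forEach((rows, name) =>
//     XLSX.utils.book_append_sheet(wb, XLSX.utils.aoa_to_sheet(rows), name));

const sheets = nameSheets([[["Item", "Price"]], [["Total", "42"]]]);
// Array.from(sheets.keys()) → ["Page 1", "Page 2"]
```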

Is the converted Excel file compatible with Google Sheets?

Yes. The output is a standard .xlsx file that can be opened directly in Google Sheets by uploading it to Google Drive. It is also compatible with Microsoft Excel, LibreOffice Calc, Numbers on Mac, and any other application that reads the XLSX format.

Does the tool send my PDF data to a server?

No. Both pdfjs-dist and SheetJS run entirely in the browser as JavaScript libraries. No file data is transmitted over the network at any point: all processing happens in your browser's memory, and the resulting XLSX file is created and downloaded locally.

What happens to formatting like bold text, colors, or borders in the original PDF?

The current tool extracts plain text content only. Visual formatting from the PDF, such as font weight, text color, borders, and background colors, is not transferred to the Excel output. The XLSX file contains the extracted text values in a default, unformatted spreadsheet.