PDF to Excel: Convert PDF Tables to Spreadsheets
A practical guide to converting PDF tables into Excel spreadsheets. Covers why direct conversion fails, what tools work, and how to get clean results without manual reformatting.
Someone sends you a PDF. The numbers you need are sitting in a table on page 12. You just want them in a spreadsheet. Somehow this is still harder than it should be in 2025.
Why copy-paste and "save as" fail
PDFs don't store tables. They store text at specific x/y coordinates on a page, with no concept of rows or columns. When you copy a table from a PDF and paste it into Excel, you get a single column of text with the values jumbled together. It's useless.
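Rebuilding the grid means inferring rows and columns from positions alone. The minimal Python sketch below shows the basic idea: group text fragments with nearly the same y coordinate into a row, then assign each fragment to a column by its x position. The input format (a list of `(x, y, text)` tuples), the `y_tolerance` value, and the column boundaries are all hypothetical. Real extractors report coordinates in their own formats, and real tools infer the boundaries instead of taking them as input.

```python
from bisect import bisect_right

# Hypothetical input: (x, y, text) fragments as a PDF text extractor might report them.
# Here y grows downward, and x is the left edge of each fragment.
Fragment = tuple[float, float, str]


def group_into_rows(fragments: list[Fragment], y_tolerance: float = 2.0) -> list[list[Fragment]]:
    """Group fragments whose y is within y_tolerance of the first fragment in the current row."""
    rows: list[list[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f[1], f[0])):
        if rows and abs(frag[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return rows


def assign_columns(row: list[Fragment], column_starts: list[float]) -> list[str]:
    """Put each fragment in the last column whose left boundary is <= its x, left to right."""
    cells = [""] * len(column_starts)
    for x, _, text in sorted(row):  # sorting tuples orders fragments by x first
        col = max(bisect_right(column_starts, x) - 1, 0)
        cells[col] = f"{cells[col]} {text}".strip()
    return cells


fragments = [
    (10.0, 100.0, "Date"), (80.0, 100.2, "Amount"), (150.0, 99.9, "Description"),
    (10.0, 115.0, "2025-01-03"), (80.0, 115.1, "1,250.00"), (150.0, 115.0, "Office"),
    (190.0, 114.8, "supplies"),
]
table = [assign_columns(r, column_starts=[0.0, 70.0, 140.0]) for r in group_into_rows(fragments)]
print(table)
# → [['Date', 'Amount', 'Description'], ['2025-01-03', '1,250.00', 'Office supplies']]
```

The tolerance and the column boundaries are what this sketch takes as given, and they are exactly what a good tool has to work out for itself: where one column ends and the next begins, and whether a line of text starts a new row or continues a wrapped cell.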
The "export to Excel" button in Acrobat and similar editors isn't much better. It tries to reverse-engineer the grid from visual positions, and the result is usually a mess of merged cells and misaligned data. You end up spending more time cleaning the output than you would have spent retyping it.
What to look for in a conversion tool
The tools that actually work understand document structure, not just text coordinates. They can figure out where a table starts and ends, which values belong to which column, and whether a second line of text continues the row above it or starts a new row.
A few things matter more than the marketing copy on a tool's landing page:
- Column detection that handles variable widths and tight spacing without mashing adjacent columns together
- Multi-page tables stitched into a single output, not dumped into separate sheets
- Numbers that come through as numbers, not text strings that look like numbers. Same for dates.
- Repeated headers on continuation pages stripped out so you don't get "Date | Amount | Description" appearing every 30 rows
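Once each page's table has been extracted, stitching the pages together and stripping repeated headers is straightforward. Here is a minimal Python sketch, assuming each page arrives as a list of rows of cell strings and that the header is the first row on page one. `stitch_pages` and `normalize` are hypothetical names, not part of any tool's API.

```python
# Hypothetical input: each page's table as a list of rows (lists of cell strings),
# with the header row repeated at the top of every continuation page.
Row = list[str]


def normalize(row: Row) -> tuple[str, ...]:
    """Compare rows ignoring case and extra whitespace, so 'Amount ' matches 'amount'."""
    return tuple(" ".join(cell.split()).lower() for cell in row)


def stitch_pages(pages: list[list[Row]]) -> list[Row]:
    """Concatenate per-page tables into one table that keeps a single header row."""
    if not pages or not pages[0]:
        return []
    header = pages[0][0]  # assumption: the first row of page one is the header
    header_key = normalize(header)
    merged: list[Row] = [header]
    for page in pages:
        for row in page:
            if normalize(row) != header_key:  # drop every copy of the header, including page one's
                merged.append(row)
    return merged


pages = [
    [["Date", "Amount", "Description"], ["2025-01-03", "1,250.00", "Office supplies"]],
    [["Date", "Amount ", "Description"], ["2025-01-04", "89.99", "Postage"]],
]
for row in stitch_pages(pages):
    print(row)
```

Comparing normalized text means a header that picked up stray whitespace on a later page still gets dropped. The trade-off is that a real data row whose text matches the header exactly would be dropped too, which is rare but worth knowing.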
The point about numbers trips people up. You get your data into Excel, the formulas don't work, and it takes ten minutes to realize half the "number" cells are actually text.
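Fixing text-stored numbers after the fact usually comes down to stripping the formatting that blocked conversion. Here is a minimal Python sketch, assuming US/UK-style formatting where the comma is a thousands separator and parentheses mark negatives. In locales that use a comma as the decimal mark, it would misread values, so adjust the rules to match your source. `to_number` is a hypothetical helper.

```python
import re
from typing import Union

# A plain number after cleanup: optional minus, digits, optional decimal part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_number(cell: str) -> Union[float, str]:
    """Return the cell as a float if it looks numeric, otherwise return it unchanged."""
    text = cell.strip()
    negative = len(text) > 2 and text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]  # accounting style: (89.99) means -89.99
    cleaned = re.sub(r"[$€£,\s]", "", text)  # currency symbols, thousands separators, spaces
    if not NUMBER.fullmatch(cleaned):
        return cell  # dates, percentages, words: leave as text
    value = float(cleaned)
    return -value if negative else value


for cell in ["1,250.00", "(89.99)", "$ 1,000", "Postage", "2025-01-03"]:
    print(repr(to_number(cell)))
# → 1250.0, -89.99, 1000.0, 'Postage', '2025-01-03' (one per line)
```

Anything that doesn't fully match a plain number after cleanup is returned unchanged, so the function can be applied to a mixed column without mangling the text cells.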
Where this comes up
Financial reports are the classic case. Quarterly earnings land as a PDF, and someone on the team needs the numbers in a model yesterday. Government agencies are another big one, publishing census tables and regulatory filings that researchers have to pry out of PDFs page by page. Accounting teams hit it with invoices. Academics hit it with data tables buried in published papers.
The pattern is always the same: someone chose PDF as a delivery format, and now you need to actually use the data.
A note on unPDF
With unPDF, you describe what you need in plain language ("extract the revenue table from pages 4 through 6") rather than converting the whole document blindly. You upload the PDF, write a short prompt, and get back structured data you can download as .xlsx or CSV. It works well for recurring reports where the layout is consistent.
Check your output
Even with good tools, spend two minutes reviewing the result. Look for cells that got merged when they shouldn't have, number columns that Excel is treating as text, and rows that went missing (especially near page breaks). This is a quick scroll-through, not an audit. Catching a problem here saves a lot of confusion later when a SUM formula returns zero and you can't figure out why.
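Part of that review can be automated. Here is a minimal Python sketch, assuming the table was downloaded as CSV and that you know which columns should hold numbers. The `review` function and its warning format are made up for illustration. It flags rows whose cell count doesn't match the header, a common symptom of merged or split cells, and non-numeric values in numeric columns. It can't see rows that are missing entirely. Comparing a column total against a total printed in the PDF is one way to catch those.

```python
import csv
import io


def review(csv_text: str, numeric_columns: set[str]) -> list[str]:
    """Return warnings about a converted table; an empty list means nothing was flagged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["no rows found"]
    header, data = rows[0], rows[1:]
    problems = []
    for row_no, row in enumerate(data, start=2):  # row 1 is the header
        if len(row) != len(header):
            problems.append(f"row {row_no}: {len(row)} cells, expected {len(header)}")
            continue
        for name, value in zip(header, row):
            if name in numeric_columns:
                try:
                    float(value.replace(",", ""))  # allow thousands separators
                except ValueError:
                    problems.append(f"row {row_no}: {name}={value!r} is not numeric")
    return problems


sample = """Date,Amount,Description
2025-01-03,"1,250.00",Office supplies
2025-01-04,89.99 Postage
2025-01-05,N/A,Refund
"""
for problem in review(sample, {"Amount"}):
    print(problem)
# → row 3: 2 cells, expected 3
# → row 4: Amount='N/A' is not numeric
```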