PDF file formatting

The PDF format is very useful because it can be read on nearly any device, but it has some severe limitations as a source for conversion.

Doxillion will attempt to convert the text content, but there are cases where the content cannot be converted accurately due to limitations that can vary from document to document.

A Few Examples

Many PDF writers do not actually store spaces, tabs, line breaks, or columns. Instead, they store words, or even individual letters, along with the position on the page where each word or letter belongs. The original spacing and layout then have to be inferred from those positions.
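To illustrate why this makes conversion imprecise, here is a minimal Python sketch of rebuilding lines of text from positioned fragments. It is not Doxillion's actual method. The fragment layout (x, y, text), the function name rebuild_lines, and the y_tolerance value are all assumptions made for this example.

```python
# Hypothetical fragments as (x, y, text), with y measured from the top of the page.
# Note that no spaces or line breaks are stored, only positions.
fragments = [
    (72.0, 100.0, "Hello"),
    (110.0, 100.2, "world"),
    (72.0, 114.0, "Second"),
    (118.0, 113.9, "line"),
]


def rebuild_lines(fragments, y_tolerance=2.0):
    """Group fragments whose y positions are close, then order each group by x."""
    lines = []  # each entry is (y of the first fragment in the line, [(x, text), ...])
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            # Close enough vertically: treat it as part of the current line.
            lines[-1][1].append((x, text))
        else:
            # Otherwise start a new line.
            lines.append((y, [(x, text)]))
    # Join the words of each line left to right with a single space.
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]


print(rebuild_lines(fragments))
```

The tolerance is a guess. If it is too small, one line is split into two, and if it is too large, separate lines are merged, which is one reason results can vary from document to document.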

Most document formats store a table as a set of table cells, each containing text. A PDF instead stores a table as text placed in front of a picture of the lines that make up the table, so which text belongs to which cell is not recorded.
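As a rough illustration of the guesswork involved, the Python sketch below assigns positioned text to table cells using the positions of the ruling lines. It is only a sketch under simplifying assumptions and not a description of how Doxillion works. It assumes the line positions are already known and form a regular grid, and the names assign_cells, col_edges, and row_edges are hypothetical.

```python
from bisect import bisect_right


def assign_cells(fragments, col_edges, row_edges):
    """Place each (x, y, text) fragment into the grid cell that contains its position.

    col_edges and row_edges are sorted positions of the vertical and
    horizontal ruling lines (assumed to be known for this example).
    """
    n_rows = len(row_edges) - 1
    n_cols = len(col_edges) - 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for x, y, text in fragments:
        # Index of the last ruling line at or before this position.
        col = bisect_right(col_edges, x) - 1
        row = bisect_right(row_edges, y) - 1
        # Text outside the outer border is ignored.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = (grid[row][col] + " " + text).strip()
    return grid


# Hypothetical 2 x 2 table: vertical lines at x = 50, 150, 250
# and horizontal lines at y = 90, 110, 130.
table_text = [
    (60.0, 100.0, "Name"),
    (160.0, 100.0, "Qty"),
    (60.0, 120.0, "Apples"),
    (160.0, 120.0, "3"),
]
table = assign_cells(table_text, [50.0, 150.0, 250.0], [90.0, 110.0, 130.0])
print(table)
```

Real tables often have merged cells, missing borders, or text that crosses a line, and in those cases a simple grid lookup like this one gives the wrong result.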

Some PDFs store text not as actual text but as pictures of text. Doxillion is not an Optical Character Recognition (OCR) product, so it must leave these as images.