PDF is often used to publish valuable data. Sometimes it is a piece of cake to extract the human-readable data and convert it to CSV or Excel, or load it into a database. Sometimes, though, the data is published in a hard-to-process form, e.g. as embedded images. The open-source ecosystem provides fantastic tools for getting data out of PDF files.
