Extract text from PDF, from the command line

pdftotext is a command-line tool for converting PDF files to plain text. It is included by default with many Linux distributions. Given only an input file, as below, it writes the text to file.txt:

$ pdftotext file.pdf
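
To convert a whole directory of PDFs at once, a small loop around pdftotext is enough. This is a minimal sketch assuming a POSIX shell with pdftotext on your PATH; convert_pdfs is just an illustrative name, and the -layout option (which asks pdftotext to keep the physical layout of the page, such as columns) is optional.

```shell
# Convert every PDF in a directory to a .txt file next to it.
# convert_pdfs is a hypothetical helper name; assumes pdftotext is installed.
convert_pdfs() {
  local dir="${1:-.}"   # default to the current directory
  local pdf
  for pdf in "$dir"/*.pdf; do
    # With no matching files the glob stays literal, so skip it.
    [ -e "$pdf" ] || continue
    # -layout keeps the original physical layout of the text.
    pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
  done
}
```

Usage: `convert_pdfs ~/Documents/papers` writes papers/foo.txt for each papers/foo.pdf.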

The gs (Ghostscript) program can also extract the text, using its txtwrite output device:

$ gs -sDEVICE=txtwrite -o extractedText.txt input.pdf
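
Since either tool will do, a script that has to run on machines where only one of them is installed can pick whichever is available. A minimal sketch, assuming a POSIX shell; pdf_to_text is a hypothetical helper name, and -q only suppresses Ghostscript's startup messages.

```shell
# Extract text with pdftotext when it is installed, otherwise fall back to
# Ghostscript's txtwrite device. pdf_to_text is a hypothetical helper name.
pdf_to_text() {
  local in="$1"
  local out="${2:-${1%.pdf}.txt}"   # default: same name with a .txt extension
  if command -v pdftotext >/dev/null 2>&1; then
    pdftotext "$in" "$out"
  elif command -v gs >/dev/null 2>&1; then
    # -o sets the output file and implies -dBATCH -dNOPAUSE
    gs -q -sDEVICE=txtwrite -o "$out" "$in"
  else
    echo "pdf_to_text: neither pdftotext nor gs found" >&2
    return 1
  fi
}
```

Usage: `pdf_to_text input.pdf extractedText.txt`, or just `pdf_to_text input.pdf` to get input.txt.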
