Extract text from PDF, from the command line

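pdftotext is a command-line tool for converting PDF files to plain text. It is included by default with many Linux distributions. Run with only an input file, it writes the text to a .txt file of the same name (file.txt in the example below).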

$ pdftotext file.pdf

Ghostscript (gs) can also extract the text, using its txtwrite output device:
$ gs -sDEVICE=txtwrite -o extractedText.txt input.pdf
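
The two commands above can be combined into a small wrapper that uses whichever tool is installed. This is only a sketch: the function names txt_name and pdf_to_text are made up for illustration, and the -layout option to pdftotext (keep the page's physical layout) and the -q option to gs (suppress startup messages) are optional additions, not part of the commands shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, e.g. report.pdf -> report.txt.
# Only a lowercase ".pdf" suffix is stripped; other names just get ".txt" appended.
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical helper: convert one PDF to text, preferring pdftotext
# and falling back to Ghostscript when pdftotext is not installed.
pdf_to_text() {
    src=$1
    dst=$(txt_name "$src")
    if command -v pdftotext >/dev/null 2>&1; then
        # -layout tries to preserve the physical layout of the page
        pdftotext -layout "$src" "$dst"
    elif command -v gs >/dev/null 2>&1; then
        # -q silences the banner; -o sets the output file and runs in batch mode
        gs -q -sDEVICE=txtwrite -o "$dst" "$src"
    else
        echo "neither pdftotext nor gs found" >&2
        return 1
    fi
}
```

After sourcing these functions, a loop such as for f in *.pdf; do pdf_to_text "$f"; done would convert every PDF in the current directory.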
