128 lines
4.4 KiB
Plaintext
128 lines
4.4 KiB
Plaintext
! Generated automatically by mantohlp
|
|
1 pdftotext
|
|
|
|
pdftotext - Portable Document Format (PDF) to text converter
|
|
|
|
pdftotext [options] [PDF-file [text-file]]
|
|
|
|
Pdftotext converts Portable Document Format (PDF) files to plain text.
|
|
|
|
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-
|
|
file. If text-file is not specified, pdftotext converts file.pdf to
|
|
file.txt. If text-file is '-', the text is sent to stdout.
|
|
|
|
()
|
|
|
|
2 ONFIGURATION_FIL
|
|
|
|
Pdftotext reads a configuration file at startup. It first tries to
|
|
find the user's private config file, ~/.xpdfrc. If that doesn't exist,
|
|
it looks for a system-wide config file, typically /usr/local/etc/xpdfrc
|
|
(but this location can be changed when pdftotext is built). See the
|
|
xpdfrc(5) man page for details.
|
|
|
|
()
|
|
|
|
2 OPTIONS
|
|
|
|
Many of the following options can be set with configuration file com-
|
|
mands. These are listed in square brackets with the description of the
|
|
corresponding command line option.
|
|
|
|
-f number
|
|
Specifies the first page to convert.
|
|
|
|
-l number
|
|
Specifies the last page to convert.
|
|
|
|
-layout
|
|
Maintain (as best as possible) the original physical layout of
|
|
the text. The default is to 'undo' physical layout (columns,
|
|
hyphenation, etc.) and output the text in reading order.
|
|
|
|
-fixed number
|
|
Assume fixed-pitch (or tabular) text, with the specified charac-
|
|
ter width (in points). This forces physical layout mode.
|
|
|
|
-raw Keep the text in content stream order. This is a hack which
|
|
often "undoes" column formatting, etc. Use of raw mode is no
|
|
longer recommended.
|
|
|
|
-htmlmeta
|
|
Generate a simple HTML file, including the meta information.
|
|
This simply wraps the text in <pre> and </pre> and prepends the
|
|
meta headers.
|
|
|
|
-enc encoding-name
|
|
Sets the encoding to use for text output. The encoding-name
|
|
must be defined with the unicodeMap command (see xpdfrc(5)).
|
|
The encoding name is case-sensitive. This defaults to "Latin1"
|
|
(which is a built-in encoding). [config file: textEncoding]
|
|
|
|
-eol unix | dos | mac
|
|
Sets the end-of-line convention to use for text output. [config
|
|
file: textEOL]
|
|
|
|
-nopgbrk
|
|
Don't insert page breaks (form feed characters) between pages.
|
|
[config file: textPageBreaks]
|
|
|
|
-opw password
|
|
Specify the owner password for the PDF file. Providing this
|
|
will bypass all security restrictions.
|
|
|
|
-upw password
|
|
Specify the user password for the PDF file.
|
|
|
|
-q Don't print any messages or errors. [config file: errQuiet]
|
|
|
|
-cfg config-file
|
|
Read config-file in place of ~/.xpdfrc or the system-wide config
|
|
file.
|
|
|
|
-v Print copyright and version information.
|
|
|
|
-h Print usage information. (-help and --help are equivalent.)
|
|
|
|
()
|
|
|
|
2 BUGS
|
|
|
|
Some PDF files contain fonts whose encodings have been mangled beyond
|
|
recognition. There is no way (short of OCR) to extract text from these
|
|
files.
|
|
|
|
()
|
|
|
|
2 XIT_CODE
|
|
|
|
The Xpdf tools use the following exit codes:
|
|
|
|
0 No error.
|
|
|
|
1 Error opening a PDF file.
|
|
|
|
2 Error opening an output file.
|
|
|
|
3 Error related to PDF permissions.
|
|
|
|
99 Other error.
|
|
|
|
()
|
|
|
|
2 AUTHOR
|
|
|
|
The pdftotext software and documentation are copyright 1996-2011 Glyph
|
|
& Cog, LLC.
|
|
|
|
()
|
|
|
|
2 SEE_ALSO
|
|
|
|
xpdf(1), pdftops(1), pdfinfo(1), pdffonts(1), pdfdetach(1),
|
|
pdftoppm(1), pdfimages(1), xpdfrc(5)
|
|
http://www.foolabs.com/xpdf/
|
|
|
|
()
|
|
|