Tesseract.tess_pipeline_pdf - Function
tess_pipeline_pdf(
    pipe::TessPipeline,
    filename::AbstractString;
    textOnly::Bool = false,
    dataDir::AbstractString = TESS_DATA
)::Bool

Generate a PDF file from the pipeline and save it to the specified file. Returns false if there is a problem adding the PDF generator to the output.

Arguments:

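| T | Name     | Default   | Description                                                   |
|---|----------|-----------|---------------------------------------------------------------|
| R | pipe     |           | The pipeline to collect the PDF from.                         |
| R | filename |           | The file to write the PDF to.                                 |
| K | textOnly | false     | Should the PDF include only text, or also contain the images? |
| K | dataDir  | TESS_DATA | The directory to look for the PDF font file in.               |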

Details:

If the file exists it will be overwritten. A text-only PDF will appear empty because Tesseract uses a glyphless font; however, you will still be able to search for the text and see an "empty" page where it is found. By default textOnly is false, so the PDF includes the image scanned by Tesseract, which gives you a searchable PDF with the images.
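
The text-only behaviour described above can be tried with a sketch like the following. It uses only the calls and keyword shown in this entry; the file name "text_only.pdf" and the title "Text Only" are placeholders chosen for illustration, and the early error is just one possible way to react to a false return value.

```julia
using Tesseract

# Generate a single sample page to run through the pipeline.
write("page01.tiff", sample_tiff())

download_languages() # Make sure we have the data files.
download_pdf_font()  # Make sure we have the PDF font file.

instance = TessInst()
pipeline = TessPipeline(instance)

# Ask for a text-only PDF and stop early if the generator could not be added.
# "text_only.pdf" is a placeholder name for this sketch.
if !tess_pipeline_pdf(pipeline, "text_only.pdf"; textOnly = true)
    error("Could not add the PDF generator to the pipeline.")
end

tess_run_pipeline(pipeline, "Text Only") do add
    add(pix_read("page01.tiff"), 72)
end

println(string("Text-only PDF created: ", filesize("text_only.pdf"), " bytes."))
```

Opening the resulting file should show a blank-looking page whose text is still searchable, as described above.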

Examples:

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

download_languages() # Make sure we have the data files.
download_pdf_font() # Make sure we have the PDF font file.

instance = TessInst()
pipeline = TessPipeline(instance)

tess_pipeline_pdf(pipeline, "My Book.pdf")

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

println(string("PDF created: ", filesize("My Book.pdf"), " bytes."))

# output

PDF created: 316722 bytes.

See also: tess_run_pipeline, tess_pipeline_alto, tess_pipeline_hocr, tess_pipeline_text, tess_pipeline_tsv

tess_pipeline_pdf(
    pipe::TessPipeline;
    textOnly::Bool = false,
    dataDir::AbstractString = TESS_DATA
)::Union{TessOutput, Nothing}

Generate a PDF file from the pipeline and save it to a byte array. Returns nothing if there is a problem adding the PDF generator to the output.

Arguments:

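| T | Name     | Default   | Description                                                   |
|---|----------|-----------|---------------------------------------------------------------|
| R | pipe     |           | The pipeline to collect the PDF from.                         |
| K | textOnly | false     | Should the PDF include only text, or also contain the images? |
| K | dataDir  | TESS_DATA | The directory to look for the PDF font file in.               |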

Examples:

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

download_languages() # Make sure we have the data files.
download_pdf_font() # Make sure we have the PDF font file.

instance = TessInst()
pipeline = TessPipeline(instance)

book = tess_pipeline_pdf(pipeline)

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

println(string("PDF created: ", length(book[]), " bytes."))

# output

PDF created: 316722 bytes.
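
Because this method can return nothing, it may be worth checking the result before running the pipeline. The sketch below does that and then saves the in-memory PDF to disk. It assumes that book[] yields the PDF bytes, as the length(book[]) call in the example above suggests; the file name "in_memory_book.pdf" is a placeholder.

```julia
using Tesseract

# Generate a single sample page to run through the pipeline.
write("page01.tiff", sample_tiff())

download_languages() # Make sure we have the data files.
download_pdf_font()  # Make sure we have the PDF font file.

instance = TessInst()
pipeline = TessPipeline(instance)

# The in-memory method returns nothing if the PDF generator could not be added.
book = tess_pipeline_pdf(pipeline)
book === nothing && error("Could not add the PDF generator to the pipeline.")

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
end

# Assumption: book[] returns the PDF bytes, as in the example above.
# Base.write returns the number of bytes written to the file.
bytes_written = write("in_memory_book.pdf", book[])

println(string("PDF saved: ", bytes_written, " bytes."))
```

Keeping the PDF in memory first can be convenient when the bytes need to go somewhere other than a local file; writing them out afterwards is then a single call.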

See also: tess_run_pipeline, tess_pipeline_alto, tess_pipeline_hocr, tess_pipeline_text, tess_pipeline_tsv
