Tesseract.tess_pipeline_unlv - Function
tess_pipeline_unlv(
    pipe::TessPipeline,
    filename::AbstractString,
    utf8::Bool = true
)::Bool

Generate a UNLV test file from the pipeline and save it to the specified file. Returns false if there is a problem adding the UNLV generator to the output.

Arguments:

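| T | Name     | Default | Description                                 |
|---|----------|---------|---------------------------------------------|
| R | pipe     |         | The pipeline to collect the UNLV text from. |
| R | filename |         | The file to write the UNLV text to.         |
| O | utf8     | true    | Should the output be transcoded to UTF-8?   |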

Details:

If the file exists, it will be overwritten. The Tesseract library writes UNLV output in Latin-1 encoding (even though all other formats are UTF-8). When utf8 is true, Julia transcodes the file to UTF-8 so that it matches the other output formats.
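To make the transcoding step concrete, here is a minimal sketch of a Latin-1 to UTF-8 conversion in plain Julia. The helper name latin1_to_utf8 is hypothetical and is not part of this package; how the package performs the conversion internally is not shown here. The sketch relies only on the fact that every Latin-1 byte value equals the Unicode code point of the same character.

```julia
# Hypothetical helper (not part of Tesseract.jl): convert Latin-1 bytes
# into a UTF-8 encoded Julia String.
# Each Latin-1 byte value is also the Unicode code point of that character,
# so Char(byte) yields the right character and join() encodes it as UTF-8.
function latin1_to_utf8(bytes::AbstractVector{UInt8})::String
    return join(Char(b) for b in bytes)
end

# "café" in Latin-1: the 'é' is the single byte 0xe9.
sample = UInt8[0x63, 0x61, 0x66, 0xe9]
text = latin1_to_utf8(sample)

println(text)              # café
println(ncodeunits(text))  # 5, because 'é' takes two bytes in UTF-8
```

Reading a Latin-1 file directly with String(read(path)) would instead produce invalid UTF-8 wherever a byte above 0x7f appears, which is why a conversion step of this kind is needed.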

Examples:

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

download_languages() # Make sure we have the data files.

instance = TessInst()
pipeline = TessPipeline(instance)

tess_pipeline_unlv(pipeline, "My Book.unlv")

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

for line in readlines("My Book.unlv")[1:10]
    println(line)
end

# output

No one would have believed in the last years of the
the nineteenth century that this world was being watched
watched keenly and closely by intelligences greater than
than man's and yet as mortal as his own; that as men busied
busied themselves about their various concerns they were
were scrutinised and studied, perhaps almost as narrowly as
as a man with a microscope might scrutinise the transient
transient creatures that swarm and multiply in a drop of
of water. With infinite complacency men went to and fro over
over this globe about their little affairs, serene in their

See also: tess_run_pipeline, tess_pipeline_word_box, tess_pipeline_lstm_box, tess_pipeline_unlv_latin1

tess_pipeline_unlv(
    pipe::TessPipeline
)::Union{TessOutput{String}, Nothing}

Generate a UNLV file from the pipeline and save it to a string. Returns nothing if there is a problem adding the UNLV generator to the output.

Arguments:

| T | Name | Default | Description                                 |
|---|------|---------|---------------------------------------------|
| R | pipe |         | The pipeline to collect the UNLV text from. |

Details:

Tesseract generates Latin-1 text; however, this function transcodes it to UTF-8 so that it works correctly with Julia strings.

Examples:

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

download_languages() # Make sure we have the data files.

instance = TessInst()
pipeline = TessPipeline(instance)

book = tess_pipeline_unlv(pipeline)

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

for line in split(book[], "\n")[1:10]
    println(line)
end

# output

No one would have believed in the last years of the
the nineteenth century that this world was being watched
watched keenly and closely by intelligences greater than
than man's and yet as mortal as his own; that as men busied
busied themselves about their various concerns they were
were scrutinised and studied, perhaps almost as narrowly as
as a man with a microscope might scrutinise the transient
transient creatures that swarm and multiply in a drop of
of water. With infinite complacency men went to and fro over
over this globe about their little affairs, serene in their

See also: tess_run_pipeline, tess_pipeline_word_box, tess_pipeline_lstm_box, tess_pipeline_unlv_latin1

tess_pipeline_unlv(
    func::Function,
    pipe::TessPipeline
)::Bool

Generate a UNLV file from the pipeline and pass it back to the client via a callback. Returns false if there is a problem adding the UNLV generator to the output.

Arguments:

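| T | Name | Default | Description                                  |
|---|------|---------|----------------------------------------------|
| R | func |         | The function to call with the lines of text. |
| R | pipe |         | The pipeline to collect the text from.       |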

Details:

The text is passed to the caller one line at a time, and each line includes its "\n" terminator. Tesseract generates UNLV text in Latin-1; this method transcodes it to UTF-8 to work with Julia.
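Because each line arrives with its terminator attached, a callback that stores the lines will usually want to strip it. The sketch below shows one way to do that. The function feed_lines is a hypothetical stand-in that imitates the line-at-a-time delivery so the example runs without Tesseract; with the real API, the same do-block would be passed to tess_pipeline_unlv(pipeline).

```julia
# Hypothetical stand-in for the pipeline (not part of Tesseract.jl):
# calls `func` once per line and keeps the "\n" terminator on each line.
function feed_lines(func::Function, text::AbstractString)
    for line in eachline(IOBuffer(text); keep = true)
        func(line)
    end
    return true
end

# Collect the lines in a vector captured by the closure, which avoids
# the need for a global counter.
lines = String[]
ok = feed_lines("first line\nsecond line\n") do line
    push!(lines, chomp(line))  # chomp removes the trailing "\n"
end

println(lines)  # ["first line", "second line"]
```

Capturing a vector in the closure keeps the callback free of global state, so the same pattern also works inside a function.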

Examples:

using Tesseract

# Generate some pages to load.
write("page01.tiff", sample_tiff())
write("page02.tiff", sample_tiff())
write("page03.tiff", sample_tiff())

download_languages() # Make sure we have the data files.

instance = TessInst()
pipeline = TessPipeline(instance)

count = 0
tess_pipeline_unlv(pipeline) do line
    global count
    if count < 10
        print(line)
    end
    count += 1
end

tess_run_pipeline(pipeline, "My First Book") do add
    add(pix_read("page01.tiff"), 72)
    add(pix_read("page02.tiff"), 72)
    add(pix_read("page03.tiff"), 72)
end

# output

No one would have believed in the last years of the
the nineteenth century that this world was being watched
watched keenly and closely by intelligences greater than
than man's and yet as mortal as his own; that as men busied
busied themselves about their various concerns they were
were scrutinised and studied, perhaps almost as narrowly as
as a man with a microscope might scrutinise the transient
transient creatures that swarm and multiply in a drop of
of water. With infinite complacency men went to and fro over
over this globe about their little affairs, serene in their
true

See also: tess_run_pipeline, tess_pipeline_word_box, tess_pipeline_lstm_box, tess_pipeline_unlv_latin1
