Tesseract.Tsv (Type)
struct Tsv
    level::Int
    page::Int
    block::Int
    paragraph::Int
    line::Int
    word::Int
    left::Int
    top::Int
    width::Int
    height::Int
    conf::Float32
    text::String
end

This structure holds the details of one line of the TSV-formatted text produced by the Tesseract library.

Values:

| Name      | Description                                                                      |
|-----------|----------------------------------------------------------------------------------|
| level     | Identifies what the line describes.                                              |
| page      | This is the page number passed into the tess_tsv() method.                       |
| block     | Identifies the block on the page.                                                |
| paragraph | The paragraph number in the block.                                               |
| line      | The line in the paragraph.                                                       |
| word      | The word in the line.                                                            |
| left      | Left edge of the item in pixels.                                                 |
| top       | Top edge of the item in pixels.                                                  |
| width     | Width of the item in pixels.                                                     |
| height    | Height of the item in pixels.                                                    |
| conf      | How confident the OCR engine is of the word (0 - 100). -1 if level is not 5.     |
| text      | The word that was decoded from the image.                                        |

Details:

Level identifies what information the line is providing:

  • 1 - Page information, added at the start of the page.
  • 2 - Block information, added at the start of a block.
  • 3 - Paragraph information, added at the start of a paragraph.
  • 4 - Line information, added at the start of a line.
  • 5 - Word information, identifies a word that was read from the page.

The left, top, width, and height values define a box, in pixels, that encompasses the item. If the level is 1, the box describes the whole image. If the level is 2, the box encloses the block that was extracted, and so on down to level 5, where the box encloses the word that was extracted.
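
As a minimal sketch of how the level and box fields might be used together, the example below keeps only word rows (level 5), drops low-confidence words, and converts each box to corner coordinates. It makes several assumptions: the struct is a local stand-in that copies the field layout shown above so the code runs without the package, the rows are hand-written illustrative values rather than real OCR output, the helpers `isword` and `corners` are hypothetical names and not part of the package, and the threshold of 60 is arbitrary.

```julia
# Local stand-in with the same fields as the Tsv struct documented above,
# so this sketch runs on its own. It is not the package's definition.
struct Tsv
    level::Int
    page::Int
    block::Int
    paragraph::Int
    line::Int
    word::Int
    left::Int
    top::Int
    width::Int
    height::Int
    conf::Float32
    text::String
end

# Hand-written sample rows (illustrative values, not real OCR output).
rows = [
    Tsv(1, 1, 0, 0, 0, 0, 0, 0, 640, 480, -1, ""),             # page
    Tsv(2, 1, 1, 0, 0, 0, 10, 20, 200, 30, -1, ""),            # block
    Tsv(3, 1, 1, 1, 0, 0, 10, 20, 200, 30, -1, ""),            # paragraph
    Tsv(4, 1, 1, 1, 1, 0, 10, 20, 200, 30, -1, ""),            # line
    Tsv(5, 1, 1, 1, 1, 1, 10, 20, 90, 30, 96.5, "Hello"),      # word
    Tsv(5, 1, 1, 1, 1, 2, 110, 20, 100, 30, 42.0, "wor1d"),    # word
]

# Hypothetical helper: only level 5 rows describe words.
isword(t::Tsv) = t.level == 5

# Hypothetical helper: box as (left, top, right, bottom).
# Assumes right = left + width and bottom = top + height.
corners(t::Tsv) = (t.left, t.top, t.left + t.width, t.top + t.height)

words = filter(isword, rows)

# Keep words the engine is reasonably sure about (threshold is arbitrary).
confident = filter(t -> t.conf >= 60, words)

# The level 1 row carries the box for the whole image.
page_box = corners(first(filter(t -> t.level == 1, rows)))

for w in confident
    println(w.text, " at ", corners(w), " conf=", w.conf)
end
```

Filtering on level before reading conf matters because conf is -1 on every row that is not a word, so a plain confidence threshold applied to all rows would silently discard the page, block, paragraph, and line entries.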

See also: tess_parsed_tsv
