Download required files

julia> using Tesseract

julia> download_languages("eng+spa")
true

julia> download_pdf_font()
true

Tesseract.jl does not include the language files that Tesseract needs to perform OCR on images. These files must be downloaded separately. Each one is several megabytes in size, so you can download only the languages you need. Languages are specified using ISO 639-3 language codes separated by a plus sign (+), for example "eng+spa" for English and Spanish.
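To illustrate how a "+"-separated language string corresponds to individual files, here is a minimal sketch. It assumes one `<code>.traineddata` file per language code, which is how the tessdata repositories name their files. The helper `language_files` is hypothetical and is not part of Tesseract.jl.

```julia
# Hypothetical helper (not part of Tesseract.jl): expand a language string
# such as "eng+spa" into the per-language file names it refers to.
# Assumes one "<code>.traineddata" file per language code.
function language_files(langs::AbstractString)
    codes = split(langs, '+'; keepempty=false)  # "eng+spa" -> ["eng", "spa"]
    return [string(code, ".traineddata") for code in codes]
end

language_files("eng+spa")  # ["eng.traineddata", "spa.traineddata"]
```

Adding a language to the string therefore means one more file to fetch, which is why listing only the languages you need keeps the download small.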

By default, the language files are downloaded from https://github.com/tesseract-ocr/tessdata_best. Unless told to overwrite existing files, download_languages only downloads a file if it does not already exist.

download_pdf_font is only needed if you want to generate searchable PDF files. Like download_languages, it only downloads the font file if it has not already been downloaded. The PDF font file is normally downloaded from https://github.com/tesseract-ocr/tessconfigs.
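The "download only if missing, unless told to overwrite" behaviour described above can be sketched with Julia's standard `Downloads` library. This is an illustration of the pattern under stated assumptions, not Tesseract.jl's actual implementation: the function name `fetch_if_missing` and its `overwrite` keyword are hypothetical.

```julia
using Downloads

# Hypothetical helper (not Tesseract.jl's API): download `url` to `path`
# only if `path` does not exist yet, or if `overwrite` is true.
# Returns true when a download happened, false when the existing file was kept.
function fetch_if_missing(url::AbstractString, path::AbstractString; overwrite::Bool=false)
    if isfile(path) && !overwrite
        return false  # file already on disk: skip the network entirely
    end
    dir = dirname(path)
    isempty(dir) || mkpath(dir)  # make sure the target directory exists
    Downloads.download(url, path)
    return true
end
```

Calling it a second time with the same `path` returns `false` without touching the network, which is what makes repeated calls cheap. Passing `overwrite=true` forces a fresh download, for example to replace a corrupted or outdated file.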