5.9.23.4.2.  Full-text search: Indexing PDF and other documents

Documents stored in catalogs can be indexed and included in the full-text index.

To do this, list the columns that contain the PDF and other documents in the key VARSEARCHINDEXDOCUMENTVARIABLES (either in the dir.prj of the catalog or in the respective prj files).

VARSEARCHINDEXDOCUMENTVARIABLES=<List of columns to index>

For a document project to be indexed, the key VARSEARCHINDEXDOCUMENT must be set to "YES".

VARSEARCHINDEXDOCUMENT=YES

For image content inside PDF documents to be read, the text recognition (OCR) software "Tesseract" must be installed, and its installation path must be stated in the configuration file:

$CADENAS_SETUP/partsol.cfg

[INDEX:OCR]
TesseractPath=
TesseractDataPath=

In addition, there are two optional settings:

DPI=600
ImageFormat=
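
A common mistake is to leave TesseractPath or TesseractDataPath empty or pointing to a location that does not exist. The following is a minimal sketch of a sanity check for the [INDEX:OCR] block, not part of the product. It assumes that partsol.cfg can be read as a plain INI file with UTF-8 encoding and that both keys hold file system paths; the function name check_ocr_settings is hypothetical.

```python
import configparser
import os
from pathlib import Path


def check_ocr_settings(cfg_path):
    """Return a list of problems found in the [INDEX:OCR] block of cfg_path."""
    # Assumption: the file parses as plain INI. strict=False tolerates
    # duplicate keys, and interpolation=None leaves '%' characters alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read(cfg_path, encoding="utf-8")

    if not parser.has_section("INDEX:OCR"):
        return ["block [INDEX:OCR] is missing"]

    problems = []
    for key in ("TesseractPath", "TesseractDataPath"):
        value = parser.get("INDEX:OCR", key, fallback="").strip()
        if not value:
            problems.append(f"{key} is empty")
        elif not Path(value).exists():
            problems.append(f"{key} points to a missing location: {value}")
    return problems


if __name__ == "__main__":
    # Falls back to the current directory if CADENAS_SETUP is not set.
    setup_dir = os.environ.get("CADENAS_SETUP", ".")
    for problem in check_ocr_settings(Path(setup_dir) / "partsol.cfg"):
        print(problem)
```

Run without arguments, the script prints one line per problem and prints nothing if both paths are set and exist. It does not check whether Tesseract itself starts, and it does not validate the optional DPI and ImageFormat settings.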