TREX: Hidden text in documents
Symptom
You use TREX 5.0 or higher and index a document that contains hidden text. The system does not index the hidden text, so a search for keywords that occur only in the hidden text does not find the document.
Hidden text can be defined in various document formats, for example Microsoft Word or Adobe PDF. This property is usually used to store comments in the document. However, some OCR (Optical Character Recognition) software also stores the recognized text as hidden text and displays the original graphic.
Other terms
Stellent, HTMLExport, hidden text
Reason and Prerequisites
TREX uses software components that convert various document formats into HTML in order to access the textual content of the processed document. During this conversion, hidden text is stored as an HTML comment. The system usually ignores this HTML comment during the subsequent processing of the document.
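The effect described above can be illustrated with a minimal sketch. It is not the TREX or Stellent implementation; it uses Python's standard html.parser module, and the sample HTML and the names TextExtractor and extract_text are hypothetical. It shows only that text placed in an HTML comment is lost when an extractor collects regular text and skips comments.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects regular text; collects comment text only on request."""

    def __init__(self, include_comments=False):
        super().__init__()
        self.include_comments = include_comments
        self.parts = []

    def handle_data(self, data):
        # Regular text between tags is always kept.
        if data.strip():
            self.parts.append(data.strip())

    def handle_comment(self, data):
        # Comment bodies are skipped unless explicitly requested.
        if self.include_comments and data.strip():
            self.parts.append(data.strip())


def extract_text(html, include_comments=False):
    """Return the extracted text of an HTML string as one line."""
    parser = TextExtractor(include_comments)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical converter output: the hidden text ends up in a comment.
SAMPLE_HTML = (
    "<html><body><p>Visible paragraph</p>"
    "<!-- hidden reviewer note --></body></html>"
)

print(extract_text(SAMPLE_HTML))
print(extract_text(SAMPLE_HTML, include_comments=True))
```

With the default setting, only the visible paragraph reaches the extracted text; with include_comments=True, the comment text is appended and would therefore be available for indexing.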
Solution
In delivery track TREX 5.0 SP4, you must use at least Patch 3 Hotfix 2. You can then use the environment variable SAP_RETRIEVAL_INCLUDE_HIDDEN to influence the behavior during the conversion of documents into HTML. If you define this variable and assign it the value TRUE, the system includes the comments generated during the conversion in the document text, so they are indexed. If the variable has another value or is not defined, hidden text is not indexed.
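The rule for the environment variable can be written down as a small check. This is only a sketch of the decision described above, not TREX code; the function name hidden_text_enabled is hypothetical, and the exact, case-sensitive comparison with TRUE is an assumption, because the note does not say how TREX compares the value.

```python
import os

ENV_NAME = "SAP_RETRIEVAL_INCLUDE_HIDDEN"


def hidden_text_enabled(environ=None):
    """Return True only if the variable is defined with the value TRUE."""
    env = os.environ if environ is None else environ
    # Assumption: exact, case-sensitive match; any other value or an
    # undefined variable means that hidden text is not indexed.
    return env.get(ENV_NAME) == "TRUE"


print(hidden_text_enabled({ENV_NAME: "TRUE"}))
print(hidden_text_enabled({ENV_NAME: "FALSE"}))
print(hidden_text_enabled({}))
```

Calling hidden_text_enabled() without arguments checks the environment of the current process, which can help to verify that a newly defined system variable is visible after a restart.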
In delivery track TREX 5.0 SP5, you must use at least Support Package 1. In delivery track TREX 6.0 or higher, the setting is always available. In these versions, the behavior is controlled by settings in the configuration file TREXFilter.ini. There, in the section [filter], set the parameter “includehidden” to the value true. (If the line does not exist, insert the line “includehidden = true”.) With this setting, the comments generated during the conversion are included in the document text and indexed. If the parameter has another value or is not defined, hidden text is not indexed.
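The following sketch shows what the described entry looks like and how its presence could be checked. It assumes that the relevant part of TREXFilter.ini can be read by Python's standard configparser module, which the note does not confirm; the function name include_hidden_configured is hypothetical, and the exact comparison with the value true is an assumption.

```python
import configparser

# The entry described above: section [filter], parameter includehidden.
SAMPLE_INI = """\
[filter]
includehidden = true
"""


def include_hidden_configured(ini_text):
    """Return True if [filter] contains includehidden = true."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # A missing section or a missing parameter yields the fallback "".
    value = config.get("filter", "includehidden", fallback="")
    # Assumption: only the exact value "true" enables the setting.
    return value.strip() == "true"


print(include_hidden_configured(SAMPLE_INI))
print(include_hidden_configured("[filter]\n"))
```

The check only reads the text and does not modify the configuration file; the change itself is made by editing TREXFilter.ini as described above.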
Note that these settings apply only to higher-level document formats such as Microsoft Word, Microsoft PowerPoint, or Adobe PDF. The processing of plain ASCII text, HTML documents, or XML documents is not affected by these settings. In particular, comments in HTML documents are typically ignored.
The following brief instructions describe how to set the value of an environment variable in Windows 2000. If you have questions on this topic, refer to the operating system documentation.
1. You need administrator authorization because you change system settings.
2. Open the system settings via Start -> Settings -> Control Panel -> System.
3. Select the ‘Advanced’ tab.
4. Choose “Environment Variables…”.
5. Choose “New…” in the “System variables” frame.
6. Enter “SAP_RETRIEVAL_INCLUDE_HIDDEN” as the variable name.
7. Enter the desired value (TRUE if you want hidden text to be indexed).
8. Restart the PC.
Depending on your setup, there may be several ways to set this value.