Display of mixture of Hebrew/Arabic, Latin, and digits
Symptom
A text consisting of Hebrew or Arabic and Latin characters looks disordered. The characters appear in an order that does not reflect the logically intended structure of the text.
Other terms
RTL, LTR, BIDI, BIDI marks, BIDI formatting characters, Hebrew, Arabic, BIDI algorithm
Reason and Prerequisites
The Unicode BIDI Algorithm
The basic rules for display order of characters are:
- Latin characters are written from left to right (LTR).
- Hebrew and Arabic characters are written from right to left (RTL).
- Digits are written from left to right, even in an RTL environment (within Hebrew or Arabic text).
- The behaviour of punctuation marks, separator symbols and mathematical symbols (e.g. space, ‘-’, ‘(’, ‘)’, ‘=’) depends on the surrounding characters. Digits influence this behaviour in a different way than normal characters.
When a mixture of these has to be displayed, the following happens:
- An overall display order is defined (LTR or RTL). In SAP systems this depends on the logon language (Hebrew and Arabic are RTL, other languages are LTR).
- The text is separated into blocks of characters that belong together, and these blocks are arranged into display order, taking into account the overall display order. This is done by an algorithm standardized by the Unicode Consortium (the “Unicode BIDI Algorithm”):
http://www.unicode.org/unicode/reports/tr9
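The basic rules listed earlier correspond to a bidirectional class that Unicode assigns to every character. As a minimal illustration (the helper name `bidi_classes` is hypothetical and not part of any SAP interface), Python's standard `unicodedata` module can show these classes for a sample string:

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin letter, Hebrew letter alef, digit, minus sign, parenthesis, space
for ch, cls in bidi_classes("A\u05d07-( "):
    print(f"U+{ord(ch):04X} {cls}")
# U+0041 L   (strong left-to-right)
# U+05D0 R   (strong right-to-left)
# U+0037 EN  (European number)
# U+002D ES  (European separator)
# U+0028 ON  (other neutral)
# U+0020 WS  (whitespace)
```

Letters carry a strong direction, digits form their own class, and separators and symbols are weak or neutral, which is why their position depends on the surrounding characters. These classes are the input to the Unicode BIDI Algorithm.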
This algorithm is confronted with the fundamental problem that many characters, especially digits and separators, are used with several meanings which are context dependent but cannot be derived automatically. The different meanings imply a different expected behaviour in bidirectional text, but the BIDI Algorithm has to decide for one unambiguous behaviour. Examples:
- ‘747-400’ could mean the mathematical term 747 minus 400, or it could be the name of a certain plane type. In the first case, it should be displayed in RTL as ‘400-747’. In the latter case, both the digits and the minus sign are used like Latin characters and are expected not to be reordered. In fact, the BIDI Algorithm does not reorder this string in RTL context.
- The character ‘x’ is normally a Latin character. However, in ‘7x24’ it is used like a mathematical symbol (multiplication sign, ‘7*24’). The visual display in RTL context is expected to be ‘24x7’. In fact, the BIDI Algorithm does not reorder this string in RTL context. What makes this example even more difficult to handle is that Microsoft’s Richedit control, which is used in SAP’s textedit control, does not follow the BIDI Algorithm in this case but reorders the string.
- A material number that consists of digits and Latin characters is expected never to be reordered. In other words, it should behave as if all digits were LTR characters. A material number could even start with a digit, like ‘123-X4-567’. In fact, it is reordered in RTL context to ‘X4-567-123’.
Another problem is pure LTR text in an RTL environment, especially when parentheses are involved. Examples:
- ‘Hello world!’ is displayed in RTL as: ‘!Hello world’
- ‘Hello (world)!’ is displayed in RTL as: ‘!(Hello (world’
Nevertheless, the Unicode BIDI Algorithm provides an unambiguous definition of how to resolve these situations. SAP uses the Unicode BIDI Algorithm in all user interfaces and for printing (exception: editor for SAPscript and Smartforms, see below). By using the Unicode BIDI Algorithm, SAP ensures that the display always follows a consistent set of rules.
In some cases this may lead to a display that looks wrong to the user’s natural intuition. The BIDI Algorithm was optimized for the display of normal flowing text. Non-intuitive behaviour typically occurs when digits, Latin characters, and symbols are used in technical terms, abbreviations, short text fields, etc.
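The examples in this section can be reproduced with a heavily reduced sketch of the algorithm. It is not a conforming implementation: it covers only rules W4, W7, N1/N2, I1/I2, L2 and L4 of the Unicode report, treats Arabic letters like Hebrew letters, treats every class other than L, R and EN as neutral, and ignores explicit formatting characters and the paired-bracket rule that newer versions of the report add (so current implementations may render the parentheses example differently). The names `visual_order` and `MIRROR` are hypothetical.

```python
import unicodedata

MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}
STRONG_OR_NUMBER = ("L", "R", "EN")


def visual_order(text: str, rtl: bool) -> str:
    """Return the characters of `text` in left-to-right screen order."""
    base_dir = "R" if rtl else "L"
    # Simplification: Arabic letters (AL) are handled like Hebrew letters (R).
    cls = ["R" if c == "AL" else c for c in map(unicodedata.bidirectional, text)]
    n = len(cls)

    # W4: one separator between two European digits becomes part of the number.
    for i in range(1, n - 1):
        if cls[i] in ("ES", "CS") and cls[i - 1] == cls[i + 1] == "EN":
            cls[i] = "EN"

    # W7: digits whose nearest preceding strong character is L behave like L.
    last_strong = base_dir
    for i, c in enumerate(cls):
        if c in ("L", "R"):
            last_strong = c
        elif c == "EN" and last_strong == "L":
            cls[i] = "L"

    # N1/N2: a run of neutrals (here: every remaining class) takes the direction
    # of its neighbours if they agree (digits count as R), else the base direction.
    i = 0
    while i < n:
        if cls[i] in STRONG_OR_NUMBER:
            i += 1
            continue
        j = i
        while j < n and cls[j] not in STRONG_OR_NUMBER:
            j += 1
        before = cls[i - 1].replace("EN", "R") if i else base_dir
        after = cls[j].replace("EN", "R") if j < n else base_dir
        cls[i:j] = [before if before == after else base_dir] * (j - i)
        i = j

    # I1/I2: embedding levels; odd levels are RTL, even levels are LTR.
    if rtl:
        level_of = {"R": 1, "L": 2, "EN": 2}
    else:
        level_of = {"L": 0, "R": 1, "EN": 2}
    levels = [level_of[c] for c in cls]

    # L4: mirror brackets that ended up at an RTL level.
    chars = [MIRROR.get(ch, ch) if lv % 2 else ch for ch, lv in zip(text, levels)]

    # L2: from the highest level down to 1, reverse each run at that level or above.
    for k in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            j = i
            while j < n and levels[j] >= k:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = max(j, i + 1)
    return "".join(chars)


for sample in ["747-400", "7x24", "123-X4-567", "Hello world!", "Hello (world)!"]:
    print(f"{sample!r} -> {visual_order(sample, rtl=True)!r}")
# '747-400' -> '747-400'
# '7x24' -> '7x24'
# '123-X4-567' -> 'X4-567-123'
# 'Hello world!' -> '!Hello world'
# 'Hello (world)!' -> '!(Hello (world'
```

In the material number, the leading ‘123’ has no Latin character before it, so it stays a number in RTL context and moves to the right of ‘X4-567’, whose digits follow the Latin ‘X’ and therefore behave like Latin characters.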
Microsoft Word as SAPscript/Smartforms editor
The text processing program Microsoft Word has chosen a different approach to deal with the Unicode BIDI Algorithm. It basically recognizes the input method which is used when a text is typed and derives the context direction (RTL or LTR) from it: Using the English keyboard results in an LTR context even in a Hebrew or Arabic document, whereas using a Hebrew or Arabic input method results in an RTL context. This additional directional information is stored invisibly together with the text.
SAP uses MS Word as editor for long texts (SAPscript and Smartforms). The additional invisible directional information is stored using special Unicode control characters (BIDI formatting characters).
Solution
Possible approaches to solve or avoid this kind of problem:
- Avoid mixing Hebrew/Arabic and Latin letters with digits and separators (e.g. use ‘7*24’ instead of ‘7x24’).
- Avoid material numbers or similar short texts that are reordered by the Unicode BIDI Algorithm. A typical example is a material number that starts with a digit, like ‘123-X4-567’. Use, for example, ‘M123-X4-567’ instead.
- Train users to view critical phrases only with the proper logon language (e.g. Hebrew for phrases which should be viewed in RTL context or English for phrases which should be viewed in LTR context).
- The Unicode BIDI Algorithm can be influenced by inserting invisible control characters (BIDI formatting characters). These can in some cases help to solve display problems. For details, see SAP notes 685023 and 779879.
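As an illustration of the last approach, the sketch below prefixes an identifier with the LEFT-TO-RIGHT MARK (U+200E), an invisible character with strong LTR direction, so that leading digits follow an LTR character in the same way as in ‘M123-X4-567’. This is an assumption about one possible use of BIDI formatting characters; whether it matches the procedure described in the SAP notes is not confirmed here, and the helper name `protect_ltr` is hypothetical.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, bidirectional class L


def protect_ltr(identifier: str) -> str:
    """Prefix an identifier with LRM unless it already starts with one."""
    return identifier if identifier.startswith(LRM) else LRM + identifier


marked = protect_ltr("123-X4-567")
print(ascii(marked))                   # '\u200e123-X4-567'
print(unicodedata.bidirectional(LRM))  # L
```

The mark changes the stored value (it is one character longer), so it is better suited to descriptive texts than to key fields that are compared or searched.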
Appendix: How to reproduce the behaviour of the BIDI algorithm
You can reproduce the behaviour of the BIDI algorithm with a given piece of text by simply displaying it in Microsoft’s Notepad editor. All SAP User Interfaces (except MS Word as Editor for long texts) behave like Notepad.
1. Make sure you have installed the support for complex scripts and the fonts for Hebrew or Arabic, respectively.
2. Open Notepad and paste or type the text into it.
3. Press Ctrl + right Shift to switch Notepad to RTL mode and Ctrl + left Shift to switch it to LTR mode, or use the right mouse button -> “Right to left reading order”. You will see the exact (and correct) behaviour of the Unicode BIDI Algorithm in RTL and LTR context.
4. Use the right mouse button -> “Show Unicode control characters” to visualize any Unicode formatting characters that might be in the text.
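Outside Notepad, hidden formatting characters can also be made visible with a few lines of code. The following is a minimal sketch; the function name `show_bidi_controls` and the `<LRM>`-style output are illustrative choices, and only the directional marks, embeddings, overrides and isolates defined by Unicode are covered.

```python
BIDI_CONTROLS = {
    "\u200e": "LRM", "\u200f": "RLM", "\u061c": "ALM",  # directional marks
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",  # embeddings and their terminator
    "\u202d": "LRO", "\u202e": "RLO",                   # overrides
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",  # isolates
}


def show_bidi_controls(text: str) -> str:
    """Replace each invisible BIDI formatting character with a visible tag."""
    return "".join(f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch for ch in text)


print(show_bidi_controls("\u200e123-X4-567"))    # <LRM>123-X4-567
print(show_bidi_controls("\u202aHello\u202c!"))  # <LRE>Hello<PDF>!
```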