##Adobe File Version: 1.000
#=======================================================================
#   FTP file name: HEBREW.TXT
#
#   Contents:      Map (external version) from Mac OS Hebrew
#                  character set to Unicode 2.1
#
#   Copyright:     (c) 1995-1999 by Apple Computer, Inc., all rights
#                  reserved.
#
#   Contact:       charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b1>, and Text
#                           Encoding Converter version 1.5.
#       n03  1998-Feb-05    Show required Unicode character
#                           directionality in a different way. Update
#                           mappings for 0xC0 and 0xDE to use
#                           transcoding hints; matches internal utom<n6>,
#                           ufrm<n20>, and Text Encoding Converter
#                           version 1.3. Rewrite header comments.
#       n01  1995-Nov-15    First version. Matches internal ufrm<n8>.
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Hebrew code (in hex as 0xNN).
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN, 0xNNNN+0xNNNN, etc.). Sequences of up to 5
#       Unicode characters are used here. A single Unicode character
#       may be preceded by a tag indicating required directionality
#       (i.e. <LR>+0xNNNN or <RL>+0xNNNN).
#     Column #3 is a comment containing the Unicode name.
#
#   The entries are in Mac OS Hebrew code order.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Hebrew character set uses the standard control characters at
#   0x00-0x1F and 0x7F.
#
# Notes on Mac OS Hebrew:
# -----------------------
#
#   1. General
#
#   The Mac OS Hebrew character set supports the Hebrew and Yiddish
#   languages. It incorporates the Hebrew letter repertoire of
#   ISO 8859-8, and uses the same code points for them, 0xE0-0xFA.
#   It also incorporates the ASCII character set. In addition, the
#   Mac OS Hebrew character set includes the following:
#
#   - Hebrew points (nikud marks) at 0xC6, 0xCB-0xCF and 0xD8-0xDF.
#     These are non-spacing combining marks. Note that the RAFE point
#     at 0xD8 is not displayed correctly in some fonts, and cannot be
#     typed using the keyboard layouts in the current Hebrew localized
#     systems. Also note that the character given in Unicode as QAMATS
#     (U+05B8) actually represents two different sounds, depending on
#     context. For example, when ALEF is followed by QAMATS, which of
#     the two sounds the QAMATS represents depends on the following
#     letters. The Mac OS Hebrew character set separately encodes
#     these two sounds for the same graphic shape, as "qamats" (0xCB)
#     and "qamats qatan" (0xDE).
#     The "qamats" character is more common, so it is mapped to the
#     Unicode QAMATS; "qamats qatan" can only be used with a limited
#     number of characters, and it is mapped using a corporate-zone
#     variant tag (see below).
#
#   - Various Hebrew ligatures at 0x81, 0xC0, 0xC7, 0xC8, 0xD6, and
#     0xD7. Also note that the Yiddish YOD YOD PATAH ligature at 0x81
#     is missing in some fonts.
#
#   - The NEW SHEQEL SIGN at 0xA6.
#
#   - Latin characters with diacritics at 0x80 and 0x82-0x9F. However,
#     most of these cannot be typed using the keyboard layouts in the
#     Hebrew localized systems.
#
#   - Right-left versions of certain ASCII punctuation, symbols and
#     digits: 0xA0-0xA5, 0xA7-0xBF, 0xFB-0xFF. See below.
#
#   - Miscellaneous additional punctuation at 0xC1, 0xC9, 0xCA, and
#     0xD0-0xD5. There is a variant of the Hebrew encoding in which
#     the LEFT SINGLE QUOTATION MARK at 0xD4 is replaced by FIGURE
#     SPACE. The glyphs for some of the other punctuation characters
#     are missing in some fonts.
#
#   - Four obsolete characters at 0xC2-0xC5 known as canorals (not to
#     be confused with cantillation marks!). These were used for
#     manual positioning of nikud marks before System 7.1 (at which
#     point nikud positioning became automatic with WorldScript).
#
#   2. Directional characters and roundtrip fidelity
#
#   The Mac OS Hebrew character set was developed around 1987. At that
#   time the bidirectional line layout algorithm used in the Mac OS
#   Hebrew system was fairly simple; it used only a few direction
#   classes (instead of the 13 or so now used in the Unicode
#   bidirectional algorithm). In order to permit users to handle some
#   tricky layout problems, certain punctuation, symbol, and digit
#   characters have duplicate code points, one with a left-right
#   direction attribute and the other with a right-left direction
#   attribute.
#
#   For example, the plus sign is encoded at 0x2B with a left-right
#   attribute, and at 0xAB with a right-left attribute.
#   However, there is only one PLUS SIGN character in Unicode. This
#   leads to some interesting problems when mapping between Mac OS
#   Hebrew and Unicode; see below.
#
#   A related problem is that even when a particular character is
#   encoded only once in Mac OS Hebrew, it may have a different
#   direction attribute than the corresponding Unicode character.
#
#   For example, the Mac OS Hebrew character at 0xC9 is HORIZONTAL
#   ELLIPSIS with strong right-left direction. However, the Unicode
#   character HORIZONTAL ELLIPSIS has direction class neutral.
#
#   3. Font variants
#
#   The table in this file gives the Unicode mappings for the standard
#   Mac OS Hebrew encoding. This encoding is supported by many of the
#   Apple fonts (including all of the fonts in the Hebrew Language Kit),
#   and is the encoding supported by the text processing utilities.
#   However, some TrueType fonts provided with the localized Hebrew
#   system implement a slightly different encoding; the difference is
#   only in one code point, 0xD4. For the standard variant, this is:
#     0xD4 -> <RL>+0x2018  LEFT SINGLE QUOTATION MARK, right-left
#
#   The TrueType variant is used by the following TrueType fonts from
#   the localized system: Caesarea, Carmel Book, Gilboa, Ramat Sharon,
#   and Sinai Book. For these, 0xD4 is as follows:
#     0xD4 -> <RL>+0x2007  FIGURE SPACE, right-left
#
# Unicode mapping issues and notes:
# ---------------------------------
#
#   1. Matching the direction of Mac OS Hebrew characters
#
#   When Mac OS Hebrew encodes a character twice but with different
#   direction attributes for the two code points - as in the case of
#   plus sign mentioned above - we need a way to map both Mac OS Hebrew
#   code points to Unicode and back again without loss of information.
#   With the plus sign, for example, mapping one of the Mac OS Hebrew
#   characters to a code in the Unicode corporate use zone is
#   undesirable, since both of the plus sign characters are likely to
#   be used in text that is interchanged.
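To show how the direction tag in column #2 keeps both plus signs distinct, here is a minimal Python sketch. It parses table lines in the three-column format described under "Format" and keys a reverse lookup on (direction tag, code points) instead of on the code points alone. The three sample lines are a hypothetical excerpt built from mappings stated in this file (0x2B and 0xAB), plus 0xE0 as HEBREW LETTER ALEF, which is assumed to be untagged. The names `Entry`, `parse_entry`, `SAMPLE`, and `to_mac` are illustrative. This sketch does not describe how Apple's Text Encoding Converter works internally.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    mac_code: int                 # column #1: Mac OS Hebrew byte
    direction: Optional[str]      # "LR", "RL", or None if column #2 has no tag
    unicode: Tuple[int, ...]      # column #2: one or more Unicode code points

# Optional direction tag at the start of column #2, e.g. "<RL>+".
_TAG = re.compile(r"<(LR|RL)>\+")

def parse_entry(line: str) -> Optional[Entry]:
    """Parse one mapping line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()   # drop column #3 (the comment)
    if not data:
        return None
    mac_field, uni_field = data.split()[:2]
    direction = None
    tag = _TAG.match(uni_field)
    if tag:
        direction = tag.group(1)
        uni_field = uni_field[tag.end():]
    # int(..., 16) accepts the "0x" prefix used in both columns;
    # "+" separates the characters of a Unicode sequence.
    code_points = tuple(int(part, 16) for part in uni_field.split("+"))
    return Entry(int(mac_field, 16), direction, code_points)

# Hypothetical excerpt in the documented format (not copied from the table).
SAMPLE = (
    "0x2B\t<LR>+0x002B\t# PLUS SIGN, left-right\n"
    "0xAB\t<RL>+0x002B\t# PLUS SIGN, right-left\n"
    "0xE0\t0x05D0\t# HEBREW LETTER ALEF\n"
)

entries = [e for e in map(parse_entry, SAMPLE.splitlines()) if e is not None]

# Keying on (direction, code points) keeps 0x2B and 0xAB apart even
# though both map to U+002B PLUS SIGN.
to_mac = {(e.direction, e.unicode): e.mac_code for e in entries}

print(hex(to_mac[("LR", (0x002B,))]))  # 0x2b
print(hex(to_mac[("RL", (0x002B,))]))  # 0xab
```

A key made only from the code points would let the second plus-sign entry overwrite the first. Including the tag in the key avoids that without moving either character into the corporate use zone.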
#
#   The problem is solved with the use of direction override characters
#   and direction-dependent mappings. When mapping from Mac OS Hebrew
#   to Unicode, we use direction overrides as necessary to force the
#   direction of the resulting Unicode characters.
#
#   The required direction is indicated by a direction tag in the
#   mappings. A tag of <LR> means the corresponding Unicode character
#   must have a strong left-right context, and a tag of <RL> indicates
#   a right-left context.
#
#   For example, the mapping of 0x2B is given as <LR>+0x002B; the
#   mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
#   instance of 0x2B to Unicode, it should be mapped as follows (LRO
#   indicates LEFT-TO-RIGHT OVERRIDE, PDF indicates POP DIRECTIONAL
#   FORMATTING):
#
#     0x2B ->  0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
#
#   When mapping several characters in a row that require the same
#   direction forcing, the overrides need only be used at the beginning
#   and end. For example:
#
#     0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
#
#   When mapping from Unicode to Mac OS ...
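The override rule for mapping Mac OS Hebrew to Unicode can be sketched in Python. The sketch makes several assumptions. It uses a small hypothetical table `TABLE` that maps each byte to a (direction tag, code points) pair. The left-right tags for 0x20, 0x24, 0x28, and 0x29 are inferred from the four-character example above, and 0xE0 (ALEF) is assumed to be untagged. Each maximal run of characters with the same tag is wrapped in a single LRO (U+202D) or RLO (U+202E, RIGHT-TO-LEFT OVERRIDE) ... PDF (U+202C) pair, and untagged characters are emitted without overrides. A real converter might apply overrides more selectively ("as necessary"), for example omitting them when the surrounding text already has the required direction. The sketch does not attempt that, and `mac_hebrew_to_unicode` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

LRO, RLO, PDF = 0x202D, 0x202E, 0x202C

# Hypothetical excerpt of a parsed table: byte -> (direction tag, code points).
TABLE: Dict[int, Tuple[Optional[str], Tuple[int, ...]]] = {
    0x20: ("LR", (0x0020,)),
    0x24: ("LR", (0x0024,)),
    0x28: ("LR", (0x0028,)),
    0x29: ("LR", (0x0029,)),
    0x2B: ("LR", (0x002B,)),
    0xAB: ("RL", (0x002B,)),
    0xE0: (None, (0x05D0,)),   # assumed untagged
}

def mac_hebrew_to_unicode(data: bytes) -> List[int]:
    """Map bytes to code points, wrapping each same-tag run in one override."""
    out: List[int] = []
    open_dir: Optional[str] = None        # tag of the override currently open
    for byte in data:                     # iterating bytes yields ints
        direction, code_points = TABLE[byte]
        if direction != open_dir:
            if open_dir is not None:
                out.append(PDF)           # close the previous run
            if direction is not None:
                out.append(LRO if direction == "LR" else RLO)
            open_dir = direction
        out.extend(code_points)
    if open_dir is not None:
        out.append(PDF)                   # close the final run
    return out
```

With this sketch, `mac_hebrew_to_unicode(b"\x2B")` returns `[0x202D, 0x002B, 0x202C]`, which matches the isolated plus-sign example. The input `b"\x24\x20\x28\x29"` produces a single LRO ... PDF pair around all four characters. For `b"\x2B\xAB"`, the first run is closed before the second opens, so the result is two separate overrides rather than nested ones.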