                 Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

            http://www.lanana.org/docs/unicode/unicode.txt

                       ------------------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

Map symbol      Map name                        Escape code (G0)

LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.  This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

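As an illustration of the table above, the following Python sketch
builds the escape sequence that designates one of the four maps as G0.
The names G0_SELECT and select_g0 are made up for this example and are
not kernel identifiers; the spaces in "ESC ( B" are notation only and
are not part of the sequence sent to the console.

```python
# Escape sequences from the map table that select a map into G0.
# G0_SELECT and select_g0 are illustrative names, not kernel symbols.
ESC = "\x1b"

G0_SELECT = {
    "LAT1_MAP": ESC + "(B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": ESC + "(0",   # DEC VT100 pseudographics
    "IBMPC_MAP": ESC + "(U",  # IBM code page 437
    "USER_MAP": ESC + "(K",   # User defined
}


def select_g0(map_symbol):
    """Return the escape sequence that designates map_symbol as G0."""
    try:
        return G0_SELECT[map_symbol]
    except KeyError:
        raise ValueError("unknown map symbol: " + map_symbol) from None
```

Writing select_g0("GRAF_MAP") to a console would, under these
assumptions, switch G0 to the DEC VT100 pseudographics map.
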
In accordance with the Unicode standard/ISO 10646 the range U+F000 to
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone").  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as the End User Zone.

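The zone boundaries described above can be sketched as a small
classifier.  This is only an illustration in Python; the names
END_USER_ZONE, LINUX_ZONE and classify_zone are chosen for this example
and do not appear in the kernel.

```python
# Private-use zones as described in the text (upper bounds are exclusive
# because range() excludes its stop value).
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def classify_zone(code_point):
    """Name the zone a code point falls in, or "outside" if neither."""
    if code_point in END_USER_ZONE:
        return "End User Zone"
    if code_point in LINUX_ZONE:
        return "Linux Zone"
    return "outside"
```
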
[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table.  The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour.  In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.


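A minimal Python sketch of the direct-mapping rule follows.  It assumes
that the glyph index is simply the offset of the code point from
U+F000, and it models the font size limit with a hypothetical font_size
parameter; neither the function name nor the parameter is taken from
the kernel source.

```python
DIRECT_BASE = 0xF000  # first code point of the direct-mapping area
DIRECT_LAST = 0xF7FF  # last code point of the direct-mapping area


def direct_glyph_index(code_point, font_size=256):
    """Return the font position for a direct-mapped code point.

    Returns None if the code point lies outside U+F000..U+F7FF or if
    its offset does not fit in a font of font_size glyphs.
    Assumption: glyph index == code_point - 0xF000.
    """
    if not DIRECT_BASE <= code_point <= DIRECT_LAST:
        return None
    index = code_point - DIRECT_BASE
    return index if index < font_size else None
```

With the default 256-glyph font only U+F000..U+F0FF resolve to a glyph;
passing font_size=512 extends this to U+F1FF, matching the vgacon
limits mentioned above.
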
Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses the
new values.

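Text that still contains the obsolete Linux Zone values could be
converted along the following lines.  This Python sketch pairs the old
and new code points in the order in which both lists are given above
(scan 1, 3, 7, 9); the names OBSOLETE_TO_UNICODE and
modernize_scan_lines are invented for this example.

```python
# Obsolete Linux Zone code points and their Unicode 3.2.0 replacements,
# paired in the order listed in the text.
OBSOLETE_TO_UNICODE = {
    0xF800: 0x23BA,  # horizontal line, scan 1
    0xF801: 0x23BB,  # horizontal line, scan 3
    0xF803: 0x23BC,  # horizontal line, scan 7
    0xF804: 0x23BD,  # horizontal line, scan 9
}


def modernize_scan_lines(text):
    """Replace obsolete scan-line code points; leave all others alone."""
    return text.translate(OBSOLETE_TO_UNICODE)
```
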
[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

        http://www.kli.org/

Since the characters in the beginning of the Linux CZ have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

NOTE: This range is now officially managed by the ConScript Unicode
Registry.  The normative reference is at:

        http://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

U+F8D0  KLINGON LETTER A
U+F8D1  KLINGON LETTER B
U+F8D2  KLINGON LETTER CH
U+F8D3  KLINGON LETTER D
U+F8D4  KLINGON LETTER E
U+F8D5  KLINGON LETTER GH
U+F8D6  KLINGON LETTER H
U+F8D7  KLINGON LETTER I
U+F8D8  KLINGON LETTER J
U+F8D9  KLINGON LETTER L
U+F8DA  KLINGON LETTER M
U+F8DB  KLINGON LETTER N
U+F8DC  KLINGON LETTER NG
U+F8DD  KLINGON LETTER O
U+F8DE  KLINGON LETTER P
U+F8DF  KLINGON LETTER Q
        - Written <q> in standard Okrand Latin transliteration
U+F8E0  KLINGON LETTER QH
        - Written <Q> in standard Okrand Latin transliteration
U+F8E1  KLINGON LETTER R
U+F8E2  KLINGON LETTER S
U+F8E3  KLINGON LETTER T
U+F8E4  KLINGON LETTER TLH
U+F8E5  KLINGON LETTER U
U+F8E6  KLINGON LETTER V
U+F8E7  KLINGON LETTER W
U+F8E8  KLINGON LETTER Y
U+F8E9  KLINGON LETTER GLOTTAL STOP

U+F8F0  KLINGON DIGIT ZERO
U+F8F1  KLINGON DIGIT ONE
U+F8F2  KLINGON DIGIT TWO
U+F8F3  KLINGON DIGIT THREE
U+F8F4  KLINGON DIGIT FOUR
U+F8F5  KLINGON DIGIT FIVE
U+F8F6  KLINGON DIGIT SIX
U+F8F7  KLINGON DIGIT SEVEN
U+F8F8  KLINGON DIGIT EIGHT
U+F8F9  KLINGON DIGIT NINE

U+F8FD  KLINGON COMMA
U+F8FE  KLINGON FULL STOP
U+F8FF  KLINGON SYMBOL FOR EMPIRE

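Because the ten digits occupy consecutive code points starting at
U+F8F0, a number can be rendered with a simple offset.  The Python
sketch below is only an illustration: the function name is made up, and
it assumes that the most significant digit is written first, which the
text above does not state explicitly.

```python
KLINGON_DIGIT_ZERO = 0xF8F0  # U+F8F0 KLINGON DIGIT ZERO


def to_klingon_digits(number):
    """Render a non-negative integer using U+F8F0..U+F8F9.

    Assumption: digits are emitted most significant first.
    """
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    # str() of a non-negative int yields ASCII digits '0'..'9'.
    return "".join(chr(KLINGON_DIGIT_ZERO + int(d)) for d in str(number))
```

Displaying the result requires a font that carries glyphs for this
private-use range.
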
Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

          http://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and hence
cannot be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are driving the addition of Tengwar and Cirth to
Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected and so the above encoding remains official.