
 This directory contains the source code of ICU 4.2.1 for C/C++

 1. It was obtained with the following:

     $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-2-1 icu42

 2. The following directories were removed because they're not used by Chromium
    at the moment:
    as_is
    packaging
    source/extra
    source/sample
    source/layout
    source/layoutex
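
    The removal in step 2 can be sketched as a small shell helper. The
    function name and its single argument (the root of the exported tree,
    e.g. icu42) are assumptions made for illustration; they are not part of
    ICU or Chromium.

```shell
# Hypothetical helper: remove the directories listed in step 2.
# $1 is the root of the exported ICU tree (e.g. icu42).
remove_unused_icu_dirs() {
  root=$1
  for dir in as_is packaging source/extra source/sample \
             source/layout source/layoutex; do
    # ${root:?} aborts if root is empty, so we never run "rm -rf /as_is".
    rm -rf "${root:?}/$dir"
  done
}

# Example: remove_unused_icu_dirs icu42
```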

 3. Platform header files for Linux and Mac OS X:
    On Linux and Mac OS X, 'runConfigureICU Linux' and 'runConfigureICU MacOSX'
    are run to generate source/common/unicode/platform.h.
    Rename it to 'plinux.h' and 'pmac.h' on Linux and Mac and check them in.

    The Mac 'pmac.h' file needs to have patches/pmac.h.patch applied.

    Change source/common/unicode/umachine.h to refer to plinux.h and pmac.h
    on Linux and Mac, respectively.
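
    The rename in step 3 might look like the sketch below. The helper name
    and its arguments are hypothetical; it only renames the generated
    platform.h and does not apply patches/pmac.h.patch or edit umachine.h.

```shell
# Hypothetical helper: give the generated platform.h its per-platform name.
# $1 = root of the ICU tree, $2 = "linux" or "mac".
save_platform_header() {
  root=$1
  case $2 in
    linux) name=plinux.h ;;
    mac)   name=pmac.h ;;
    *)     echo "unknown platform: $2" >&2; return 1 ;;
  esac
  mv "$root/source/common/unicode/platform.h" \
     "$root/source/common/unicode/$name"
}

# Example on Linux, run from the directory that contains icu42:
#   (cd icu42/source && ./runConfigureICU Linux)
#   save_platform_header icu42 linux
```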

 4. To avoid name collisions (two different versions of StringPiece
    are in Chrome's base and ICU), make the use of 'icu::' namespace
    qualifier required by setting U_USING_ICU_NAMESPACE to 0 in
    source/common/unicode/uversion.h

    In addition, the patches for ICU ticket 6935
    (http://icu-project.org/trac/ticket/6935) are applied.

    The combined patch is patches/namespace.patch.txt

 5. The word breaking for Chinese and Japanese was modified to use a word
    frequency list with the following patch and cjdict.txt.

    In addition, the word breaking rule for ASCII and full-width full stops
    (periods) surrounded by letters has been modified to fit our need for
    segmenting a host name into its components (e.g. treating 'www.google.com'
    not as a single word but as 5 words). This is what ICU 3.8 did before
    UTR 29 changed the rule (WB #6, #7). It also lets us pass
    LayoutTests/css1/text_properties/text_transform.html without rebaselining.

    These patches alone will not work without build-related changes mentioned
    in #10 below.

    - patches/segmentation.patch.txt :
        Adds a dictionary (word-frequency)-based word breaking for CJK
        (Korean is supported in the code, but it does not do anything
         because we don't have a Korean word-list.)

    - source/data/brkitr/cjdict.txt :
        Chinese and Japanese word frequency list.
        See the file for license/copyright notice

    - source/data/brkitr/cc_edict.txt :
        The list of words derived from CC-Edict.

    The following two files were removed (because Japanese breaking rules
    are now the same as those of other languages).

    - source/data/brkitr/word_ja.txt
    - source/data/brkitr/ja.txt

    If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
    to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test.
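
    The copy needed before running the ICU tests can be sketched as follows.
    The helper name and its argument (the root of the ICU tree) are
    assumptions for illustration only.

```shell
# Hypothetical helper: provide the dictionary that TestTrieWithValue expects.
# $1 is the root of the ICU tree.
copy_cjdict_for_tests() {
  root=$1
  cp "$root/source/data/brkitr/cjdict.txt" \
     "$root/source/test/testdata/cjdict-truncated.txt"
}

# Example: copy_cjdict_for_tests icu42
```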

 6. A minor break iterator change

    - patches/brkitr.patch.txt

 7. Converter changes : converters.patch.txt
   - Include what we really need. See source/data/mappings/ucmlocal.txt
   - Alias and mapping changes : source/data/mappings/convrtrs.txt
   - Changes several tables and adds six new tables, three of which
     are 'fake' tables for ISO-2022-CN(-Ext).
   - ucnv2022.c is modified to use 3 'fake' tables added above for
     ISO-2022-CN(-Ext).

 8. Locale changes
   - patches/locale1.patch.txt :
       Filipino locale, exemplar character set changes for CJK + 9 Indian
       locales with minor fixes for Danish, Hungarian, Turkish, Korean
       and Catalan.

   - patches/locale2.patch.txt :
       The minimum locale data Chrome needs for 35 languages Chrome is
       not localized to. Each locale data file has ExemplarCharacters,
       LocaleScript, layout, and the name of the language for a locale
       in its native language.

   - patches/locale3.patch.txt : Locale build configuration files

 9. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

   - patches/unihan.patch.txt:
     The unihan collation tables are never used in Chrome/WebKit, but they
     take about 1MB in the uncompressed ICU data file in ICU 4.2.1.

 10. Build-related changes

   - patches/wpo.patch
   - patches/windows.patch
   - patches/data.build.patch :
       To remove some data files we don't use and cut down the data size.
   - patches/data.build.win.patch :
       Windows-only data build patch. Add a new target DATALIB to makedata.mak
   - add an empty file (stubdatabuilt.txt) to source/stubdata

 11. Pre-built data libraries are checked in.

     - source/data/in/icudt42l.dat : Built on Linux with all the patches
       above applied.
     - icudt42.dll : With icudt42l.dat in place, all the patches applied
       and header files moved (#12 below), generated in bin/ by building
       the icudt_build project of build/icudt_build.sln on Windows.
       It is then moved from bin/ to the top-level directory and checked in.
     - {mac,linux}/icudt42l_dat.s : Built on Mac and Linux with all the
       patches above applied and checked in.
       The Linux file needs the '@' in the preamble changed to '%'. See
       http://codereview.chromium.org/215026.
       mac/icudt42l_dat.s needs one line added after it is generated: a
       .private_extern directive, so that the top of the file looks like:

         .globl _icudt42_dat
         .private_extern _icudt42_dat
         .data
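
     The Mac fix-up above could be automated roughly as follows. This is a
     sketch, not the procedure actually used: the helper name is
     hypothetical, and it assumes the generated file contains a line
     consisting only of '.globl _icudt42_dat' (with optional surrounding
     whitespace), as in the excerpt above.

```shell
# Hypothetical helper: add ".private_extern _icudt42_dat" right after the
# ".globl _icudt42_dat" line of a generated mac/icudt42l_dat.s.
# $1 is the path to the assembly file.
add_private_extern() {
  file=$1
  # Do nothing if the directive is already present.
  if grep -q '\.private_extern[[:space:]]*_icudt42_dat' "$file"; then
    return 0
  fi
  out=$(mktemp) || return 1
  awk '
    { print }
    # Emit the directive once, after the first matching .globl line.
    !done && /^[ \t]*\.globl[ \t]+_icudt42_dat[ \t]*$/ {
      print "\t.private_extern _icudt42_dat"
      done = 1
    }
  ' "$file" > "$out" && cat "$out" > "$file"
  status=$?
  rm -f "$out"
  return $status
}

# Example: add_private_extern mac/icudt42l_dat.s
```

     Writing the result back with cat rather than mv keeps the original
     file's permissions.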

 12. The header files were moved as shown below:

    source/common/unicode ==> public/common/unicode
    source/i18n/unicode   ==> public/i18n/unicode
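
    The move can be sketched as below. The helper name and its argument
    (the root of the ICU tree) are hypothetical, and the sketch assumes the
    public/ destinations do not exist yet.

```shell
# Hypothetical helper: move the public headers as described in step 12.
# $1 is the root of the ICU tree.
move_public_headers() {
  root=$1
  mkdir -p "$root/public/common" "$root/public/i18n"
  mv "$root/source/common/unicode" "$root/public/common/unicode"
  mv "$root/source/i18n/unicode"   "$root/public/i18n/unicode"
}

# Example: move_public_headers icu42
```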

 13. The patch for a memory leak in i18n/timezone.cpp (Windows only):
     see http://bugs.icu-project.org/trac/ticket/7135

     - patches/tzmemory.patch

 14. The patch for a crash in common/putil.c (Linux only):
     see http://bugs.icu-project.org/trac/ticket/7177

     - patches/linuxtz.patch

 15. The patch for Linux locale detection

     - patches/locdet.patch