Although I use Korean every day, I never had much of a chance to figure out how Unicode is actually designed.
Mac OS X supports Unicode, and when people there say “Unicode”, they usually mean UTF-16, an encoding of Unicode built from 2-byte (16-bit) code units.
One day I noticed a strange symptom. An NSString made with “자연” looked different from the same word stored in an FCP project file.
I wondered why and asked on Apple’s Cocoa mailing list.
I got an answer from Ken Thomas, who suggested checking Unicode normalization.
Here is the link :
There are four normalization forms, that is, four ways of representing the same Unicode text:
- Normalization Form D (NFD)
- Normalization Form C (NFC)
- Normalization Form KD (NFKD)
- Normalization Form KC (NFKC)
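Form D means Canonical “D”ecomposition: a precomposed character is split into its component code points. Form C means Canonical “C”omposition: the components are combined back into a precomposed character wherever one exists.
The K in KD and KC stands for “Compatibility”.
The link above explains all four very well with pictures, so take a look at it.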
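To make the four forms concrete, here is a minimal sketch outside of Cocoa. It uses Python's standard `unicodedata` module only because it is easy to run; it is not the NSString API. The ligature “ﬁ” (U+FB01) is an example picked for illustration, since canonical forms leave it alone while compatibility forms replace it.

```python
import unicodedata

# "자연" written as two precomposed Hangul syllables (U+C790, U+C5F0).
word = "\uc790\uc5f0"

# Normalize the same word into each of the four forms.
forms = {name: unicodedata.normalize(name, word)
         for name in ("NFC", "NFD", "NFKC", "NFKD")}

for name, text in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(name, len(text), code_points)

# Illustrative compatibility case: the "fi" ligature U+FB01.
ligature = "\ufb01"
canonical = unicodedata.normalize("NFC", ligature)       # ligature is kept
compatibility = unicodedata.normalize("NFKC", ligature)  # becomes "f" + "i"
print(repr(canonical), repr(compatibility))
```

In the decomposed forms each syllable is broken into its individual jamo, so the same two-syllable word is stored as more code points even though it should look identical on screen.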
With that, you will be able to understand what -precomposed... and -decomposed... mean in the NSString documentation for these methods.
- -precomposedStringWithCanonicalMapping : Form C
- -precomposedStringWithCompatibilityMapping : Form KC
- -decomposedStringWithCanonicalMapping : Form D
- -decomposedStringWithCompatibilityMapping : Form KD
There are two additional issues I would like to mention.
First, we know that Mac OS X uses Unicode natively. But which normalization form does it use?
-(const char *)fileSystemRepresentation of NSString and -(const char *)fileSystemRepresentationWithPath:(NSString *)path of NSFileManager return the file name as a C string in the form that Mac OS X uses for its file system.
It is said to be a mostly decomposed form. So when you pick a file with NSOpenPanel, the path you get back would contain the name in decomposed form.
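The practical consequence is that a name you typed and the name the file system hands back can differ byte for byte. The sketch below approximates the file-system form with standard NFD in Python; that is an assumption, since the exact form Mac OS X produces is only described above as “mostly decomposed”, and the file name is hypothetical.

```python
import unicodedata

# Hypothetical file name, typed with precomposed Hangul syllables.
typed_name = "\uc790\uc5f0.txt"

# Assumption: standard NFD stands in for the file system's decomposed form.
decomposed_name = unicodedata.normalize("NFD", typed_name)

typed_bytes = typed_name.encode("utf-8")
decomposed_bytes = decomposed_name.encode("utf-8")

print(len(typed_bytes), len(decomposed_bytes))
print(typed_bytes == decomposed_bytes)
```

A plain byte comparison such as strcmp() on the two C strings would therefore report a mismatch even though both spell the same name.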
Second, how do we compare two Unicode strings then? I guessed we would have to take care of both cases ourselves, but NSString is smart enough to handle them for us.
The compare: methods of NSString accept either form and can compare a composed string with a decomposed one. If the two represent the same characters or words, the result is NSOrderedSame, which means that they are equal.
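If you had to do this by hand, the idea would be to bring both strings into the same normalization form before comparing them. The following is a minimal sketch of that idea in Python; `canonically_equal` is a hypothetical helper name, and this is not a description of how NSString implements compare: internally.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both into the same form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

composed = "\uc790\uc5f0"                          # "자연" as 2 syllables
decomposed = unicodedata.normalize("NFD", composed)  # "자연" as 5 jamo

print(composed == decomposed)                   # raw code point comparison
print(canonically_equal(composed, decomposed))  # comparison after normalizing
```

Normalizing to NFC on both sides would work just as well; what matters is that both strings go through the same form.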
Impressive, isn’t it?