Converting a Unicode code point to its surrogate representation

Although I searched the official Unicode web site, I could not find how to convert a Unicode code point into the 4-byte representation that emoticons use. What’s weird about emoticons:

  • The Unicode emoticon area covers only a small subset of the emoticons people think of
  • Some emoticons are written as a single Unicode code point, but some use a 4-byte Unicode sequence (is there a name for this?)
  • Is there a rule to convert a code point to the 4-byte sequence and to UTF-8?
    • There is a rule between a code point and UTF-8
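The rule between a code point and UTF-8 mentioned in the last bullet can be sketched roughly as follows. This is only a minimal illustration in Python, and the function name `to_utf8` is my own, not something from the Unicode site. In practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def to_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper for illustration)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# U+1F600 (a smiley in the emoticon area) becomes the 4 bytes F0 9F 98 80.
print(to_utf8(0x1F600).hex())
```

The number of bytes depends only on the size of the code point, so anything at U+10000 or above always takes 4 bytes in UTF-8.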

@gluebyte sent me a URL for a web site, which I had also found at work, but I didn’t know there was a conversion rule between a code point and the 4-byte Unicode escape sequence. (This is called a surrogate pair: two 16-bit UTF-16 code units, each written as its own \uXXXX escape. So it’s different from a normal Unicode code point escape sequence.)
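For reference, the conversion rule itself is short. Here is a minimal sketch in Python, assuming the code point lies above U+FFFF (only those need a surrogate pair). The function names `to_surrogate_pair` and `to_escape` are made up for this post.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return high, low


def to_escape(code_point: int) -> str:
    """Format the surrogate pair as two \\uXXXX escapes (hypothetical helper)."""
    high, low = to_surrogate_pair(code_point)
    return "\\u{:04X}\\u{:04X}".format(high, low)


# U+1F600 -> \uD83D\uDE00
print(to_escape(0x1F600))
```

Going the other way is the same arithmetic in reverse: `0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)`.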

Actually, with Unicode, the terminology is more difficult than the concepts it describes.
