Why UTF-8 is important
May 28th, 2008 | Posted in anything under the moonlight, developer's tools by dreamluverz
Advantages
Here are several advantages of UTF-8:
- UTF-8 can be read and written quickly just with bit-mask and bit-shift operations.
- Comparing two char strings in C/C++ with strcmp() gives the same result as wcscmp() on the equivalent wide strings (at least where wchar_t holds full code points, as in UTF-32), so lexicographic sorting and tree-search order are preserved.
- Bytes 0xFF and 0xFE never appear in UTF-8 output, so they can be used to identify UTF-16 or UTF-32 text (see BOM).
- UTF-8 is byte order independent. The byte order is the same on all systems, so it doesn’t actually require a BOM.
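
To make the first three points a bit more concrete, here is a minimal Python sketch. The function names (encode_code_point, encode) and the sample words are made up for this illustration and are not taken from the source article. It encodes code points with nothing but masks and shifts, then compares sorting by code point with sorting by raw UTF-8 bytes.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using only masks and shifts."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx followed by three continuation bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode(text: str) -> bytes:
    """Hypothetical helper: encode a whole string one code point at a time."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


# Illustrative sample data, chosen to cover 1- to 4-byte sequences.
words = ["zebra", "éclair", "apple", "日本", "😀", "Ωmega"]

by_code_point = sorted(words)              # Python compares str by code point
by_utf8_bytes = sorted(words, key=encode)  # byte-wise, like strcmp() would

print(by_code_point == by_utf8_bytes)
print(any(b in (0xFE, 0xFF) for w in words for b in encode(w)))
```

The hand-written encoder should agree with Python's built-in str.encode("utf-8"), which is an easy way to check it. The sort comparison only mirrors the strcmp()/wcscmp() claim under the assumption that the wide strings hold whole code points.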
Disadvantages
UTF-8 has several disadvantages:
- You cannot determine the number of bytes of the UTF-8 text from the number of Unicode characters because UTF-8 uses a variable-length encoding.
- It needs 2 or more bytes for non-ASCII characters that extended ASCII character sets encode in just 1 byte.
- ISO Latin-1, although its characters are the first 256 code points of Unicode, is not a subset of UTF-8: Latin-1 bytes 0x80 to 0xFF become two-byte sequences in UTF-8.
- The 8-bit chars of UTF-8 are stripped by many mail gateways because Internet messages were originally designed as 7-bit ASCII. The problem led to the creation of UTF-7.
- UTF-8 uses byte values of the form 100xxxxx (0x80 to 0x9F) in more than 50% of its representation, but existing implementations of ISO 2022, 4873, 6429, and 8859 systems mistake these for C1 control codes. This problem led to the creation of UTF-7,5.
source: http://www.codeguru.com/cpp/misc/misc/multi-lingualsupport/article.php/c10451/
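
The first three disadvantages are easy to see for yourself. The following is a rough sketch using only Python's built-in codecs. The sample characters and the is_valid_utf8 helper are my own picks for illustration, not something from the article above.

```python
# One code point each, but a different number of UTF-8 bytes.
samples = ["a", "é", "Ж", "€", "😀"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(lengths)

# A Cyrillic letter is 1 byte in ISO 8859-5 but 2 bytes in UTF-8.
cyrillic_legacy = "Ж".encode("iso-8859-5")
cyrillic_utf8 = "Ж".encode("utf-8")
print(len(cyrillic_legacy), len(cyrillic_utf8))

# Latin-1 and UTF-8 disagree on every character above 0x7F.
latin1_bytes = "é".encode("latin-1")   # the single byte 0xE9
utf8_bytes = "é".encode("utf-8")       # the two bytes 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Latin-1 text with an accented letter is not valid UTF-8 as it stands.
print(is_valid_utf8("café".encode("latin-1")))
print(is_valid_utf8("café".encode("utf-8")))
```

So a buffer sized by character count can come up short, and Latin-1 data has to be converted rather than simply relabeled as UTF-8.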
Why is it Important?
UTF-8 is an important encoding for the following reasons:
- ASCII compatible
- easily supported
- compact and efficient for most scripts
- easily processed, unlike other multibyte encodings
source: http://developers.sun.com/dev/gadc/technicalpublications/articles/utf8.html
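
Here is a small sketch of what "ASCII compatible" and "easily processed" can mean in practice. The sample strings are invented for this example, and Shift_JIS is just one multibyte encoding I picked for contrast (the article does not name one).

```python
# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_text = "GET /index.html"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)

# Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII
# delimiter such as "/" can be searched for at the byte level safely.
path = "/home/josé/文書/notes.txt"
encoded = path.encode("utf-8")
parts = [p.decode("utf-8") for p in encoded.split(b"/")]
print(parts)

# In Shift_JIS, a trailing byte can collide with an ASCII character:
# the second byte of this kanji is 0x5C, the backslash.
kanji = "表"
sjis_has_backslash = b"\\" in kanji.encode("shift_jis")
utf8_has_backslash = b"\\" in kanji.encode("utf-8")
print(sjis_has_backslash, utf8_has_backslash)
```

This is why byte-oriented tools written for ASCII often keep working on UTF-8 text, while the same tools can split a Shift_JIS character in half.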


