Character sets¶
All TYPO3 CMS websites use UTF-8 as their character set. Using UTF-8 means you have a consistent data storage and can store any glyph from any language without thinking more about charsets.
Database field lengths¶
The TYPO3 CMS Core is compatible with UTF-8.
You might however face the problem that the database field lengths of some extensions must be extended. For example, each Chinese glyph takes three bytes. So if a field is a varchar(10) and an author enters 10 Chinese glyphs only the first 3 glyphs will be stored (since they take up 9 bytes). UTF-8 is tricky in this respect because all ASCII chars take only 1 byte while European special chars typically take up 2 and asian charsets take up 3 - but some special glyphs could take even 5-6 bytes!
For more information on how to set up the database to use UTF-8, please read here: