unicode - Why does Java char use UTF-16?


I have recently read a lot about Unicode code points and how they evolved over time, and of course I read http://www.joelonsoftware.com/articles/unicode.html as well.

But I couldn't find the real reason why Java uses UTF-16 for a char.

For example, if I had a string containing 1024 characters in the ASCII range, it would take 1024 * 2 bytes, which means the string consumes 2 KB of memory no matter what.

So if Java's base char were UTF-8, that would be just 1 KB of data. Even if the string has characters that need more than one byte, for example 10 characters of "字" (3 bytes each in UTF-8), that only slightly increases the memory consumption: (1014 * 1 byte) + (10 * 3 bytes) = 1 KB + 20 bytes.

Isn't the result obvious: 1 KB + 20 bytes vs. 2 KB? I'm not saying everything is ASCII; my curiosity is why it isn't UTF-8, which takes care of multibyte characters as well. UTF-16 looks like a waste of memory for any string that has lots of non-multibyte characters.
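To check the arithmetic, here is a minimal sketch that measures the encoded size of such a string. The class and method names (`EncodingSize`, `sample`, `utf8Bytes`, `utf16Bytes`) are made up for illustration, and it only measures the encoded byte arrays, not the in-memory layout a particular JVM uses for `String`. Note that "字" (U+5B57) encodes to 3 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

class EncodingSize {
    // Builds the example text: 1014 ASCII letters followed by 10 copies of U+5B57.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1014; i++) {
            sb.append('a');
        }
        for (int i = 0; i < 10; i++) {
            sb.append('\u5b57');
        }
        return sb.toString();
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as UTF-16 code units (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String text = sample();
        System.out.println("UTF-8:  " + utf8Bytes(text));   // 1014 * 1 + 10 * 3 = 1044
        System.out.println("UTF-16: " + utf16Bytes(text));  // 1024 * 2 = 2048
    }
}
```

So for this mostly-ASCII text the UTF-8 form is roughly half the size of the UTF-16 form.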

Is there any good reason behind this?

One reason is the performance characteristics of random access or iterating over the characters of a string:

UTF-8 encoding uses a variable number (1-4) of bytes to encode a Unicode character. Therefore accessing a character by index, String.charAt(i), would be way more complicated to implement and slower than the array access used by java.lang.String.
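The difference can be sketched as follows. `utf8OffsetOf` is a hypothetical helper written for this illustration, not part of the JDK, and it assumes well-formed UTF-8 input: to find where the i-th character starts it has to walk the bytes from the beginning, whereas `charAt` is a plain index into fixed-width 16-bit units.

```java
import java.nio.charset.StandardCharsets;

class IndexAccess {
    // Hypothetical helper: returns the byte offset at which the code point with
    // the given index starts in well-formed UTF-8 data. It scans from the start,
    // so the cost grows with the index.
    static int utf8OffsetOf(byte[] utf8, int codePointIndex) {
        int offset = 0;
        for (int i = 0; i < codePointIndex; i++) {
            int lead = utf8[offset] & 0xFF;
            if (lead < 0x80) {
                offset += 1;   // 0xxxxxxx: 1-byte sequence
            } else if (lead < 0xE0) {
                offset += 2;   // 110xxxxx: 2-byte sequence
            } else if (lead < 0xF0) {
                offset += 3;   // 1110xxxx: 3-byte sequence
            } else {
                offset += 4;   // 11110xxx: 4-byte sequence
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        String s = "ab\u5b57cd";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        // UTF-16 code units: direct index, no scanning.
        System.out.println(s.charAt(3));            // c

        // UTF-8 bytes: 'a' (1) + 'b' (1) + U+5B57 (3) must be skipped first.
        System.out.println(utf8OffsetOf(utf8, 3));  // 5
    }
}
```

One caveat: `charAt` indexes UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two `char` values (a surrogate pair).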

