I recently read a lot about Unicode code points and how they evolved over time, and of course I read http://www.joelonsoftware.com/articles/unicode.html as well.
But I couldn't find the real reason why Java uses UTF-16 for a char.
For example, if I had a string containing 1024 characters from the ASCII range, that means 1024 * 2 bytes, which equals 2 KB of memory for the string in any case.
So if Java's base char were UTF-8, that would be about 1 KB of data. If the string has characters that need more than one byte, for example 10 characters of "字" (3 bytes each in UTF-8), the memory consumption naturally increases: (1014 * 1 byte) + (10 * 3 bytes) = 1 KB + 20 bytes.
Isn't the result obvious? 1 KB + 20 bytes vs. 2 KB.
I'm not talking only about ASCII; my curiosity is why it isn't UTF-8, which takes care of multibyte characters as well. UTF-16 looks like a waste of memory for any string that has lots of non-multibyte characters.
Is there a good reason behind this?
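To make the arithmetic in the question concrete, here is a small sketch that measures the encoded size of the two example strings. The class name `EncodingSize` and its helper methods are made up for illustration; only `String.getBytes` and `StandardCharsets` come from the JDK. Note that this measures the size of the encoded bytes, which is not necessarily what a `String` object occupies on the heap.

```java
import java.nio.charset.StandardCharsets;

class EncodingSize {
    // Hypothetical helper: repeat a string n times.
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder(s.length() * n);
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    // Number of bytes needed to encode the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode the string as UTF-16
    // (big-endian variant, which does not prepend a byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 1024);
        // "\u5b57" is the character 字, written as an escape so the
        // source file encoding does not matter.
        String mixed = repeat("a", 1014) + repeat("\u5b57", 10);

        System.out.println(utf8Bytes(ascii));   // 1024
        System.out.println(utf16Bytes(ascii));  // 2048
        System.out.println(utf8Bytes(mixed));   // 1044 = 1 KB + 20 bytes
        System.out.println(utf16Bytes(mixed));  // 2048
    }
}
```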
One reason is the performance characteristics of random access or iterating over the characters of a string:
The UTF-8 encoding uses a variable number (1-4) of bytes to encode a Unicode character. Therefore accessing a character by index, String.charAt(i), would be way more complicated to implement and slower than the array access used by java.lang.String.
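A rough sketch of what index-based access over UTF-8 data could look like, to show why it is slower than a plain array lookup. The class `Utf8Index` and its methods are hypothetical and not part of the JDK, and the code assumes well-formed UTF-8 and an in-range index (no validation). Reaching character i means walking over every character before it, whereas `charAt(i)` reads a single `char` slot directly.

```java
import java.nio.charset.StandardCharsets;

class Utf8Index {
    // Length in bytes of the UTF-8 sequence that starts with this lead byte.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;               // 0xxxxxxx
        if ((b >> 5) == 0b110) return 2;      // 110xxxxx
        if ((b >> 4) == 0b1110) return 3;     // 1110xxxx
        if ((b >> 3) == 0b11110) return 4;    // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte");
    }

    // Code point at character position `index`, found by scanning from the
    // start of the array: O(index) work instead of O(1).
    static int codePointAt(byte[] utf8, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            pos += sequenceLength(utf8[pos]); // skip one whole character
        }
        int len = sequenceLength(utf8[pos]);
        if (len == 1) {
            return utf8[pos];
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        int cp = utf8[pos] & (0xFF >> (len + 1));
        for (int k = 1; k < len; k++) {
            cp = (cp << 6) | (utf8[pos + k] & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) {
        String s = "a\u5b57b"; // "a字b"
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        // Both lines print 5b57, but the first has to scan past "a" first.
        System.out.println(Integer.toHexString(codePointAt(utf8, 1)));
        System.out.println(Integer.toHexString(s.charAt(1)));
    }
}
```

Keep in mind that `charAt` indexes 16-bit `char` values, so the comparison above holds as long as each character fits in a single `char`, as 字 does.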