UTF-16 (Unicode Transformation Format) is, in computing, one of the ways of encoding Unicode characters as a sequence of 16-bit words. The encoding can represent Unicode characters in the ranges U+0000..U+D7FF and U+E000..U+10FFFF, 1,112,064 in total. Each character is written as one word or as two words (a surrogate pair). UTF-16 is described in Annex Q of the international standard ISO/IEC 10646 and is also the subject of IETF RFC 2781, "UTF-16, an encoding of ISO 10646".

History

The first version of Unicode (1991) was a 16-bit encoding with a fixed character width; the total number of distinct characters was 2^16 (65,536). In the second version of Unicode (1996) it was decided to expand the code space considerably, and UTF-16 was created to preserve compatibility with systems that had already implemented 16-bit Unicode. The range 0xD800..0xDFFF, set aside for surrogate pairs, had previously belonged to the private use area.

Since UTF-16 can represent 2^20 + 2^16 - 2048 = 1,112,064 characters, that number was chosen as the new size of the Unicode code space.

Encoding principle

Code points produced by each combination of leading word (rows) and trailing word (columns):

    Leading \ Trailing   DC00     ...   DFFE     DFFF
    D800                 010000   ...   0103FE   0103FF
    D801                 010400   ...   0107FE   0107FF
    ...
    DBFF                 10FC00   ...   10FFFE   10FFFF

In UTF-16, characters are encoded with two-byte words, using the full range of word values (0x0000 to 0xFFFF). A single word can encode Unicode characters in the ranges 0x0000..0xD7FF and 0xE000..0xFFFF. The excluded range 0xD800..0xDFFF is used for so-called surrogate pairs, that is, characters encoded with two 16-bit words.

Unicode characters up to and including 0xFFFF (excluding the surrogate range) are written as is, in one 16-bit word.

Characters in the range 0x10000..0x10FFFF (wider than 16 bits) are encoded as follows. 0x10000 is subtracted from the character code. The result is a value from 0 to 0xFFFFF, which fits in 20 bits. The high 10 bits (a number in the range 0x0000..0x03FF) are added to 0xD800, and the result becomes the leading (first) word, which lies in the range 0xD800..0xDBFF. The low 10 bits (also a number in the range 0x0000..0x03FF) are added to 0xDC00, and the result becomes the trailing (second) word, which lies in the range 0xDC00..0xDFFF.

Byte order

A single UTF-16 character is represented by a sequence of two bytes or of two pairs of bytes. Which byte of each pair comes first, the high one or the low one, depends on the byte order. Systems compatible with x86 processors are called little-endian; those built on m68k and SPARC processors are called big-endian.

The byte order is determined with a byte order mark (BOM). The code U+FEFF is written at the start of the text. If the reader sees U+FFFE instead of U+FEFF, the byte order is the reverse of what the reader assumed (little-endian, for a reader that assumed big-endian). This works because the code U+FFFE does not encode a character in Unicode and is reserved precisely for detecting the byte order. Since the byte values 0xFE and 0xFF are not used in UTF-8, the byte order mark can also serve as a sign that distinguishes UTF-16 from UTF-8 [1].

UTF-16LE and UTF-16BE

The byte order can also be specified externally. To do so, the encoding must be declared as UTF-16LE or UTF-16BE (little-endian or big-endian) rather than simply UTF-16. In that case the byte order mark (U+FEFF) is not needed.

UTF-16 in Windows

Main article: Unicode in Microsoft operating systems

The Win32 API, common in modern versions of Microsoft Windows, has two ways of representing text: as traditional 8-bit code pages and as UTF-16.

When UTF-16 is used, Windows places no restrictions on how applications encode text files; they may use either UTF-16LE or UTF-16BE by writing and interpreting the corresponding byte order mark. The internal format of Windows, however, is always UTF-16LE. This should be kept in mind when working with executable files that use the Unicode versions of WinAPI functions: the strings in them are always encoded as UTF-16LE.

In the NTFS file system, and in FAT with long file name support, file names are also stored as UTF-16LE.

Example procedures

The examples below are written in pseudocode and do not handle the byte order mark; they only show the essence of the encoding. The byte order is from low to high (little-endian, Intel x86). The type Word is a two-byte word (a 16-bit unsigned integer), and the type UInt32 is a 32-bit unsigned integer. Hexadecimal values start with a dollar sign "$".

Encoding

In this example WriteWord() is a hypothetical procedure that writes one word and advances its internal pointer. The function LoWord() returns the low word of a 32-bit integer (the high bits are discarded without any check).

    // Valid values of Code: $0000..$D7FF, $E000..$10FFFF.
    Procedure WriteUTF16Char(Code: UInt32)
        If (Code < $10000) Then
            WriteWord(LoWord(Code))
        Else
            Code = Code - $10000
            Var Lo10: Word = (LoWord(Code) And $3FF)
            Var Hi10: Word = (LoWord(Code Shr 10))
            WriteWord($D800 Or Hi10)
            WriteWord($DC00 Or Lo10)
        End If
    End Procedure

Decoding

In this example ReadWord() reads a word from the stream and advances its internal pointer. It may also correct the byte order if necessary. The function WordToUInt32 widens a two-byte word to a four-byte unsigned integer, filling the high bits with zeros. Error() aborts execution (in effect, an exception).

    // On success, returns values in the ranges $0000..$D7FF and $E000..$10FFFF.
    Function ReadUTF16Char: UInt32
        Var Leading: Word   // Leading (first) word.
        Var Trailing: Word  // Trailing (second) word.

        Leading = ReadWord()
        If (Leading < $D800) Or (Leading > $DFFF) Then
            Return WordToUInt32(Leading)
        Else If (Leading >= $DC00) Then
            // A trailing word cannot start a character.
            Error("Invalid code sequence.")
        Else
            Var Code: UInt32
            Code = WordToUInt32(Leading And $3FF) Shl 10
            Trailing = ReadWord()
            If ((Trailing < $DC00) Or (Trailing > $DFFF)) Then
                Error("Invalid code sequence.")
            Else
                Code = Code Or WordToUInt32(Trailing And $3FF)
                Return (Code + $10000)
            End If
        End If
    End Function

Notes

[1] Using Byte Order Marks (in English). Retrieved 18 February 2016. Archived 22 January 2016.

Links

Unicode Technical Note #12: UTF-16 processing (in English)
Unicode FAQ: What is the difference between UCS-2 and UTF-16? (in English)
RFC 2781: UTF-16, an encoding of ISO 10646 (in English)
Full description of the Unicode standard (in English)
ISO 10646 UTF-16: information on converting large values into two UTF-16 words (in English)
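
As a complement to the pseudocode in this article, the surrogate-pair arithmetic and the byte order mark check can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names encode_utf16_units, decode_utf16_units and detect_byte_order are hypothetical and do not come from the article, and the code works on lists of 16-bit integers rather than on a byte stream. The only library names used are the BOM constants from the standard codecs module.

```python
from __future__ import annotations

import codecs


def encode_utf16_units(code: int) -> list[int]:
    """Return the one or two 16-bit words that encode one code point."""
    # Only U+0000..U+D7FF and U+E000..U+10FFFF can be encoded.
    if not (0 <= code <= 0xD7FF or 0xE000 <= code <= 0x10FFFF):
        raise ValueError(f"cannot encode {code:#x} in UTF-16")
    if code < 0x10000:
        return [code]                    # written as is, in one word
    code -= 0x10000                      # now a 20-bit value
    leading = 0xD800 | (code >> 10)      # high 10 bits
    trailing = 0xDC00 | (code & 0x3FF)   # low 10 bits
    return [leading, trailing]


def decode_utf16_units(units: list[int]) -> list[int]:
    """Decode a list of 16-bit words into a list of code points."""
    result = []
    i = 0
    while i < len(units):
        leading = units[i]
        i += 1
        if leading < 0xD800 or leading > 0xDFFF:
            result.append(leading)       # ordinary single-word character
            continue
        # A trailing word first, or a leading word at the very end, is invalid.
        if leading >= 0xDC00 or i >= len(units):
            raise ValueError("invalid code sequence")
        trailing = units[i]
        i += 1
        if not 0xDC00 <= trailing <= 0xDFFF:
            raise ValueError("invalid code sequence")
        code = ((leading & 0x3FF) << 10) | (trailing & 0x3FF)
        result.append(code + 0x10000)
    return result


def detect_byte_order(data: bytes) -> str:
    """Guess the byte order from a leading U+FEFF mark, if there is one."""
    # Assumption: the data is known to be UTF-16, so no other BOM is possible.
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big"
    return "unknown"
```

For instance, encode_utf16_units(0x10437) subtracts 0x10000 to get 0x437, whose high 10 bits are 1 and low 10 bits are 0x37, giving the pair 0xD801, 0xDC37. Keeping the BOM check separate from the word-level functions mirrors the article's pseudocode, which leaves byte order to the word reader.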