The normalize() method returns the Unicode Normalization Form of the string.
str.normalize([form])
form
Optional
One of "NFC", "NFD", "NFKC", or "NFKD", specifying the Unicode Normalization Form. If omitted or undefined, "NFC" is used.
These values have the following meanings:
"NFC"
Canonical Decomposition, followed by Canonical Composition.
"NFD"
Canonical Decomposition.
"NFKC"
Compatibility Decomposition, followed by Canonical Composition.
"NFKD"
Compatibility Decomposition.
A string containing the Unicode Normalization Form of the given string.
RangeError
A RangeError is thrown if form isn't one of the values specified above.
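As a minimal sketch of the exception behaviour described above, the following wraps the call in try...catch. The helper name safeNormalize is hypothetical and not part of any API; it assumes that falling back to the unnormalized string is acceptable when the form name is invalid.

```javascript
// Hypothetical helper (not a built-in): returns the normalized string,
// or the original string if `form` is not a valid normalization form.
function safeNormalize(str, form) {
  try {
    return str.normalize(form);
  } catch (e) {
    if (e instanceof RangeError) {
      return str; // invalid form name, for example 'NFX'
    }
    throw e; // anything else is unexpected, so rethrow it
  }
}

const decomposedN = '\u006E\u0303'; // "n" followed by the combining tilde

console.log(safeNormalize(decomposedN, 'NFC').length); // 1
console.log(safeNormalize(decomposedN, 'NFX').length); // 2 (left unchanged)
```

Whether to swallow the error like this or let it propagate depends on where the form name comes from; a hard-coded form rarely needs the guard.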
Unicode assigns a unique numerical value, called a code point, to each character. For example, the code point for "A" is given as U+0041. However, sometimes more than one code point, or sequence of code points, can represent the same abstract character. The character "ñ", for example, can be represented by either of:
The single code point U+00F1.
The code point for "n" (U+006E) followed by the code point for the combining tilde (U+0303).
let string1 = '\u00F1';
let string2 = '\u006E\u0303';
console.log(string1); // ñ
console.log(string2); // ñ
However, since the code points are different, string comparison will not treat them as equal. And since the number of code points in each version is different, they even have different lengths.
let string1 = '\u00F1'; // ñ
let string2 = '\u006E\u0303'; // ñ
console.log(string1 === string2); // false
console.log(string1.length); // 1
console.log(string2.length); // 2
The normalize() method helps solve this problem by converting a string into a normalized form common for all sequences of code points that represent the same characters. There are two main normalization forms, one based on canonical equivalence and the other based on compatibility.
In Unicode, two sequences of code points have canonical equivalence if they represent the same abstract characters, and should always have the same visual appearance and behavior (for example, they should always be sorted in the same way).
You can use normalize() with the "NFD" or "NFC" arguments to produce a form of the string that will be the same for all canonically equivalent strings. In the example below we normalize two representations of the character "ñ":
let string1 = '\u00F1'; // ñ
let string2 = '\u006E\u0303'; // ñ
string1 = string1.normalize('NFD');
string2 = string2.normalize('NFD');
console.log(string1 === string2); // true
console.log(string1.length); // 2
console.log(string2.length); // 2
Note that the length of the normalized form under "NFD" is 2. That's because "NFD" gives you the decomposed version of the canonical form, in which single code points are split into multiple combining ones. The decomposed canonical form for "ñ" is "\u006E\u0303".
You can specify "NFC" to get the composed canonical form, in which multiple code points are replaced with single code points where possible. The composed canonical form for "ñ" is "\u00F1":
let string1 = '\u00F1'; // ñ
let string2 = '\u006E\u0303'; // ñ
string1 = string1.normalize('NFC');
string2 = string2.normalize('NFC');
console.log(string1 === string2); // true
console.log(string1.length); // 1
console.log(string2.length); // 1
console.log(string2.codePointAt(0).toString(16)); // f1
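The comparison pattern in the examples above can be wrapped in a small function. This is only a sketch: the name canonicallyEqual is hypothetical, and "NFC" is chosen arbitrarily, since "NFD" would give the same answer for canonical equivalence.

```javascript
// Hypothetical helper: compares two strings by canonical equivalence
// by bringing both into the same canonical form first.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const precomposed = '\u00F1';     // ñ as a single code point
const combining = '\u006E\u0303'; // n followed by the combining tilde

console.log(precomposed === combining);                // false
console.log(canonicallyEqual(precomposed, combining)); // true
console.log(canonicallyEqual(precomposed, 'n'));       // false
```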
In Unicode, two sequences of code points are compatible if they represent the same abstract characters, and should be treated alike in some — but not necessarily all — applications.
All canonically equivalent sequences are also compatible, but not vice versa.
For example:
The code point U+FB00 represents the ligature "ﬀ". It is compatible with two consecutive U+0066 code points ("ff").
The code point U+24B9 represents the symbol "Ⓓ". It is compatible with the U+0044 code point ("D").
In some respects (such as sorting) they should be treated as equivalent—and in some (such as visual appearance) they should not, so they are not canonically equivalent.
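The difference between the two kinds of equivalence can be checked directly. The sketch below uses U+FB00 (LATIN SMALL LIGATURE FF) and U+24B9 (CIRCLED LATIN CAPITAL LETTER D); the variable names are only illustrative.

```javascript
const ligature = '\uFB00'; // LATIN SMALL LIGATURE FF
const circledD = '\u24B9'; // CIRCLED LATIN CAPITAL LETTER D

// Canonical forms leave compatibility characters untouched,
// because they are not canonically equivalent to anything else.
console.log(ligature.normalize('NFC') === ligature); // true
console.log(circledD.normalize('NFD') === circledD); // true

// Compatibility forms replace them with their plain counterparts.
console.log(ligature.normalize('NFKC') === 'ff'); // true
console.log(circledD.normalize('NFKD') === 'D');  // true
```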
You can use normalize() with the "NFKD" or "NFKC" arguments to produce a form of the string that will be the same for all compatible strings:
let string1 = '\uFB00';
let string2 = '\u0066\u0066';
console.log(string1); // ﬀ
console.log(string2); // ff
console.log(string1 === string2); // false
console.log(string1.length); // 1
console.log(string2.length); // 2
string1 = string1.normalize('NFKD');
string2 = string2.normalize('NFKD');
console.log(string1); // ff <- visual appearance changed
console.log(string2); // ff
console.log(string1 === string2); // true
console.log(string1.length); // 2
console.log(string2.length); // 2
When applying compatibility normalization it's important to consider what you intend to do with the strings, since the normalized form may not be appropriate for all applications. In the example above the normalization is appropriate for search, because it enables a user to find the string by searching for "f". But it may not be appropriate for display, because the visual representation is different.
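One way to act on this is to normalize only for matching and keep the original string for display. The following is a sketch under that assumption; includesCompat is a hypothetical helper name, and the sample text is invented for illustration.

```javascript
// Hypothetical helper: substring search that ignores compatibility
// differences by comparing NFKD forms of both strings.
function includesCompat(text, query) {
  return text.normalize('NFKD').includes(query.normalize('NFKD'));
}

const stored = 'o\uFB00ice'; // "office" written with the ff ligature (U+FB00)

console.log(stored.includes('ff'));            // false
console.log(includesCompat(stored, 'ff'));     // true
console.log(includesCompat(stored, 'office')); // true
console.log(stored.length);                    // 5 (the stored string itself is not modified)
```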
As with canonical normalization, you can ask for decomposed or composed compatible forms by passing "NFKD" or "NFKC", respectively.
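To make the difference between the two compatible forms concrete, here is a small sketch using a string that contains both a compatibility character and a composable one. The sample string is an assumption chosen for illustration.

```javascript
const mixed = '\uFB00\u00F1'; // ff ligature followed by precomposed ñ

// NFKD: the ligature becomes "ff" and ñ is split into n + U+0303.
const nfkd = mixed.normalize('NFKD');
// NFKC: the ligature becomes "ff" and ñ stays (or is recomposed as) U+00F1.
const nfkc = mixed.normalize('NFKC');

console.log(nfkd.length);         // 4
console.log(nfkc.length);         // 3
console.log(nfkc === 'ff\u00F1'); // true
```

Note that the composed form does not bring the ligature back: composition is always canonical, so only the "ñ" is recombined.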
normalize()
// Initial string
// U+1E9B: LATIN SMALL LETTER LONG S WITH DOT ABOVE
// U+0323: COMBINING DOT BELOW
let str = '\u1E9B\u0323';
// Canonically-composed form (NFC)
// U+1E9B: LATIN SMALL LETTER LONG S WITH DOT ABOVE
// U+0323: COMBINING DOT BELOW
str.normalize('NFC'); // '\u1E9B\u0323'
str.normalize(); // same as above
// Canonically-decomposed form (NFD)
// U+017F: LATIN SMALL LETTER LONG S
// U+0323: COMBINING DOT BELOW
// U+0307: COMBINING DOT ABOVE
str.normalize('NFD'); // '\u017F\u0323\u0307'
// Compatibly-composed (NFKC)
// U+1E69: LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE
str.normalize('NFKC'); // '\u1E69'
// Compatibly-decomposed (NFKD)
// U+0073: LATIN SMALL LETTER S
// U+0323: COMBINING DOT BELOW
// U+0307: COMBINING DOT ABOVE
str.normalize('NFKD'); // '\u0073\u0323\u0307'
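When comparing forms like this, it can help to print the code points rather than the rendered text. The sketch below does that for all four forms; codePoints is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: lists the code points of a string as "U+XXXX" labels.
function codePoints(s) {
  return Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const input = '\u1E9B\u0323';

for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
  console.log(form, codePoints(input.normalize(form)).join(' '));
}
// NFC U+1E9B U+0323
// NFD U+017F U+0323 U+0307
// NFKC U+1E69
// NFKD U+0073 U+0323 U+0307
```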
| Specification |
|---|
| ECMAScript (ECMA-262): The definition of 'String.prototype.normalize' in that specification. |
| Feature | Chrome | Edge | Firefox | IE | Opera | Safari | WebView Android | Chrome Android | Firefox Android | Opera Android | Safari iOS | Samsung Internet Android | nodejs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| normalize | 34 | 12 | 31 | No | 21 | 10 | No | 34 | 31 | 21 | 10 | 2.0 | 0.12 |