Unicode 特性转义 正则表达式 allows for matching characters based on their Unicode properties. A character is described by several properties which are either binary ("boolean-like") or non-binary. For instance, unicode property escapes can be used to match emojis, punctuations, letters (even letters from specific languages or scripts), etc.

注意: For Unicode property escapes to work, a regular expression must use the u flag which indicates a string must be considered as a series of Unicode code points. See also RegExp.prototype.unicode .

注意: Some Unicode properties encompasses much more characters than some character classes (譬如 \w which matches only latin letters, a to z ) but the latter is better supported among browsers (as of January 2020).

句法

The following section is also duplicated on this cheatsheet . Do not forget to edit it as well, thanks!
// Non-binary values
\p{UnicodePropertyValue}
\p{UnicodePropertyName=UnicodePropertyValue}
// Binary and non-binary values
\p{UnicodeBinaryPropertyName}
// Negation: \P is negated \p
\P{UnicodePropertyValue}
\P{UnicodeBinaryPropertyName}
					

另请参阅 PropertyValueAliases.txt

UnicodeBinaryPropertyName
The name of a binary property . E.g.: ASCII , Alpha , Math , Diacritic , Emoji , Hex_Digit , Math , White_space , etc. See Unicode Data PropList.txt for more info.
UnicodePropertyName
The name of a non-binary 特性:
UnicodePropertyValue
One of the tokens listed in the Values section, below. Many values have aliases or shorthand (e.g. the value Decimal_Number General_Category property may be written Nd , digit ,或 Decimal_Number ). For most values, the UnicodePropertyName part and equals sign may be omitted. If a UnicodePropertyName is specified, the value must correspond to the property type given.

注意: As there are many properties and values available, we will not describe them exhaustively here but rather provide various examples

Rationale

Before ES2018 there was no performance-efficient way to match characters from different sets based on scripts (like Macedonian, Greek, Georgian etc.) or propertyName (like Emoji etc) in JavaScript. Check out tc39 Proposal on Unicode Property Escapes for more info.

范例

General categories

General categories are used to classify Unicode characters and subcategories are available to define a more precise categorization. It is possible to use both short or long forms in Unicode property escapes.

They can be used to match letters, numbers, symbols, punctuations, spaces, etc. For a more exhaustive list of general categories, please refer to the Unicode specification .

// finding all the letters of a text
let story = "It’s the Cheshire Cat: now I shall have somebody to talk to.";
// Most explicit form
story.match(/\p{General_Category=Letter}/gu);
// It is not mandatory to use the property name for General categories
story.match(/\p{Letter}/gu);
// This is equivalent (short alias):
story.match(/\p{L}/gu);
// This is also equivalent (conjunction of all the subcategories using short aliases)
story.match(/\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}/gu);
					

Scripts and script extensions

Some languages use different scripts for their writing system. For instance, English and Spanish are written using the Latin script while Arabic and Russian are written with other scripts (respectively Arabic and Cyrillic). The Script and Script_Extensions Unicode properties allow regular expression to match characters according to the script they are mainly used with ( Script ) or according to the set of scripts they belong to ( Script_Extensions ).

例如, A belongs to the Latin script and ε 希腊语 脚本。

let mixedCharacters = "aεЛ";
// Using the canonical "long" name of the script
mixedCharacters.match(/\p{Script=Latin}/u); // a
// Using a short alias for the script
mixedCharacters.match(/\p{Script=Greek}/u); // ε
// Using the short name Sc for the Script property
mixedCharacters.match(/\p{Sc=Cyrillic}/u); // Л
					

For more details, please refer to the Unicode specification Scripts table in the ECMAScript specification .

If a character is used in a limited set of scripts, the Script property will only match for the "predominant" used script. If we want to match characters based on a "non-predominant" script, we could use the Script_Extensions property ( Scx for short).

// ٢ is the digit 2 in Arabic-Indic notation
// while it is predominantly written within the Arabic script
// it can also be written in the Thaana script
"٢".match(/\p{Script=Thaana}/u);
// null as Thaana is not the predominant script        super()
"٢".match(/\p{Script_Extensions=Thaana}/u);
// ["٢", index: 0, input: "٢", groups: undefined]
					

Unicode property escapes vs. character classes

With JavaScript regular expressions, it is also possible to use character classes and especially \w or \d to match letters or digits. However, such forms only match characters from the Latin script (in other words, a to z and A to Z for \w and 0 to 9 for \d ). As shown in this example , it might be a bit clumsy to work with non Latin texts.

Unicode property escapes categories encompass much more characters and \p{Letter} or \p{Number} will work for any script.

// Trying to use ranges to avoid \w limitations:
const nonEnglishText = "Приключения Алисы в Стране чудес";
const regexpBMPWord = /([\u0000-\u0019\u0021-\uFFFF])+/gu;
// BMP goes through U+0000 to U+FFFF but space is U+0020
console.table(nonEnglishText.match(regexpBMPWord));
// Using Unicode property escapes instead
const regexpUPE = /\p{L}+/gu;
console.table(nonEnglishText.match(regexpUPE));
					

规范

规范
ECMAScript (ECMA-262)
The definition of 'RegExp: Unicode property escapes' in that specification.

浏览器兼容性

For browser compatibility information, check out the main Regular Expressions compatibility table .

另请参阅

元数据

  • 最后修改:
  1. JavaScript
  2. 教程:
  3. 完全初学者
    1. JavaScript 基础
    2. JavaScript 第一步
    3. JavaScript 构建块
    4. 引入 JavaScript 对象
  4. JavaScript 指南
    1. 介绍
    2. 语法和类型
    3. 控制流程和错误处理
    4. 循环和迭代
    5. 函数
    6. 表达式和运算符
    7. 数字和日期
    8. 文本格式化
    9. 正则表达式
    10. 索引集合
    11. 键控集合
    12. Working with objects
    13. 对象模型的细节
    14. Using promises
    15. 迭代器和生成器
    16. Meta programming
    17. JavaScript 模块
  5. 中间体
    1. Client-side JavaScript frameworks
    2. 客户端侧 Web API
    3. 重新介绍 JavaScript
    4. JavaScript 数据结构
    5. 相等比较和相同
    6. 闭包
  6. 高级
    1. 继承和原型链
    2. 严格模式
    3. JavaScript 类型数组
    4. 内存管理
    5. 并发模型和事件循环
  7. 参考:
  8. 内置对象
    1. AggregateError
    2. Array
    3. ArrayBuffer
    4. AsyncFunction
    5. Atomics
    6. BigInt
    7. BigInt64Array
    8. BigUint64Array
    9. 布尔
    10. DataView
    11. Date
    12. Error
    13. EvalError
    14. FinalizationRegistry
    15. Float32Array
    16. Float64Array
    17. Function
    18. Generator
    19. GeneratorFunction
    20. Infinity
    21. Int16Array
    22. Int32Array
    23. Int8Array
    24. InternalError
    25. Intl
    26. JSON
    27. Map
    28. Math
    29. NaN
    30. Number
    31. Object
    32. Promise
    33. Proxy
    34. RangeError
    35. ReferenceError
    36. Reflect
    37. RegExp
    38. Set
    39. SharedArrayBuffer
    40. String
    41. Symbol
    42. SyntaxError
    43. TypeError
    44. TypedArray
    45. URIError
    46. Uint16Array
    47. Uint32Array
    48. Uint8Array
    49. Uint8ClampedArray
    50. WeakMap
    51. WeakRef
    52. WeakSet
    53. WebAssembly
    54. decodeURI()
    55. decodeURIComponent()
    56. encodeURI()
    57. encodeURIComponent()
    58. escape()
    59. eval()
    60. globalThis
    61. isFinite()
    62. isNaN()
    63. null
    64. parseFloat()
    65. parseInt()
    66. undefined
    67. unescape()
    68. uneval()
  9. 表达式 & 运算符
    1. Addition (+)
    2. Addition assignment (+=)
    3. Assignment (=)
    4. Bitwise AND (&)
    5. Bitwise AND assignment (&=)
    6. Bitwise NOT (~)
    7. Bitwise OR (|)
    8. Bitwise OR assignment (|=)
    9. Bitwise XOR (^)
    10. Bitwise XOR assignment (^=)
    11. Comma operator (,)
    12. 条件 (三元) 运算符
    13. Decrement (--)
    14. Destructuring assignment
    15. Division (/)
    16. Division assignment (/=)
    17. Equality (==)
    18. Exponentiation (**)
    19. Exponentiation assignment (**=)
    20. 函数表达式
    21. Greater than (>)
    22. Greater than or equal (>=)
    23. Grouping operator ( )
    24. Increment (++)
    25. Inequality (!=)
    26. Left shift (<<)
    27. Left shift assignment (<<=)
    28. Less than (<)
    29. Less than or equal (<=)
    30. Logical AND (&&)
    31. Logical AND assignment (&&=)
    32. Logical NOT (!)
    33. Logical OR (||)
    34. Logical OR assignment (||=)
    35. Logical nullish assignment (??=)
    36. Multiplication (*)
    37. Multiplication assignment (*=)
    38. Nullish coalescing operator (??)
    39. 对象初始化器
    40. 运算符优先级
    41. Optional chaining (?.)
    42. Pipeline operator (|>)
    43. 特性访问器
    44. Remainder (%)
    45. Remainder assignment (%=)
    46. Right shift (>>)
    47. Right shift assignment (>>=)
    48. Spread syntax (...)
    49. Strict equality (===)
    50. Strict inequality (!==)
    51. Subtraction (-)
    52. Subtraction assignment (-=)
    53. Unary negation (-)
    54. Unary plus (+)
    55. Unsigned right shift (>>>)
    56. Unsigned right shift assignment (>>>=)
    57. 异步函数表达式
    58. await
    59. class expression
    60. delete operator
    61. function* 表达式
    62. in operator
    63. instanceof
    64. new operator
    65. new.target
    66. super
    67. this
    68. typeof
    69. void 运算符
    70. yield
    71. yield*
  10. 语句 & 声明
    1. async function
    2. block
    3. break
    4. class
    5. const
    6. continue
    7. debugger
    8. do...while
    9. empty
    10. export
    11. for
    12. for await...of
    13. for...in
    14. for...of
    15. 函数声明
    16. function*
    17. if...else
    18. import
    19. import.meta
    20. label
    21. let
    22. return
    23. switch
    24. throw
    25. try...catch
    26. var
    27. while
    28. with
  11. 函数
    1. 箭头函数表达式
    2. 默认参数
    3. 方法定义
    4. 其余参数
    5. 自变量对象
    6. getter
    7. setter
    1. Private class fields
    2. Public class fields
    3. 构造函数
    4. extends
    5. static
  12. 错误
    1. Error: Permission denied to access property "x"
    2. InternalError: too much recursion
    3. RangeError: argument is not a valid code point
    4. RangeError: invalid array length
    5. RangeError: invalid date
    6. RangeError: precision is out of range
    7. RangeError: radix must be an integer
    8. RangeError: repeat count must be less than infinity
    9. RangeError: repeat count must be non-negative
    10. ReferenceError: "x" is not defined
    11. ReferenceError: assignment to undeclared variable "x"
    12. ReferenceError: can't access lexical declaration`X' before initialization
    13. ReferenceError: deprecated caller or arguments usage
    14. ReferenceError: invalid assignment left-hand side
    15. ReferenceError: reference to undefined property "x"
    16. SyntaxError: "0"-prefixed octal literals and octal escape seq. are deprecated
    17. SyntaxError: "use strict" not allowed in function with non-simple parameters
    18. SyntaxError: "x" is a reserved identifier
    19. SyntaxError: JSON.parse: bad parsing
    20. SyntaxError: Malformed formal parameter
    21. SyntaxError: Unexpected token
    22. SyntaxError: Using //@ to indicate sourceURL pragmas is deprecated. Use //# instead
    23. SyntaxError: a declaration in the head of a for-of loop can't have an initializer
    24. SyntaxError: applying the 'delete' operator to an unqualified name is deprecated
    25. SyntaxError: for-in loop head declarations may not have initializers
    26. SyntaxError: function statement requires a name
    27. SyntaxError: identifier starts immediately after numeric literal
    28. SyntaxError: illegal character
    29. SyntaxError: invalid regular expression flag "x"
    30. SyntaxError: missing ) after argument list
    31. SyntaxError: missing ) after condition
    32. SyntaxError: missing : after property id
    33. SyntaxError: missing ; before statement
    34. SyntaxError: missing = in const declaration
    35. SyntaxError: missing ] after element list
    36. SyntaxError: missing formal parameter
    37. SyntaxError: missing name after . operator
    38. SyntaxError: missing variable name
    39. SyntaxError: missing } after function body
    40. SyntaxError: missing } after property list
    41. SyntaxError: redeclaration of formal parameter "x"
    42. SyntaxError: return not in function
    43. SyntaxError: test for equality (==) mistyped as assignment (=)?
    44. SyntaxError: unterminated string literal
    45. TypeError: "x" has no properties
    46. TypeError: "x" is (not) "y"
    47. TypeError: "x" is not a constructor
    48. TypeError: "x" is not a function
    49. TypeError: "x" is not a non-null object
    50. TypeError: "x" is read-only
    51. TypeError: 'x' is not iterable
    52. TypeError: More arguments needed
    53. TypeError: Reduce of empty array with no initial value
    54. TypeError: X.prototype.y called on incompatible type
    55. TypeError: can't access dead object
    56. TypeError: can't access property "x" of "y"
    57. TypeError: can't assign to property "x" on "y": not an object
    58. TypeError: can't define property "x": "obj" is not extensible
    59. TypeError: can't delete non-configurable array element
    60. TypeError: can't redefine non-configurable property "x"
    61. TypeError: cannot use 'in' operator to search for 'x' in 'y'
    62. TypeError: cyclic object value
    63. TypeError: invalid 'instanceof' operand 'x'
    64. TypeError: invalid Array.prototype.sort argument
    65. TypeError: invalid arguments
    66. TypeError: invalid assignment to const "x"
    67. TypeError: property "x" is non-configurable and can't be deleted
    68. TypeError: setting getter-only property "x"
    69. TypeError: variable "x" redeclares argument
    70. URIError: malformed URI sequence
    71. Warning: -file- is being assigned a //# sourceMappingURL, but already has one
    72. Warning: 08/09 is not a legal ECMA-262 octal constant
    73. Warning: Date.prototype.toLocaleFormat is deprecated
    74. Warning: JavaScript 1.6's for-each-in loops are deprecated
    75. Warning: String.x is deprecated; use String.prototype.x instead
    76. Warning: expression closures are deprecated
    77. Warning: unreachable code after return statement
  13. 杂项
    1. JavaScript 技术概述
    2. 词法语法
    3. JavaScript 数据结构
    4. Enumerability and ownership of properties
    5. Iteration protocols
    6. 严格模式
    7. 过渡到严格模式
    8. 模板文字
    9. 弃用特征