Remove accents/diacritics from a string in JavaScript

Updated: 2023-02-26 09:58:31

With ES2015/ES6 String.prototype.normalize():

const str = "Crème Brulée";
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
> "Creme Brulee"

Two things are happening here:

Normalizing to the NFD Unicode normal form decomposes combined graphemes into sequences of simpler ones. The è of Crème ends up expressed as e + ̀ (U+0300, the combining grave accent).
A regex character class matching the U+0300 → U+036F range then removes the diacritics globally. The Unicode standard conveniently groups them as the Combining Diacritical Marks block.
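The two steps can also be looked at separately. The following is a minimal sketch; the helper name removeDiacritics is hypothetical and not part of the original answer, and the accented letters are written as escapes so the input is guaranteed to be precomposed.

```javascript
// Hypothetical helper; the name is an assumption for illustration only.
function removeDiacritics(input) {
  // Step 1: NFD splits precomposed characters into base letter + combining marks.
  const decomposed = input.normalize("NFD");
  // Step 2: drop every code point in the Combining Diacritical Marks block.
  return decomposed.replace(/[\u0300-\u036f]/g, "");
}

const word = "Cr\u00e8me"; // "Crème" with a precomposed è (U+00E8)

console.log(word.length);                  // 5
console.log(word.normalize("NFD").length); // 6: "e" and U+0300 are now separate
console.log(removeDiacritics("Cr\u00e8me Brul\u00e9e")); // "Creme Brulee"
```

Letters that have no canonical decomposition, such as ø or ł, pass through this function unchanged, so they would need an explicit mapping if they should become o or l.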

As of 2021, one can also use Unicode property escapes:

str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
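The two patterns are not exactly equivalent. As a rough comparison (the names byRange and byProperty are made up for this sketch), both give the same result on typical accented text, but the Diacritic property also covers some standalone characters outside the U+0300 → U+036F block, such as ^ and `, so the property escape removes those as well.

```javascript
// Both helpers are illustrative; the names are assumptions, not an established API.
const byRange = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
const byProperty = (s) => s.normalize("NFD").replace(/\p{Diacritic}/gu, "");

// "Crème Brulée", "Việt Nam" (ệ carries two marks), and a plain ASCII caret.
const samples = ["Cr\u00e8me Brul\u00e9e", "Vi\u1ec7t Nam", "a^b"];

for (const s of samples) {
  console.log(byRange(s), "|", byProperty(s));
}
// Creme Brulee | Creme Brulee
// Viet Nam | Viet Nam
// a^b | ab
```

If stripping a literal ^ or ` would be a problem for the input at hand, the explicit range is the more conservative choice.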

See the comments for performance tests.

Alternatively, if you only need sorting

Intl.Collator has sufficient support (about 95% at the time of writing). A polyfill is also available, but I haven't tested it.

const c = new Intl.Collator();
["creme brulee", "crème brulée", "crame brulai", "crome brouillé",
"creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare)
["crame brulai", "creme brulay", "creme bruléa", "creme brulee",
"crème brulée", "creme brulfé", "crome brouillé"]


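For comparison, a plain code-unit sort puts the accented string last (the comparator must return a number, not a boolean):

["creme brulee", "crème brulée", "crame brulai", "crome brouillé"].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0))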
["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]
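Intl.Collator can also compare strings while ignoring accents, which avoids rewriting the strings at all. A small sketch follows, assuming the "en" locale and the standard sensitivity: "base" option (which ignores case as well as accents); the menu array and the query are made-up sample data.

```javascript
// "en" is chosen here only to make the behaviour predictable; it is an assumption.
const collator = new Intl.Collator("en", { sensitivity: "base" });

// compare() returns 0 when the strings differ only by accents or case.
console.log(collator.compare("creme brulee", "cr\u00e8me brul\u00e9e")); // 0

// Made-up sample data: accent-insensitive lookup without modifying the strings.
const menu = ["Cr\u00e8me Brul\u00e9e", "creme brulee", "Cr\u00e8me caramel"];
const matches = menu.filter((item) => collator.compare(item, "creme brulee") === 0);

console.log(matches.length); // 2
```

This keeps the original text intact for display and only relaxes the comparison.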
