Updated: 2022-04-18 00:10:02
I wrote this function to handle strings that mix Arabic and English characters. It removes special characters (including diacritics), normalizes some Arabic letters (for example, converting every ة to ه), and converts Arabic-Indic digits to Western digits.
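function normalize_text(text) {
    // Remove special characters: keep only Arabic letters (U+0621 to U+063A, U+0641 to U+064A),
    // Arabic-Indic digits (U+0660 to U+0669), ASCII letters and digits, and spaces.
    // Diacritics (U+064B to U+0652) fall outside these ranges, so they are removed as well.
    text = text.replace(/[^\u0621-\u063A\u0641-\u064A\u0660-\u0669a-zA-Z 0-9]/g, '');
    // Normalize Arabic letters
    text = text.replace(/[آإأ]/g, 'ا');
    text = text.replace(/ة/g, 'ه');
    text = text.replace(/[ئؤ]/g, 'ء');
    text = text.replace(/ى/g, 'ي');
    // Convert Arabic-Indic numerals to their Western (ASCII) counterparts.
    // replace() returns a new string, so the result must be assigned back to text,
    // and a global RegExp is needed so every occurrence is replaced, not just the first.
    var starter = 0x660;
    for (var i = 0; i < 10; i++) {
        text = text.replace(new RegExp(String.fromCharCode(starter + i), 'g'), String.fromCharCode(48 + i));
    }
    return text;
}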
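In the digit-conversion loop, the result of `replace()` has to be assigned back to `text`, and the pattern has to be global. JavaScript strings are immutable, so `replace()` returns a new string instead of changing the original, and when the pattern is a plain string only the first occurrence is replaced. A minimal demonstration (the variable names are just for illustration):

```javascript
// String.prototype.replace never mutates the string it is called on.
const s = '\u0661\u0661'; // two Arabic-Indic ones
s.replace('\u0661', '1'); // result discarded, s is unchanged
console.log(s === '\u0661\u0661'); // true

// A plain-string pattern replaces only the first match...
const firstOnly = s.replace('\u0661', '1');
console.log(firstOnly === '1\u0661'); // true

// ...while a regex with the g flag replaces every match.
const all = s.replace(/\u0661/g, '1');
console.log(all); // "11"
```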
Example markup to try it in a browser:
<input value="الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ" type="text" id="input">
<button onclick="document.getElementById('input').value = normalize_text(document.getElementById('input').value)">Normalize</button>
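For comparison, here is a more compact sketch of the same pipeline that can also run outside the browser (for example in Node). It is an alternative formulation, not the original code: it uses character classes instead of alternation groups and converts all digits in a single regex pass with a replacer callback. The name `normalizeText` is hypothetical; the kept ranges and letter mappings are the same as in `normalize_text`.

```javascript
// Alternative sketch of normalize_text: same ranges and mappings, chained replaces.
function normalizeText(text) {
  return text
    // Keep Arabic letters, Arabic-Indic digits, ASCII letters/digits and spaces;
    // everything else (punctuation, tashkeel U+064B to U+0652, tatweel U+0640) is dropped.
    .replace(/[^\u0621-\u063A\u0641-\u064A\u0660-\u0669a-zA-Z0-9 ]/g, '')
    .replace(/[\u0622\u0623\u0625]/g, '\u0627') // alef with madda/hamza -> bare alef
    .replace(/\u0629/g, '\u0647')               // ta marbuta -> ha
    .replace(/[\u0624\u0626]/g, '\u0621')       // hamza on waw/ya -> standalone hamza
    .replace(/\u0649/g, '\u064A')               // alef maqsura -> ya
    // Arabic-Indic digit -> ASCII digit: subtract 0x660, add 48 (the code for '0').
    .replace(/[\u0660-\u0669]/g, function (d) {
      return String.fromCharCode(d.charCodeAt(0) - 0x660 + 48);
    });
}

console.log(normalizeText('الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ'));
console.log(normalizeText('Hello, World! \u0661\u0662\u0663')); // "Hello World 123"
```

Doing the digit conversion with one callback avoids building ten separate RegExp objects and scanning the string ten times; the trade-off is mostly readability, since both versions produce the same result.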