且构网

How do I match a string with diacritics in Perl?

Updated: 2022-02-24 23:28:53

The right solution uses the UCA (Unicode Collation Algorithm), thanks to tchrist:

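# Check whether $str contains $look, ignoring case and diacritics
use 5.014;
use utf8;
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

my $str  = "Îñţérñåţîöñåļîžåţîöñ" x 2;
my $look = "Nation";

# level => 1 compares base letters only (accents and case are ignored);
# the matching methods require normalization => undef
my $Collator = Unicode::Collate->new(
    normalization => undef, level => 1
);

# In list context, match() returns the first matching part, or an empty list
my @match = $Collator->match($str, $look);
say "match ok!" if @match;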
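
If the positions of the matches are needed as well, Unicode::Collate's index() method can be called in list context, where it returns the position and length of the first matching part at or after a given start position. The following is only a sketch under the same collator settings; @spans and $from are names made up here for illustration and are not part of the original answer.

```perl
use 5.014;
use utf8;
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

my $str  = "Îñţérñåţîöñåļîžåţîöñ" x 2;
my $look = "Nation";
my $Collator = Unicode::Collate->new(
    normalization => undef, level => 1
);

# Collect [offset, length] for every non-overlapping match.
my @spans;
my $from = 0;    # hypothetical cursor: where the next search starts
while (my ($pos, $len) = $Collator->index($str, $look, $from)) {
    push @spans, [$pos, $len];
    $from = $pos + ($len || 1);    # always advance, even on an empty match
}

# Print each match with its character offset and the matched text.
for my $span (@spans) {
    my ($pos, $len) = @$span;
    say "offset $pos, length $len: ", substr($str, $pos, $len);
}
```

The offsets are counted in characters, not bytes, because the source is read under use utf8. When only the matched text is wanted, gmatch() returns all matching parts in one call.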

P.S. "Code that assumes you can remove diacritics to get at base ASCII letters is evil, still, broken, brain-damaged, wrong, and justification for capital punishment." © tchrist, in "Why does modern Perl avoid UTF-8 by default?"