且构网

Regex to get only the words from a string

Updated: 2022-05-25 23:00:56

That's actually quite difficult, and probably not suited to a regex at all. The problem is that you want to accept the "Peter" from "Peter's" but discard "5th". What you really want to do is probably use a dictionary (a proper word list, rather than the .NET Dictionary class) and check for actual words. Otherwise, what are you going to do with "Peters'" or "it's"?
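A minimal Python sketch of the word-list idea, in case it helps. The `KNOWN_WORDS` set and the `real_words` helper are made up for illustration (a real word list would come from a dictionary file), and stripping a trailing `'s` or `'` is just one possible way to handle possessives.

```python
import re

# Hypothetical word list; a real one would be loaded from a dictionary file.
KNOWN_WORDS = {"peter", "it", "is", "the", "dog"}

def real_words(text, known_words=KNOWN_WORDS):
    """Return tokens whose base form appears in known_words."""
    found = []
    for token in text.split():
        # Strip surrounding punctuation (apostrophes are left alone here).
        core = token.strip(".,;:!?\"()")
        # Drop a trailing possessive or contraction suffix: 's or a bare '.
        base = re.sub(r"('s|')$", "", core)
        if base.lower() in known_words:
            found.append(base)
    return found
```

With this word list, "Peter's" yields "Peter" and "5th" is discarded, but "Peters'" is also discarded unless "peters" is in the list, so the ambiguity is moved into the word list rather than removed.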


I would start with Split. This is not really a job for Regex, and a regex solution would also be hard to maintain.

—SA
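A rough sketch of the Split approach, written in Python rather than .NET. The `split_words` name and the punctuation set are assumptions for illustration, and the filter rule (starts with a letter, contains only letters and apostrophes) is one guess at what "word" means here.

```python
def split_words(text):
    """Split on whitespace and keep tokens made only of letters and apostrophes."""
    words = []
    for token in text.split():
        # Remove punctuation stuck to either end of the token.
        core = token.strip(".,;:!?\"()")
        # Keep the token only if it starts with a letter and every character
        # is a letter or an apostrophe. Note: isalpha() also accepts
        # non-ASCII letters, unlike the character class [a-zA-Z].
        if core and core[0].isalpha() and all(ch.isalpha() or ch == "'" for ch in core):
            words.append(core)
    return words
```

This keeps "Peter's" whole rather than reducing it to "Peter"; trimming the possessive would be an extra step.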


This one will work for the input you specified:
(^|\s)(?<word>[a-zA-Z][a-zA-Z']*)

I agree with OriginalGriff that making a regex that works 100% of the time is, if not impossible, then at least nearly so. If you do not require 100% precision, then this regex should work out for you.
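For reference, here is how the pattern above might be tried out in Python. Python's `re` module spells named groups `(?P<word>...)` instead of the .NET-style `(?<word>...)`, so the pattern is adjusted accordingly; `regex_words` is a hypothetical helper name.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
WORD_RE = re.compile(r"(^|\s)(?P<word>[a-zA-Z][a-zA-Z']*)")

def regex_words(text):
    """Return the 'word' group of every match, left to right."""
    return [m.group("word") for m in WORD_RE.finditer(text)]
```

Because the pattern has no trailing boundary, a token such as "abc5" still yields "abc", which is one example of the less-than-100% precision mentioned above.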