Ignoring spaces when comparing strings in Python

Updated: 2023-02-20 11:56:50

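isjunk works a little differently than you might think. It doesn't remove any characters. It only excludes the characters it flags in the second string from the index SequenceMatcher uses to look for matches. Junk characters can still be absorbed into a match found next to them, and they are still included in the total character count that ratio() divides by. For example, consider the following: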

>>> from difflib import SequenceMatcher
>>> SequenceMatcher(lambda x: x in "abcd", " abcd", "abcd abcd").ratio()
0.7142857142857143

The first four characters of the second string ("abcd") are all junk, so the only character there that can anchor a match is the space at index 4. That space matches the space at index 0 of the first string, and the junk characters "abcd" that follow it in both strings are then absorbed into the same match. The result is a single five-character match: ten matching characters (five in each string) and 4 non-matching characters (the junk first four characters of the second string). This gives you a ratio of 10/14 (0.7142857142857143).
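To see this concretely, the matching blocks can be inspected with get_matching_blocks(). This is a minimal sketch: the helper name is_abcd is only for illustration, and bjunk is the attribute in which SequenceMatcher records which characters of the second string it treated as junk (Python 3.2 and later).

```python
from difflib import SequenceMatcher


def is_abcd(ch: str) -> bool:
    """isjunk callback equivalent to lambda x: x in "abcd"."""
    return ch in "abcd"


first, second = " abcd", "abcd abcd"
sm = SequenceMatcher(is_abcd, first, second)

# isjunk is applied to the second sequence only; the flagged characters
# are recorded in the .bjunk attribute (Python 3.2+).
print(sorted(sm.bjunk))  # ['a', 'b', 'c', 'd']

# One real block: first[0:5] == second[4:9] == " abcd". The final entry is
# the (len(first), len(second), 0) sentinel that get_matching_blocks() appends.
blocks = sm.get_matching_blocks()
print(blocks)  # [Match(a=0, b=4, size=5), Match(a=5, b=9, size=0)]

matched = sum(block.size for block in blocks)  # 5 characters per string
total = len(first) + len(second)               # 5 + 9 = 14
print(2 * matched / total == sm.ratio())       # True (10/14)
```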

In your case, then, the first string "a b c" matches the second string at indices 0, 1, and 2 (the characters "a b"). Index 3 of the first string (" ") has no counterpart, so it is left out of every match, but it still counts toward the total length. Skipping past it, index 4 ("c") matches index 3 of the second string. Thus 8 of your 9 characters match, giving you a ratio of 8/9 (0.8888888888888888).
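The question's exact inputs aren't reproduced here, so the sketch below assumes they were the first string "a b c", a second string "a bc", and an isjunk callback that flags a single space. The second string and the callback are guesses that agree with the indices and the 9-character total described above. Under those assumptions the matching blocks line up as described:

```python
from difflib import SequenceMatcher


def is_space(ch: str) -> bool:
    """Hypothetical isjunk callback: treat a single space as junk."""
    return ch == " "


# Assumed inputs: "a bc" is inferred from the explanation, not quoted from the question.
sm = SequenceMatcher(is_space, "a b c", "a bc")

for block in sm.get_matching_blocks():
    print(block)
# Match(a=0, b=0, size=3)  -> "a b" at indices 0-2 of both strings
# Match(a=4, b=3, size=1)  -> "c": index 4 of the first string, index 3 of the second
# Match(a=5, b=4, size=0)  -> sentinel

print(sm.ratio())  # 0.8888888888888888, i.e. 2 * 4 / 9
```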

You might want to try this instead:

>>> import difflib
>>> c = a.replace(' ', '')
>>> d = b.replace(' ', '')
>>> difflib.SequenceMatcher(a=c, b=d).ratio()
1.0
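If the goal is just to ignore whitespace, the strip-then-compare approach can be wrapped in a small helper. This is a sketch with a hypothetical function name, ratio_ignoring_whitespace. It uses "".join(s.split()), which also drops tabs and newlines and is therefore broader than replace(' ', ''). Keep replace if only spaces should be ignored.

```python
import difflib


def ratio_ignoring_whitespace(a: str, b: str) -> float:
    """Similarity ratio of a and b after deleting all whitespace (hypothetical helper)."""
    # str.split() with no argument splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and newlines alike.
    a_clean = "".join(a.split())
    b_clean = "".join(b.split())
    # No isjunk callback is needed: the whitespace is already gone, so it
    # cannot inflate the total length that ratio() divides by.
    return difflib.SequenceMatcher(None, a_clean, b_clean).ratio()


print(ratio_ignoring_whitespace("a b c", "a bc"))  # 1.0
print(ratio_ignoring_whitespace("a\tb c", "abd"))  # 0.6666666666666666 ("abc" vs "abd")
```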