Comparing tokenized lists in Python

Updated: 2023-11-28 23:20:52

You can use a nested list comprehension like this:

>>> m = ['abc','bcd','cde','def']
>>> r = [['abc','def'],['bcd','cde'],['abc','def','bcd']]
>>> [[1 if mx in rx else 0 for mx in m] for rx in r]
[[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
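
The same comprehension can be wrapped in a small function for reuse. This is only a minimal sketch: the function name membership_rows and its parameter names are assumptions introduced here for illustration and are not part of the original answer.

```python
def membership_rows(vocab, rows):
    """For each row, return a 0/1 list marking which vocab items appear in it."""
    # Outer loop: one output list per row; inner loop: one flag per vocab item.
    return [[1 if item in row else 0 for item in vocab] for row in rows]


m = ['abc', 'bcd', 'cde', 'def']
r = [['abc', 'def'], ['bcd', 'cde'], ['abc', 'def', 'bcd']]
result = membership_rows(m, r)
print(result)  # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
```

Each inner list has one entry per element of m, in the same order as m, so the columns line up across all rows.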

Also, you could shorten 1 if ... else 0 to int(...), and you can convert the sublists of r to sets so that the individual mx in rx lookups are faster.

>>> [[int(mx in rx) for mx in m] for rx in r]
[[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
>>> [[int(mx in rx) for mx in m] for rx in map(set, r)]
[[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
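
As a brief illustration of why both shortcuts work: mx in rx produces a bool, which int(...) maps to 1 or 0, and a set answers membership tests by hashing rather than by scanning. The sketch below builds the sets once up front; the names r_sets and encoded are assumptions chosen here, not names from the original answer.

```python
m = ['abc', 'bcd', 'cde', 'def']
r = [['abc', 'def'], ['bcd', 'cde'], ['abc', 'def', 'bcd']]

# `mx in rx` evaluates to a bool; int(True) is 1 and int(False) is 0.
first_hit = int('abc' in r[0])
first_miss = int('bcd' in r[0])

# Build each set once so every later `in` test is a hash lookup
# instead of a linear scan of the sublist.
r_sets = [set(rx) for rx in r]
encoded = [[int(mx in rx) for mx in m] for rx in r_sets]
print(encoded)  # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
```

Note that converting to sets discards duplicates and ordering within each sublist, which does not matter here because only membership is tested.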

While int(...) is a bit shorter than 1 if ... else 0, it also seems to be slower, so you probably should not use it. Converting the sublists of r to sets before the repeated lookups should speed things up for longer lists, but for your very short example lists it is in fact slower than the naive approach.

>>> %timeit [[1 if mx in rx else 0 for mx in m] for rx in r]
100000 loops, best of 3: 4.74 µs per loop
>>> %timeit [[int(mx in rx) for mx in m] for rx in r]
100000 loops, best of 3: 8.07 µs per loop
>>> %timeit [[1 if mx in rx else 0 for mx in m] for rx in map(set, r)]
100000 loops, best of 3: 5.82 µs per loop

For longer lists, using set becomes faster, as would be expected:

>>> import random
>>> m = [random.randint(1, 100) for _ in range(50)]
>>> r = [[random.randint(1, 100) for _ in range(10)] for _ in range(20)]
>>> %timeit [[1 if mx in rx else 0 for mx in m] for rx in r]
1000 loops, best of 3: 412 µs per loop
>>> %timeit [[1 if mx in rx else 0 for mx in m] for rx in map(set, r)]
10000 loops, best of 3: 208 µs per loop
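
The %timeit lines above are IPython magic commands. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The function names with_lists and with_sets, the fixed seed, and the repetition count below are assumptions made for this example; the absolute numbers will differ from machine to machine, so no expected timings are shown.

```python
import random
import timeit

random.seed(0)  # fixed seed (an assumption) so repeated runs use the same data
m = [random.randint(1, 100) for _ in range(50)]
r = [[random.randint(1, 100) for _ in range(10)] for _ in range(20)]


def with_lists():
    # Membership test scans each sublist linearly.
    return [[1 if mx in rx else 0 for mx in m] for rx in r]


def with_sets():
    # Each sublist is converted to a set once, then probed 50 times.
    return [[1 if mx in rx else 0 for mx in m] for rx in map(set, r)]


# Both variants must agree before their timings are worth comparing.
same_result = with_lists() == with_sets()

n = 1000  # number of calls per measurement (arbitrary choice)
t_lists = timeit.timeit(with_lists, number=n)
t_sets = timeit.timeit(with_sets, number=n)
print(f"lists: {t_lists / n * 1e6:.1f} us per call")
print(f"sets:  {t_sets / n * 1e6:.1f} us per call")
```

The set conversion is included inside with_sets so that its cost is counted, matching the map(set, r) form that was timed above.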