Converting a sed regular expression to Python code

Updated: 2023-02-17 22:27:34

You asked for the best way; I'm only giving you a simple one, and you could surely optimize it. Still, it is worth benchmarking against your own constraints, since invoking a shell takes some time.
It is worth noting that shell pipes can be a great way to get faster code, because sed can start working without waiting for cat to finish. sort can also start reading its input, but it can only produce output once sed is done. Pipes therefore let you use the CPU while waiting on I/O, and should be considered a low-effort, good-performance solution.
I tried it with a simple example, but you will get the idea:

Contents of the file test:

love
lol
loki
loki
ki
loutre
poutre

A simple bash command, similar to yours:

cat test | sed 's/lo\(.*\)$/\1/' | sort | uniq

Output:

ki
l
poutre
utre
ve
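
Since the cost of invoking a shell is the thing worth measuring, here is a rough sketch of running that pipeline from Python and timing it. It assumes a POSIX shell with sed, sort and uniq on the PATH; the names run_pipeline, time_pipeline, tools_available and SAMPLE are made up for this illustration, and forcing LC_ALL=C is my own choice to keep the sort order byte-wise.

```python
import os
import shutil
import subprocess
import time

# Same pipeline as above, minus cat: the text is fed to sed on stdin.
PIPELINE = r"sed 's/lo\(.*\)$/\1/' | sort | uniq"

# Hypothetical sample input, identical to the contents of the test file.
SAMPLE = "love\nlol\nloki\nloki\nki\nloutre\npoutre\n"


def tools_available():
    """Return True if sed, sort and uniq can all be found on the PATH."""
    return all(shutil.which(tool) is not None for tool in ("sed", "sort", "uniq"))


def run_pipeline(text):
    """Run the shell pipeline on `text` and return its output as a list of lines."""
    env = dict(os.environ, LC_ALL="C")  # assumption: byte-wise sort order
    result = subprocess.run(
        PIPELINE,
        shell=True,          # a shell is needed to interpret the pipes
        input=text,
        env=env,
        capture_output=True,
        text=True,
        check=True,          # raise if the last command in the pipe fails
    )
    return result.stdout.splitlines()


def time_pipeline(text, repeats=10):
    """Average wall-clock seconds per invocation, shell start-up included."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_pipeline(text)
    return (time.perf_counter() - start) / repeats
```

Calling run_pipeline(SAMPLE) should give the same lines as the output listed above, and time_pipeline(SAMPLE) gives a number to compare against a pure Python version on your own data.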

Now let's try to do the same in Python:

#!/usr/bin/env python3

import re

s = """love
lol
loki
loki
ki
loutre
poutre"""

arr = s.split('\n')                                      # sed iterates over each line
arr = [re.sub(r'lo(.*)$', r'\1', line) for line in arr]  # sed: \(...\) becomes (...) in Python
arr = set(arr)                                           # uniq
arr = sorted(arr)                                        # sort

print('\n'.join(arr))                                    # output it

This could also be written as one ugly line of code:

print('\n'.join(sorted(set(re.sub(r'lo(.*)$', r'\1', line) for line in s.split('\n')))))
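
If the input lives in a file rather than a string, the substitution step can consume lines lazily, a bit like sed reading from a pipe. The following is only a sketch under that assumption: the function names sed_substitute, sort_uniq and process_file are invented for this example. As with sort in the shell, the sorting step still has to see every line before it can return anything.

```python
import re

# sed's basic regex syntax writes groups as \( \); Python's re uses bare ( ).
PATTERN = re.compile(r'lo(.*)$')


def sed_substitute(lines):
    """Yield each line with the first match replaced by its captured group."""
    for line in lines:
        # count=1 mirrors sed's s/// without the g flag (first match only).
        yield PATTERN.sub(r'\1', line.rstrip('\n'), count=1)


def sort_uniq(lines):
    """Equivalent of `sort | uniq`: deduplicate, then sort by code point."""
    return sorted(set(lines))


def process_file(path):
    """Stream a file through the substitution, then sort and deduplicate."""
    with open(path) as handle:
        # The file object yields one line at a time, so the substitution
        # never needs the whole file in memory; only the set does.
        return sort_uniq(sed_substitute(handle))
```

For example, print('\n'.join(process_file('test'))) would print the processed lines of the test file. Note that sorted compares by Unicode code point, while the shell's sort follows the current locale, so the two orderings can differ on input with mixed case or accented characters.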