Updated: 2023-11-10 16:08:16
So you mean there is no way to speed this up? My scenario is to read a bunch of files, then read each line of every file and store it in the database.
The first rule of optimization is to ask yourself if you should bother. If your program is run only once or a couple of times, optimizing it is a waste of time.
The second rule is that before you do anything else, measure where the problem lies;
Write a simple program that sequentially reads files, splits them into lines and stuffs those in a database. Run that program under a profiler to see where the program is spending most of its time.
Only then do you know which part of the program needs speeding up.
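As a rough illustration of that measurement step, here is a minimal sketch of a sequential loader run under cProfile. The names load_files and profile_load, the single-column table "lines", and the use of sqlite3 as the database are all assumptions made for the example; the question does not say which database is involved.

```python
import cProfile
import pstats
import sqlite3
from pathlib import Path


def load_files(paths, db_path=":memory:"):
    """Read each file sequentially and insert every line into a table."""
    # sqlite3 and the table layout are stand-ins for the real database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS lines (content TEXT)")
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            rows = [(line.rstrip("\n"),) for line in f]
        conn.executemany("INSERT INTO lines (content) VALUES (?)", rows)
        total += len(rows)
    conn.commit()
    conn.close()
    return total


def profile_load(paths):
    """Run load_files under cProfile and return (row count, stats)."""
    profiler = cProfile.Profile()
    total = profiler.runcall(load_files, paths)
    return total, pstats.Stats(profiler)


if __name__ == "__main__":
    files = sorted(Path(".").glob("*.txt"))  # hypothetical input files
    total, stats = profile_load(files)
    print(f"inserted {total} rows")
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
```

If most of the cumulative time lands in executemany or commit, the database is the bottleneck and faster file reading will not help much; if it lands in the file loop, the reading side is worth a closer look.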
Here are some pointers nevertheless.
File reading can be done with mmap.
You could use multiprocessing.Pool to spread out the reading of multiple files over different cores. But then the data from those files will end up in different processes and would have to be sent back to the parent process using IPC. This has significant overhead for large amounts of data.
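The two pointers can be combined in a short sketch. The helper names read_lines_mmap and read_all are made up for illustration, and returning whole line lists from the workers is chosen only to make the IPC cost visible, not because it is the recommended design.

```python
import mmap
import os
import sys
from multiprocessing import Pool


def read_lines_mmap(path):
    """Return the lines of one file as bytes, read through a memory map."""
    if os.path.getsize(path) == 0:
        return []  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b"" at the end of the map, which stops iter().
            return [line.rstrip(b"\r\n") for line in iter(mm.readline, b"")]


def read_all(paths, processes=None):
    """Read several files in worker processes.

    Each worker's list of lines is pickled and sent back to the parent,
    which is the IPC overhead mentioned above.
    """
    with Pool(processes=processes) as pool:
        return pool.map(read_lines_mmap, paths)


if __name__ == "__main__":
    paths = sys.argv[1:]  # file names given on the command line
    for path, lines in zip(paths, read_all(paths)):
        print(path, len(lines))
```

Timing read_all against a plain loop over read_lines_mmap on the same files should show whether the parallel reading outweighs the cost of shipping the data back.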