
python multiprocessing: some functions do not return after completing (queue contents too large)

Updated: 2022-05-28 23:11:15

Alright, it seems that the pipe used to fill the Queue gets plugged when the output of a function is too big (my crude understanding? This is an unresolved/closed bug? http://bugs.python.org/issue8237). I have modified the code in my question so that there is some buffering (queues are regularly emptied while processes are running), which solves all my problems. So now this takes a collection of tasks (functions and their arguments), launches them, and collects the outputs. I wish it were simpler/cleaner-looking.
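The underlying pitfall can be shown with a minimal, self-contained sketch (my own code, not the module above): a child process putting a result larger than the OS pipe buffer blocks in the Queue's feeder thread until the parent reads from the pipe, so the parent must drain the queue before calling `join()`.

```python
import multiprocessing as mp

def worker(q):
    # Put a result far larger than a typical OS pipe buffer (~64 KiB).
    q.put('x' * (1 << 20))  # 1 MiB string

def run_safely():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    # Draining the queue BEFORE join() is essential: the child's feeder
    # thread blocks writing the large item to the pipe until the parent
    # reads it, so calling join() before get() deadlocks.
    result = q.get()
    p.join()
    return len(result)

if __name__ == '__main__':
    print(run_safely())  # 1048576
```

Swapping the `q.get()` and `p.join()` lines reproduces the hang described in the linked bug report.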

Edit (2014 Sep; update 2017 Nov: rewritten for readability): I'm updating the code with the enhancements I've made since. The new code (same function, but better features) is here: https://gitlab.com/cpbl/cpblUtilities/blob/master/parallel.py

The calling documentation is also below.

def runFunctionsInParallel(*args, **kwargs):
    """ This is the main/only interface to class cRunFunctionsInParallel. See its documentation for arguments.
    """
    return cRunFunctionsInParallel(*args, **kwargs).launch_jobs()

###########################################################################################
###
class cRunFunctionsInParallel():
    ###
    #######################################################################################
    """Run any list of functions, each with any arguments and keyword-arguments, in parallel.
The functions/jobs should return (if anything) picklable results. To avoid processes getting stuck because their output queues overflow, the queues are regularly collected and emptied.
You can now pass os.system etc. to this as the function, in order to parallelize at the OS level, with no need for a wrapper: I made use of hasattr(builtinfunction, 'func_name') to check for a name.
Parameters
----------
listOf_FuncAndArgLists : list
    List of up-to-three-element lists, like [function, args, kwargs],
    specifying the set of functions to be launched in parallel.  If an
    element is just a function, rather than a list, then it is assumed
    to have no arguments or keyword arguments. Thus, possible formats
    for elements of the outer list are:
      function
      [function, list]
      [function, list, dict]
kwargs : dict
    One can also supply the kwargs once, for all jobs (or for those
    without their own non-empty kwargs specified in the list).
names : list, optional
    A list of names to identify the processes.
    If omitted, the function name is used, so if all the functions are
    the same (i.e. merely called with different arguments), they would be
    named indistinguishably.
offsetsSeconds : int or list of ints
    Delay some functions' start times.
expectNonzeroExit : bool
    Normal behaviour is to not proceed if any function exits with a
    failed exit code. This can be used to override that behaviour.
parallel : bool
    Whenever the list of functions is longer than one, functions will
    be run in parallel unless this parameter is passed as False.
maxAtOnce : int
    If nonzero, this limits how many jobs will be allowed to run at
    once.  By default, this is set according to how many processors
    the hardware has available.
showFinished : int
    Specifies the maximum number of successfully finished jobs to show
    in the text interface (before the last report, which should always
    show them all).
Returns
-------
Returns a tuple of (return codes, return values), each a list in order of the jobs provided.
Issues
------
Only tested on POSIX OSes.
Examples
--------
See the testParallel() method in this module
    """
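The full implementation lives at the GitLab link above. As an illustrative, self-contained stand-in (all names below are mine, not the library's), the three documented element formats for `listOf_FuncAndArgLists` can be exercised like this, again draining the result queue before joining:

```python
import multiprocessing as mp

def _normalize(job):
    """Expand the three accepted element formats into (function, args, kwargs)."""
    if callable(job):
        return job, [], {}
    func = job[0]
    args = job[1] if len(job) > 1 else []
    kwargs = job[2] if len(job) > 2 else {}
    return func, args, kwargs

def _run_job(index, job, queue):
    func, args, kwargs = _normalize(job)
    queue.put((index, func(*args, **kwargs)))

def run_functions_in_parallel(listOf_FuncAndArgLists):
    """Toy stand-in for runFunctionsInParallel: launch every job in its own
    process, and drain the shared result queue BEFORE joining, so that large
    return values cannot clog the underlying pipe."""
    queue = mp.Queue()
    procs = [mp.Process(target=_run_job, args=(i, job, queue))
             for i, job in enumerate(listOf_FuncAndArgLists)]
    for p in procs:
        p.start()
    results = [None] * len(procs)
    for _ in procs:  # collect one result per job, in arrival order
        i, value = queue.get()
        results[i] = value
    for p in procs:
        p.join()
    return results

def hello():
    return 'hello'

def square(x):
    return x * x

if __name__ == '__main__':
    # The three documented formats: bare function, [function, args],
    # and [function, args, kwargs].
    jobs = [hello, [square, [3]], [square, [], {'x': 4}]]
    print(run_functions_in_parallel(jobs))  # ['hello', 9, 16]
```

This sketch omits the real module's features (names, staggered starts, `maxAtOnce` throttling, exit-code checking); it only shows the job format and the drain-before-join discipline the answer describes.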