且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

多线程处理数据库记录的***做法

更新时间:2023-08-24 13:38:04

使用如下:




  • 创建列lockedby和locktime,分别是线程/进程/机器ID和时间戳

  • 每个任务都会执行一个查询,例如:



    UPDATE taskstable SET lockedby =(my id),locktime = now()WHERE lockedby IS NULL ORDER BY ID LIMIT 10



$ b b

其中10是批量大小。




  • 然后,每个任务执行一个SELECT,以确定哪些行被锁定处理,并处理这些

  • 在每一行完成后,您将lockedby和locktime设置为NULL

  • 所有这一切都是在一个循环中完成的。 b
  • cron作业或计划任务定期重置锁定时间过长的任何行的被锁定,因为它们可能由挂起或崩溃的任务完成。


  • LIMIT 10是MySQL特有的,但我认为其他数据库有等同的。 ORDER BY是import,以避免查询是不确定的。


    I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.

    I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.

    Where should I start?

    The pattern I'd use is as follows:

    • Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
    • Each task would do a query such as:

      UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10

    Where 10 is the "batch size".

    • Then each task does a SELECT to find out which rows it's "locked" for processing, and processes those
    • After each row is complete, you set lockedby and locktime back to NULL
    • All this is done in a loop for as many batches as exist.
    • A cron job, or scheduled task, periodically resets the "lockedby" of any row whose locktime is too long ago, as they were presumably done by a task which has hung or crashed. Someone else will then pick them up

    The LIMIT 10 is MySQL specific but I think other databases have equivalents. The ORDER BY is import to avoid the query being nondeterministic.