多线程处理数据库记录的***做法

更新时间：2023-08-24 13:38:04

使用如下：

每个任务都会执行一个查询，例如：

UPDATE taskstable SET lockedby =（my id），locktime = now（）WHERE lockedby IS NULL ORDER BY ID LIMIT 10

$ b b

其中10是批量大小。

LIMIT 10是MySQL特有的，但我认为其他数据库有等同的。 ORDER BY是import，以避免查询是不确定的。

I have a single process that queries a table for records where PROCESS_IND = 'N', does some processing, and then updates the PROCESS_IND to 'Y'.

I'd like to allow for multiple instances of this process to run, but don't know what the best practices are for avoiding concurrency problems.

Where should I start?

The pattern I'd use is as follows:

Create columns "lockedby" and "locktime" which are a thread/process/machine ID and timestamp respectively (you'll need the machine ID when you split the processing between several machines)
Each task would do a query such as:

UPDATE taskstable SET lockedby=(my id), locktime=now() WHERE lockedby IS NULL ORDER BY ID LIMIT 10

Where 10 is the "batch size".

Then each task does a SELECT to find out which rows it's "locked" for processing, and processes those
After each row is complete, you set lockedby and locktime back to NULL
All this is done in a loop for as many batches as exist.
A cron job, or scheduled task, periodically resets the "lockedby" of any row whose locktime is too long ago, as they were presumably done by a task which has hung or crashed. Someone else will then pick them up

The LIMIT 10 is MySQL specific but I think other databases have equivalents. The ORDER BY is import to avoid the query being nondeterministic.

相关阅读