且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何从Firebird SQL数据库中删除大数据

更新时间:2023-02-02 20:48:26

根据您在问题中的描述以及您的评论,该问题与Firebird中垃圾回收的工作方式有关.Firebird是一个所谓的多版本并发控制(MVCC)数据库,您所做的每个更改到行(记录)(包括删除),将创建该记录的新版本,并使以前的版本可用于在进行更改的事务提交之前启动的其他事务.

Based on your description in the question, and your comments, the problem has to do with how garbage collection works in Firebird. Firebird is a so-called Multi-Version Concurrency Control (MVCC) database, each change you make to a row (record), including deletions, will create new versions of that record, and keep previous versions available for other transactions that were started before the transaction that made the change is committed.

如果在先前版本的记录中没有更多的交易感兴趣",则该先前版本将有资格进行垃圾收集.Firebird有两种垃圾收集选项:合作(受所有服务器模式支持)和 background (由SuperServer支持),以及第三种 combined 模式两者都做(这是SuperServer的默认设置).

If there are no more transactions 'interested' in a previous version of record, that previous version becomes eligible for garbage collection. Firebird has two options for garbage collection: cooperative (supported by all server modes) and background (supported by SuperServer), and a third combined mode which does both (this is the default for SuperServer).

背景模式是专用线程,用于清理垃圾,如果活动语句看到垃圾,则会通过活动语句发出信号.

The background mode is a dedicated thread which cleans up garbage, it's signaled by active statements if they see garbage.

cooperative 模式下,看到垃圾的语句也是必须对其进行清理的语句.当语句在大型更新或删除后立即执行全表扫描时,这可能会特别昂贵.该语句不仅会查找并返回行,而且还将重写数据库页面以消除该垃圾.

In the cooperative mode, a statement that sees garbage is also the one that has to clean it up. This can be especially costly when the statement performs a full table scan just after a large update or delete. Instead of just finding and returning rows, that statement will also rewrite database pages to get rid of that garbage.

另请参见幻灯片垃圾收集机制和详细信息.

有一些可能的解决方案:

There are some possible solutions:

  1. 如果您使用的是SuperServer,请通过将 firebird.conf 中的 GCPolicy 设置设置为 background 来更改策略.

  1. If you're using SuperServer, change the policy, by setting the setting GCPolicy in firebird.conf to background.

此解决方案的缺点是,收集所有垃圾可能需要更长的时间,但是最大的好处是,通过执行垃圾收集工作不会降低事务处理的速度.

The downside of this solution is that it might take longer before all garbage is collected, but the big benefit is that transactions are not slowed down by doing garbage collection work.

在提交产生大量垃圾的事务后,执行一条执行全表扫描的语句(例如,从表中的 select count(*))来触发垃圾回收,使用单独的工作线程不会阻塞其余的过程.

After committing a transaction that produced a lot garbage, execute a statement that performs a full table scan (e.g. select count(*) from table) to trigger garbage collection, using a separate worker thread to not block the rest of your process.

仅当没有活动的交易仍然对那些旧记录版本感兴趣时,此选项才真正起作用.

This option only really works if there are no active transactions that are still interested in those old record versions.

创建数据库的备份(无需进行任何还原,只需验证备份是否正常工作即可.)

Create a backup of the database (there is no need to restore, except to verify if the backup worked correctly).

默认情况下(除非指定 -g 选项以禁用垃圾收集), gbak 工具将在备份期间执行垃圾收集.这与选项2具有相同的限制,因为gbak等效于 select * from table

By default (unless you specify the -g option to disable garbage collection), the gbak tool will perform garbage collection during a backup. This has the same restriction as option 2, as this works because gbak does the equivalent of a select * from table

使用 gfix -sweep .

与前两个选项具有类似的限制

This has similar restrictions as the previous two options

对于不会导致垃圾回收速度变慢的连接,请指定连接选项 isc_dpb_no_garbage_collect (详细信息在驱动程序和连接库之间有所不同).

For connections that cannot incur the slowdown of a garbage collection, specify the connection option isc_dpb_no_garbage_collect (details vary between drivers and connection libraries).

如果您为所有连接都指定了此选项,并且您的策略是 cooperative (因为它已配置,或者您使用的是Classic或SuperClassic服务器模式),则不会进行垃圾收集,这也可能最终导致速度降低,因为引擎将不得不扫描更长的记录版本链.可以通过使用前面的两个选项来执行垃圾收集来缓解这种情况.

If you specify this for all connections, and your policy is cooperative (either because it is configured, or you're using Classic or SuperClassic server mode), then no garbage collection will take place, which can cause an eventual slowdown as well, because the engine will have to scan longer chains of record versions. This can be mitigated by using the previous two options to perform a garbage collection.

不是真正删除记录,而是在应用程序中引入软删除,以将记录标记为已删除,而不是真正删除记录.

Instead of really deleting records, introduce a soft-delete in your application to mark records as deleted instead of really deleting them.

要么永久保留这些记录,要么在以后的某个时间真正将其删除(例如,通过在数据库未加载时运行的计划作业),并包括触发垃圾收集的先前选项之一.

Either keep those records permanently, or really delete them at a later time, for example by a scheduled job running at a time the database is not under load, and include one of the previous options to trigger a garbage collection.