Testing GC-related Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 测试gc相关问题

| username: zhanggame1

[TiDB Usage Environment] Test
[TiDB Version] 7.1
[Encountered Issue: Phenomenon and Impact]
Testing how TiDB GC (Garbage Collection) reclaims space after deletes, using a table of approximately 90 million rows.
Testing method: loop deletion, removing 5,000 rows per statement, executed continuously.
Delete statement:

delete from history_uint limit 5000;

Test Conclusion:
The deletion process becomes slower over time, starting at around 200 milliseconds per operation and later exceeding 2 seconds.

Encountered Issue:
After half an hour of deletion, the monitoring shows that subsequent GC tasks are no longer visible, and both CPU and IO load significantly decrease. The cause of the issue is unknown.



Supplementary: one hour of GC monitoring data; it appears that no further GC was performed afterwards.

| username: WalterWj | Original post link

Because of MVCC, it is expected that this SQL gets slower and slower. You can enable the coprocessor cache (cop cache) to reduce the impact of historical versions.
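
One way to check whether the coprocessor cache is enabled is to inspect the TiDB configuration from SQL. A minimal sketch; I am assuming the relevant item is tikv-client.copr-cache.capacity-mb (set in tidb.toml), where a value greater than 0 means the cache is on:

-- Check the coprocessor cache capacity on each TiDB instance.
SHOW CONFIG WHERE type = 'tidb' AND name LIKE '%copr-cache%';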

Moreover, this test doesn’t seem to have much practical significance :thinking:. For this kind of scenario, TRUNCATE is generally used. If you do need to delete, newer versions recommend using batch DML (non-transactional DML).
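
For reference, a minimal sketch of the non-transactional (batch) DELETE syntax just mentioned, assuming history_uint has an integer primary key column id and a clock column to filter on (both names are placeholders):

-- Split one large delete into many small transactions of 5000 rows each,
-- sharded on the primary key column.
BATCH ON id LIMIT 5000
DELETE FROM history_uint WHERE clock < 1690000000; -- placeholder cutoff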

| username: zhanggame1 | Original post link

Batch DML is also a looped delete; there’s no difference.

| username: zhanggame1 | Original post link

Getting slower is expected and not a problem. The main issues are twofold:

  1. It seems that GC is not executing later on.
  2. Why does the number of regions keep increasing as deletes are performed?
| username: 人如其名 | Original post link

It is recommended to look at how batch DML works, as the mechanism is different. Batch DML splits the delete into ranges on the primary key, so each batch does not have to scan past the records that are already deleted and waiting for GC, and deletion performance stays stable instead of slowing down over time.
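
To see how batch DML splits the work by primary key without actually deleting anything, TiDB’s DRY RUN option can be used; a sketch with the same placeholder column names as above:

-- Show the statements the batch delete would be split into,
-- each covering a range of primary key values.
BATCH ON id LIMIT 5000 DRY RUN
DELETE FROM history_uint WHERE clock < 1690000000;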

| username: longzhuquan | Original post link

Because GC is not running. As the delete operations proceed, the data on disk keeps growing (a delete does not physically remove the rows; it only marks them as garbage, and the space is reclaimed only when GC runs), so the number of Regions keeps increasing. To understand why GC is stuck, check the GC time settings and whether other transactions are blocking GC.
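
A few queries that may help with that check (a sketch; the tikv_gc% variables in mysql.tidb and the cluster_tidb_trx table exist in TiDB 7.1):

-- Current GC settings, safe point, and last run time.
SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME LIKE 'tikv_gc%';

-- Long-running transactions that could hold back the GC safe point.
SELECT * FROM information_schema.cluster_tidb_trx ORDER BY start_time LIMIT 10;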

| username: zhanggame1 | Original post link

The GC life time is set to 10m, and I’m the only one using the database for testing.
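
A minimal sketch of how a 10-minute GC life time is typically set on TiDB 7.1 (the exact statement was not shown in the thread):

SET GLOBAL tidb_gc_life_time = '10m';
-- Verify:
SHOW VARIABLES LIKE 'tidb_gc_life_time';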

| username: tidb菜鸟一只 | Original post link

Please post the result.

| username: zhanggame1 | Original post link

[Image: link broken or inaccessible.]

| username: zhanggame1 | Original post link

I stopped deleting at 12 o’clock, and the monitoring then looks like this: GC has started executing again.

| username: tidb菜鸟一只 | Original post link

Your screenshot shows that GC has already advanced to the latest time. When the system load is high, it is indeed possible for GC to fall behind, in which case the GC time shown here will lag.

| username: h5n1 | Original post link

  1. A delete is also a write: deleting a row writes a new version with a delete marker, so delete operations actually increase the data size for a while.
  2. GC uses the compaction filter: with GC in Compaction Filter enabled, garbage versions are cleaned up during RocksDB compaction rather than by separate GC tasks, so standalone GC tasks may not appear in the monitoring.
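
Whether GC runs through the compaction filter can be confirmed from SQL; a sketch assuming the default TiKV config item name gc.enable-compaction-filter:

-- When gc.enable-compaction-filter is true, most GC work happens inside
-- RocksDB compaction instead of in separate GC tasks.
SHOW CONFIG WHERE type = 'tikv' AND name LIKE '%enable-compaction-filter%';
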
| username: redgame | Original post link

Consider performing deletions in larger batches instead of deleting only 5000 rows at a time. Reducing the number of delete operations may lessen the burden on GC and improve the overall performance of the delete operation.

| username: zhanggame1 | Original post link

The best practice example in the official documentation is 5000. What do you think is appropriate?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.