Performance Issues Caused by High TiKV Pressure Due to Insert Operations in TiDB Online Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 线上集群insert 导致tikv 压力高的性能问题

| username: 是我的海

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.5.6


【Encountered Problem: Problem Phenomenon and Impact】
During daily operations, the CPU pressure on TiKV is abnormally high.

The IO util reading is inaccurate because of a kernel issue on this machine; there is no actual IO bottleneck.

Top SQL in the TiDB Dashboard shows that the main time-consuming statement is an insert … set statement.

The execution plan looks fine, and running the statement manually does not reproduce the roughly 100 ms execution time observed in production.

The slow query information shows that most of the time is spent in the prewrite phase.
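
For reference, the same per-phase breakdown can be pulled from the slow query table; a rough query sketch (column names as in the v6.5 INFORMATION_SCHEMA.SLOW_QUERY table, the filter and limit are only examples):

-- look at where the time goes for the slowest inserts
SELECT query_time, prewrite_time, commit_time, write_keys, write_size, query
FROM information_schema.slow_query
WHERE LOWER(query) LIKE 'insert%'
ORDER BY query_time DESC
LIMIT 10;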

How can this situation be optimized?

| username: tidb狂热爱好者 | Original post link

Run show create table and post the table structure.

| username: 有猫万事足 | Original post link

Find the graph for the corresponding time period and take a screenshot.

See if it’s caused by compaction.

| username: 是我的海 | Original post link

Create Table: CREATE TABLE `xxxxxxx` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Primary key auto-increment',
  `sid` varchar(100) NOT NULL DEFAULT '0' COMMENT 'Search ID',
  `query` text DEFAULT NULL COMMENT 'Question text',
  `subquestioninfo` text DEFAULT NULL COMMENT 'Sub-question information',
  `batch` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Production batch',
  `qid` varchar(40) NOT NULL DEFAULT '0' COMMENT 'ID generated by delivery, 0 if none',
  `source` varchar(20) NOT NULL DEFAULT '0' COMMENT 'Source: 1-Single question shot, 2-Whole page shot',
  `course` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Subject',
  `grade` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Grade 1,20,30: Elementary, Middle, High',
  `questiontype` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Question type',
  `weight` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Popularity value',
  `pval` decimal(10,2) NOT NULL DEFAULT '0' COMMENT 'Satisfaction',
  `gentime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Delivery time',
  `genway` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Generation method 1-Real-time, 2-Hour, 2-Day',
  `status` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Current status',
  `createtime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Storage time',
  `updatetime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Status update time',
  `top_pid` varchar(100) NOT NULL COMMENT 'SID image PID',
  `tid` varchar(20) NOT NULL DEFAULT '0' COMMENT 'TID after going online',
  `ori_tid` varchar(20) NOT NULL DEFAULT '0' COMMENT 'First TID at the shooting search site',
  `version` varchar(100) NOT NULL DEFAULT '0' COMMENT 'Model version',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT '1: Soft delete',
  `ext` text DEFAULT NULL COMMENT 'Reserved extension field',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `sid` (`sid`),
  KEY `tid` (`tid`),
  KEY `questiontype` (`questiontype`),
  KEY `pull` (`batch`,`course`,`status`),
  KEY `idx_createtime_status` (`createtime`,`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=18798046568 COMMENT='AIGC automatic question solving hot table'
| username: 是我的海 | Original post link

| username: 有猫万事足 | Original post link

The timing matches; it is caused by compaction.

show config where type='tikv' and name like '%level0-file-num-compaction-trigger%'

Run the SQL above and let me see what the parameter is set to.
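
For reference, with the default settings the output looks roughly like this sketch (the instance address is hypothetical, and the matching lockcf/raftdb rows are omitted):

Type | Instance       | Name                                                 | Value
tikv | 10.0.0.1:20180 | rocksdb.defaultcf.level0-file-num-compaction-trigger | 4
tikv | 10.0.0.1:20180 | rocksdb.writecf.level0-file-num-compaction-trigger   | 4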

| username: 是我的海 | Original post link

We are constantly deleting data from this table every day, because the write volume is quite large and the business does not need to retain that much data.

| username: tidb狂热爱好者 | Original post link

Your table is constantly being deleted, which leads to…

| username: 有猫万事足 | Original post link

That is the default configuration; it looks like your write load is quite high.

Now there are two issues:

  1. With this kind of write load, your table will definitely have a write hotspot.

`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Primary key auto-increment',
PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,

This is a combination of a clustered table and an auto-increment ID.

Either use a clustered table with AUTO_RANDOM or a non-clustered table with SHARD_ROW_ID_BITS.

If your business is already online, this might be quite difficult to change.

  2. The configuration above means that compaction should be triggered once defaultcf and writecf reach 4 L0 files.
    But your monitoring shows the L0 file count peaking at 12, far above 4.
    When that happens, the L0 files are merged wholesale into the next level, amplifying writes, and pending compaction bytes climb.
    That is what produces the bursts of roughly 100 ms latency; with an instantaneous write volume of 4.6 GB, keeping the delay to 100 ms means your disk is actually quite good.

You can consider lowering level0-file-num-compaction-trigger for defaultcf and writecf so that these two CFs start compaction earlier and compact in smaller batches, and see whether that brings pending compaction bytes down and with it the write latency.
The idea is to turn one 100 ms spike into two 70 ms spikes.

Note that if your write load is really high, this adjustment might not be useful.

My disks are not great. With the default default.level0-file-num-compaction-trigger=4, whenever pending compaction bytes went above 2 GB I would see bursts of near-second-level delays. After I changed default.level0-file-num-compaction-trigger=1, write delays caused by compaction became very rare in day-to-day use.
However, during a large data import you can still see more than 10 L0 files.
So if your write volume is consistently this high, I’m not sure if lowering this parameter will be useful.

You can modify the value of level0-file-num-compaction-trigger at runtime as described in the documentation, and observe whether the corresponding TiKV's pending compaction bytes decrease.

If it works, then consider modifying the entire cluster configuration through tiup cluster edit-config.
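
For reference, a minimal sketch of making this change online with SET CONFIG, applied to every TiKV instance at once (the value 2 is only an example to try):

-- lower the compaction trigger for defaultcf and writecf at runtime
SET CONFIG tikv `rocksdb.defaultcf.level0-file-num-compaction-trigger` = 2;
SET CONFIG tikv `rocksdb.writecf.level0-file-num-compaction-trigger` = 2;

-- confirm the running values
SHOW CONFIG WHERE type = 'tikv' AND name LIKE '%level0-file-num-compaction-trigger%';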

| username: 是我的海 | Original post link

Sure, thanks a lot.
Actually, we have another cluster with a similar usage scenario, with a large amount of data being written and deleted daily. However, we haven’t encountered this issue there. I’ll try adjusting this parameter first.

| username: 有猫万事足 | Original post link

The same principle applies. You can go straight to the pending compaction bytes graph and see what the peak is, as well as the number of L0 files.

This kind of burst latency increase is not very noticeable if it doesn't show up within a 30-minute window. And honestly, I think 100 ms is actually fine. Most people won't feel the lag.

| username: 是我的海 | Original post link

TiKV CPU consistently remains around 700%, and during peak periods it can reach 1500%. This is somewhat concerning, so I am delving into the specific reasons.
Thanks for the support :pray:

| username: 是我的海 | Original post link

Bro, please check Mr. Mao’s reply. I’ll give feedback after making adjustments.

| username: jiaxin | Original post link

The write volume is very large, so batch writing is recommended. We have a business with a daily write volume of over 100 million rows; initially the performance couldn't keep up, but it was fine after we switched to batch writing.
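
For illustration, batch writing here just means multi-row INSERT ... VALUES statements instead of one row (or one INSERT ... SET) per statement; a minimal sketch against the table above, with made-up values and an arbitrary batch size:

-- one statement and one transaction carrying several rows
INSERT INTO `xxxxxxx` (`sid`, `query`, `top_pid`, `batch`, `createtime`)
VALUES
  ('sid-0001', 'question text 1', 'pid-0001', 20240101, 1704038400),
  ('sid-0002', 'question text 2', 'pid-0002', 20240101, 1704038400),
  ('sid-0003', 'question text 3', 'pid-0003', 20240101, 1704038400);

A batch of a few dozen to a few hundred rows per statement is a common starting point, balanced against transaction size.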

| username: 是我的海 | Original post link

The write throughput of this cluster is only two to three thousand QPS, which I feel is not meeting expectations. It’s not appropriate to ask the business to change the code for this level of concurrency.

In another cluster of ours, we write tens of billions of records to a single wide table every day, with a QPS of around 30,000. Initially, we used batch writes, and the TiKV CPU usage reached nearly 2000%. Later, we switched to single-row writes, and the QPS peaked at around 40,000. The TiKV CPU usage slightly decreased to below 1500%, but the TiDB CPU usage went up to over 2000%. Next week, I will have to post another topic to research how to optimize this. :joy:

| username: 有猫万事足 | Original post link

If the write throughput is only two to three thousand QPS and can’t go any higher, it seems like a hotspot issue.

At least from your table structure, it looks like it can’t support high-volume writes. When there are a lot of inserts, the writes will all be concentrated on one region.

You have at least 6 TiKV nodes. You need to distribute the writes.
If you must use AUTO_INCREMENT, then you should use a non-clustered table with SHARD_ROW_ID_BITS.

Like this:

Create Table: CREATE TABLE `xxxxxxx` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Primary key auto-increment',
  `sid` varchar(100) NOT NULL DEFAULT '0' COMMENT 'Search ID',
  `query` text DEFAULT NULL COMMENT 'Question text',
  `subquestioninfo` text DEFAULT NULL COMMENT 'Sub-question info',
  `batch` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Production batch',
  `qid` varchar(40) NOT NULL DEFAULT '0' COMMENT 'ID generated by delivery, 0 if none',
  `source` varchar(20) NOT NULL DEFAULT '0' COMMENT 'Source: 1-Single question, 2-Full page',
  `course` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Subject',
  `grade` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Grade 1,20,30: Elementary, Middle, High',
  `questiontype` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Question type',
  `weight` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Popularity value',
  `pval` decimal(10,2) NOT NULL DEFAULT '0' COMMENT 'Satisfaction',
  `gentime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Delivery time',
  `genway` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Generation method 1-Real-time, 2-Hourly, 2-Daily',
  `status` bigint(10) NOT NULL DEFAULT '0' COMMENT 'Status',
  `createtime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Entry time',
  `updatetime` bigint(40) NOT NULL DEFAULT '0' COMMENT 'Status update time',
  `top_pid` varchar(100) NOT NULL COMMENT 'SID image PID',
  `tid` varchar(20) NOT NULL DEFAULT '0' COMMENT 'TID after going live',
  `ori_tid` varchar(20) NOT NULL DEFAULT '0' COMMENT 'First TID in search scene',
  `version` varchar(100) NOT NULL DEFAULT '0' COMMENT 'Model version',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT '1: Soft delete',
  `ext` text DEFAULT NULL COMMENT 'Reserved extension field',
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */,
  UNIQUE KEY `sid` (`sid`),
  KEY `tid` (`tid`),
  KEY `questiontype` (`questiontype`),
  KEY `pull` (`batch`,`course`,`status`),
  KEY `idx_createtime_status` (`createtime`,`status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=4 COMMENT='AIGC auto-solving hot table'

The main difference is

PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */,
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin SHARD_ROW_ID_BITS=4 PRE_SPLIT_REGIONS=4 COMMENT='AIGC auto-solving hot table'

No other changes are needed.
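
For completeness, the other direction mentioned earlier, keeping a clustered table but replacing AUTO_INCREMENT with AUTO_RANDOM, would look roughly like this sketch (only two columns shown; the remaining columns and indexes stay as in the DDL above). The IDs stay unique but are no longer roughly continuous, so this only fits if the business does not rely on that:

CREATE TABLE `xxxxxxx` (
  `id` bigint(20) NOT NULL AUTO_RANDOM(5) COMMENT 'Primary key, shard bits spread new rows across regions',
  `sid` varchar(100) NOT NULL DEFAULT '0' COMMENT 'Search ID',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `sid` (`sid`)
) DEFAULT CHARSET=utf8 COLLATE=utf8_bin;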

| username: 是我的海 | Original post link

From what the cluster shows, it doesn't look like a write hotspot issue. In fact, many other TiDB business tables were migrated directly from MySQL and use auto_increment, and they don't have any significant hotspot problems.

This image shows the current hotspot status of the cluster, which I think is under no pressure. The main issue might still be compaction.

I also changed the parameter you mentioned from 4 to 8, but the effect is not very ideal.

| username: 有猫万事足 | Original post link

This hotspot distribution chart won't necessarily reveal a single-region hotspot. For example, with your table structure, all writes are concentrated on one region, which might only account for 1/20 of the overall load. This won't cause a significant increase in the number of hotspot regions. However, in the Traffic Visualization section of the TiDB Dashboard, that region might already appear very bright.

What I meant was to lower it, because lowering it would result in smaller, more frequent compactions. Increasing it would lead to larger, less frequent compactions, which would likely increase the pending compaction bytes.

You can gradually change it from 4 to 3, then from 3 to 2, and observe. The set config command supports modifying parameters for individual TiKV instances. As long as you observe that the pending compaction bytes for a single TiKV are gradually decreasing, it means the change is effective. If, after the change, the number of L0 files (the chart above the pending compaction bytes is the Level0 SST file number) remains consistently higher than the value you set, then the pending compaction bytes won’t decrease. This indicates that the change is ineffective, and you should revert the parameter for that TiKV instance online and consider other solutions.
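
As a sketch, trying it on a single TiKV first and reverting if pending compaction bytes does not drop might look like this (the address 10.0.0.1:20180 is hypothetical; use whatever appears in the Instance column of SHOW CONFIG for the TiKV you want to test):

-- lower the trigger on one TiKV instance only
SET CONFIG "10.0.0.1:20180" `rocksdb.defaultcf.level0-file-num-compaction-trigger` = 3;
SET CONFIG "10.0.0.1:20180" `rocksdb.writecf.level0-file-num-compaction-trigger` = 3;

-- revert that instance to the default of 4 if it does not help
SET CONFIG "10.0.0.1:20180" `rocksdb.defaultcf.level0-file-num-compaction-trigger` = 4;
SET CONFIG "10.0.0.1:20180" `rocksdb.writecf.level0-file-num-compaction-trigger` = 4;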

| username: dba远航 | Original post link

It seems that the 302 video has an explanation of this.