When Writing a Large Amount of Data, the TiKV Nodes' CPU Is Fully Occupied

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 大量写入数据时,TiKV的CPU节点被占满

| username: Zealot

[TiDB Version] 7.1.2

When using DataX or TiSpark to write data, the TiKV CPU gets fully occupied. How should this be handled?

Why is the read IO also very high when I am writing data?

The IO of all nodes is very high, so it shouldn’t be caused by write hotspots. Is it really just writing too fast?

| username: 路在何chu | Original post link

Use AUTO_RANDOM to handle the auto-increment primary key hotspot.

| username: Zealot | Original post link

I am using a string as the primary key. Are you suggesting that I change it to use AUTO_RANDOM as the primary key and then create an index for the original primary key?

| username: Zealot | Original post link

Looking at it from this angle, the IO of all nodes rises suddenly during writes. This doesn’t look like a problem that fixing hotspots would solve.

| username: 路在何chu | Original post link

AUTO_RANDOM does not support strings. Is it feasible to use integers as primary keys?
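If switching to an integer key is acceptable, a minimal sketch of the schema might look like the following (table and column names are hypothetical; in TiDB, AUTO_RANDOM requires a BIGINT clustered primary key, and the original string key can stay as a unique secondary index):

```sql
-- Hypothetical table: replace the string primary key with an integer
-- AUTO_RANDOM key so inserts scatter across Regions, and keep the
-- original string key as a unique secondary index for lookups.
CREATE TABLE orders (
    id BIGINT AUTO_RANDOM PRIMARY KEY,
    order_no VARCHAR(64) NOT NULL,
    payload JSON,
    UNIQUE KEY uk_order_no (order_no)
);
```

Note that the unique index on the string key still receives writes, but the row data itself is spread across Regions instead of piling onto one.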

| username: Zealot | Original post link

I took a look and it doesn’t seem to be caused by writing hotspots. All my nodes have very high IO, and the amount of data written is around 20GB-50GB.

| username: 像风一样的男子 | Original post link

How is your disk performance? Do you feel like you’ve reached the disk read/write limit?

| username: 像风一样的男子 | Original post link

Additionally, the compaction traffic is very high. Is there a large number of updates and deletes causing significant pressure on the disk due to garbage collection?

| username: 江湖故人 | Original post link

From the screenshot, the high read IO is mainly caused by compaction, and it is still several times lower than the write IO.

RocksDB’s compaction covers two things: one is flushing the memtable to disk when it is full, a special type of compaction also known as minor compaction; the other is merging data from the L0 level downward through the levels, known as major compaction, which is what is usually meant by “compaction.”

Compaction is essentially a merge sort process, writing data from the Ln layer to the Ln+1 layer, filtering out deleted data to achieve physical deletion. The main process includes:

  1. Preparation: Selecting the sst files to be merged from the Ln/Ln+1 layers based on certain conditions and priorities, and determining the key range to be processed.
  2. Processing: Reading the key-value data, merging, sorting, and handling operations for different types of keys.
  3. Writing: Writing the sorted data into the Ln+1 layer sst files and updating the metadata information.
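The merge step above can be sketched in a simplified model, assuming each SST file is a sorted list of (key, value) pairs, newer files are passed first, and a tombstone object marks deletes (this is an illustration of the idea, not TiKV/RocksDB code):

```python
import heapq

TOMBSTONE = object()  # stand-in for RocksDB's delete marker

def compact(*ssts):
    """Merge sorted runs (newest first) into one sorted run, keeping
    only the newest version of each key and physically dropping deleted
    keys, as a bottom-level compaction would."""
    # Tag each entry with its file index so that, for equal keys,
    # the entry from the newer file sorts first.
    streams = [
        [(key, idx, value) for key, value in sst]
        for idx, sst in enumerate(ssts)  # idx 0 = newest file
    ]
    out, last_key = [], object()
    for key, _, value in heapq.merge(*streams):
        if key == last_key:
            continue                      # older version of the key: discard
        last_key = key
        if value is not TOMBSTONE:
            out.append((key, value))      # tombstones are dropped here
    return out
```

For example, `compact([("a", 1), ("b", TOMBSTONE)], [("a", 0), ("c", 3)])` keeps the newer `("a", 1)`, physically deletes `"b"`, and retains `("c", 3)`. Note that the merge reads every byte of the input files, which is why heavy compaction shows up as read IO.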

| username: 江湖故人 | Original post link

5 nodes, with write IO reaching 1GB/s, that’s already quite impressive :+1:

| username: Zealot | Original post link

I’m mainly using TiSpark to write the data, and it’s not easy to control the write speed. For now, I’ll just have TiSpark write directly to Hive instead.

| username: oceanzhang | Original post link

IO has indeed reached a certain level.

| username: Sunward | Original post link

With such a high write volume, even high-performance physical machines and solid-state drives struggle. The only solution is to add more machines.

| username: Zealot | Original post link

This IO doesn’t feel like it has reached the hardware limit. I’m using Huawei Cloud ECS, which claims a speed of 1000 MB/s. I’m not sure whether writing also consumes read IO, but the actual throughput doesn’t reach 1000 MB/s × 5 nodes of write speed. It doesn’t matter anymore, though; I’m not doing it this way. I’ll write directly to Hive instead: writing to Hive takes 3 minutes, whereas writing to TiDB takes 1 hour.

| username: 江湖故人 | Original post link

If you have Hive, just write to Hive :laughing:

| username: andone | Original post link

If this isn’t caused by hotspots, it can’t be fixed on the write side; the only solution is to add more machines.

| username: 江湖故人 | Original post link

For tables with frequent writes, removing some unnecessary indexes can also optimize insertion speed.

| username: 有猫万事足 | Original post link

It’s normal. You can see that compaction accounts for a very high proportion of both write and read operations. During compaction, a level has to be sorted and rewritten, and sorting naturally produces a high proportion of reads.

That is how logical import works. To avoid compaction pressure, you need to use physical import: sort the data and generate SST files first, then ingest them directly into TiKV.
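If the bulk load can go through TiDB Lightning, its physical import mode does exactly this. A sketch of the relevant configuration fragment (the path is hypothetical):

```toml
# tidb-lightning.toml (fragment) -- illustrative values only
[tikv-importer]
# "local" is the physical import mode: Lightning sorts the data and
# builds SST files locally, then ingests them into TiKV directly,
# bypassing the normal write path and most of the compaction it triggers.
backend = "local"
# Local scratch space for sorting and SST generation (hypothetical path).
sorted-kv-dir = "/data/lightning-sorted-kv"
```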

| username: zhanggame1 | Original post link

If you want TiKV to have fast write performance, use clustered tables and avoid adding extra indexes except for the primary key.
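As a hedged illustration of that advice (table and column names are hypothetical): with a clustered table, the row data is stored directly under the primary key, so each inserted row costs one KV write, while every additional secondary index costs an extra KV write per row.

```sql
-- Hypothetical table: clustered integer primary key, no extra indexes.
-- Each INSERT writes one key-value pair; adding a secondary index
-- would add one more KV write (and more compaction work) per row.
CREATE TABLE events (
    id BIGINT NOT NULL,
    ts DATETIME NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id) CLUSTERED
);
```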