Seeking Solutions: The Impact of Bulk Data Insertion on TiDB Query Performance

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 求方案,大批量插入数据对tidb查询的数据影响

| username: codingBoyYe

A new business is going online, and we need to manually synchronize 2 billion records into a table. To save time, the business performs batch writes grouped by a certain dimension (the batch size is not fixed, with a maximum of 18,000 records per batch). The biggest problem is that this impacts normal business requests. Is there a way to avoid this?

Our TiKV cluster has 3 nodes, each with a 4-core CPU. During the data synchronization, we found that TiKV CPU load is very high. Scaling out the cluster would solve this, but our resources are sufficient under normal load. Is there a solution that doesn't require adding nodes?
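
The thread doesn't show the loader itself; the following is a minimal sketch of the write pattern described above, assuming a pymysql client against TiDB's default port 4000 — the table `t`, its columns, and the connection settings are all invented for illustration:

```python
import pymysql

# Hypothetical sketch of the current loader: each batch (up to 18,000 rows,
# grouped by some business dimension) is written as one multi-row INSERT.
conn = pymysql.connect(host="tidb-host", port=4000, user="app",
                       password="...", database="app_db")

def write_batch(rows):
    """Write one batch of (id, payload) tuples in a single transaction."""
    with conn.cursor() as cur:
        # pymysql rewrites executemany() on INSERT ... VALUES into a single
        # multi-row statement, so a large batch becomes one big write.
        cur.executemany("INSERT INTO t (id, payload) VALUES (%s, %s)", rows)
    conn.commit()
```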

| username: Mark | Original post link

CPU and I/O performance determine how efficiently concurrent writes can run. If the original poster can't improve the hardware, the import will simply need more time.

| username: kkpeter | Original post link

The original poster's TiKV node specs are on the low side.

| username: Kongdom | Original post link

We have tried enabling the RAID card's write cache, and write speed increased significantly. However, if the RAID card has no battery backup unit (BBU), enabling it is not recommended, as a power outage could cause data loss.
The Linux commands are as follows:

MegaCli64 -LDInfo -Lall -aALL # Show the current cache policy of all logical drives
MegaCli64 -LDSetProp -WB -Lall -aAll # Enable write-back cache (protected by the BBU)
MegaCli64 -LDSetProp CachedBadBBU -Lall -aALL # Keep the write cache enabled even when the BBU is bad or missing (risks data loss on power failure)
MegaCli64 -LDSetProp NOCachedBadBBU -Lall -aALL # Disable the write cache when the BBU is bad or missing (the safe default)
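
After changing the policy, the first command can be re-run to confirm that the `Current Cache Policy` line in its output now reports WriteBack.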

PS: SSDs probably don’t need this; we are using mechanical hard drives.

| username: jansu-dev | Original post link

I'm not sure whether this will solve your problem; you can test it out.

| username: xiaohetao | Original post link

I think your CPU spec is relatively low. Are the disks SSDs?

Increasing CPU capacity and disk I/O throughput will speed up the writes.

| username: GreenGuan | Original post link

With resources unchanged, I recommend reducing the batch size (each write has to be replicated to and acknowledged by a majority of Raft replicas, which itself consumes CPU and network resources) while accepting a longer overall import time.
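
A minimal sketch of this advice, reusing the hypothetical `write_batch` from the sketch earlier in the thread; `BATCH_SIZE` and `PAUSE_SECONDS` are invented starting points to be tuned against TiKV CPU usage:

```python
import time

BATCH_SIZE = 2000      # well below the original 18,000-row batches
PAUSE_SECONDS = 0.2    # idle gap between batches, leaving headroom for normal traffic

def throttled_load(all_rows):
    # Smaller, paced batches lower the peak CPU and network cost of Raft
    # replication on TiKV, at the cost of a longer total import time.
    for i in range(0, len(all_rows), BATCH_SIZE):
        write_batch(all_rows[i:i + BATCH_SIZE])
        time.sleep(PAUSE_SECONDS)
```

In practice you would stream batches from the source rather than hold 2 billion rows in memory; the pacing logic stays the same.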

| username: tidb狂热爱好者 | Original post link

This CPU is too weak.

| username: ngvf | Original post link

First, deploy according to the official deployment requirements and test whether performance matches what the official documentation states. Alternatively, you could first benchmark the non-standard cluster to see whether it meets business needs. If inserting specific business data is slow, a detailed analysis is required. Since you say TiKV write CPU usage is already very high, you need to scale up the CPU first.
