Writing 12 million records every ten minutes causes a write bottleneck in the cluster, with the Storage async write duration under TiKV Details showing excessively long asynchronous write times

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 每十分钟写入1200万数据,集群出现写瓶颈,TiKV Details 下 Storage 的 Storage async write duration 异步写所花费的时间超长

| username: fengchao723

[TiDB Usage Environment] Production environment: 24 nodes, 500 TB of storage in total with 5% used, about 100,000 Regions. Each node has 64 vCPU cores, 512 GB of memory, and 4 SSDs, each backing one TiKV instance.

[Reproduction Path] We write 12 million rows every ten minutes. Cluster latency is very high during each write, queries occasionally hit "table not found" errors, and the PD leader switches intermittently. The cluster serves as a real-time engine, mainly for real-time writes with very few direct read requests; analysis is done mostly through TiSpark. Data is preprocessed before writing, so there are no lock conflicts. We mainly want to improve things by tuning raftstore.apply-pool-size, which is currently at its default value of 2.
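A minimal sketch, assuming the change is made in the TiKV configuration (for example via `tiup cluster edit-config` followed by a reload); the pool sizes below are illustrative, not tuned recommendations:

```toml
# TiKV configuration sketch: enlarge the Raftstore thread pools.
# apply-pool-size controls the threads that apply committed Raft logs;
# store-pool-size controls the threads that process Raft messages.
# Size them against spare CPU and the observed Apply wait duration.
[raftstore]
apply-pool-size = 4   # default is 2
store-pool-size = 4   # default is 2
```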

[Encountered Issues: Symptoms and Impact] Under TiKV Details > Raft propose, the Apply wait duration (how long writes wait to be applied) is currently around 15 seconds. Under TiKV Details > Storage, the Storage async write duration (time spent on asynchronous writes) is also around 15 seconds.

[Attachments: Screenshots/Logs/Monitoring]



| username: Fly-bird | Original post link

How is the IO utilization of the TiDB cluster?

| username: zhanggame1 | Original post link

What version is the database?

| username: zhanggame1 | Original post link

Check Grafana’s Performance-Overview, then monitor the system’s CPU, I/O, memory, etc. Also, check if there are any issues in the logs.

| username: 像风一样的男子 | Original post link

Check out this troubleshooting process:

| username: fengchao723 | Original post link

The version is in the tag above: v5.2.1.

| username: fengchao723 | Original post link

We increased the storage.scheduler-worker-pool-size to 32, so the CPU usage is a bit high.
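For context, that item lives in the [storage] section of the TiKV configuration; a minimal sketch (32 simply mirrors the value mentioned above and should be weighed against available scheduler CPU rather than copied):

```toml
# TiKV configuration sketch: scheduler worker pool.
# These threads handle write commands before they are proposed to Raft;
# an oversized pool can drive up CPU usage without improving throughput.
[storage]
scheduler-worker-pool-size = 32
```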

| username: oceanzhang | Original post link

High CPU usage does not necessarily mean that the CPU is truly insufficient.

| username: oceanzhang | Original post link

Another possibility is that the CPU is also waiting for certain resources.

| username: oceanzhang | Original post link

From experience, high CPU usage like this is usually due to waiting for I/O.

| username: TiDBer_小阿飞 | Original post link

How is the cluster deployed? Is it a mixed deployment? Please share the topology so we can take a look.

| username: 路在何chu | Original post link

Your latency is very high, already at the level of seconds. How many Regions does each TiKV instance have?

| username: fengchao723 | Original post link

It’s not a mixed deployment. The data nodes are all SSDs, with 24 nodes in total. Each node has 4 bare SSDs, corresponding to 4 TiKV instances. So, the total number of TiKV instances is 24*4 = 96. TiFlash is not used. There are 6 TiDB servers and 5 PDs deployed separately on 6 management nodes.

| username: fengchao723 | Original post link

Looking into it, there are indeed hotspots when writing to TiKV. We can address this by pre-splitting Regions.
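As a rough sketch of what Region pre-splitting can look like in SQL (table names, key bounds, and Region counts here are hypothetical and need to be adapted to the real schema and key distribution):

```sql
-- Pre-split an existing table's row-key range into multiple Regions
-- (table name, bounds, and Region count are placeholders).
SPLIT TABLE my_table BETWEEN (0) AND (9223372036854775807) REGIONS 128;

-- For a new table without an integer primary key, SHARD_ROW_ID_BITS plus
-- PRE_SPLIT_REGIONS scatters writes across pre-created Regions.
CREATE TABLE my_table_new (
    id      BIGINT NOT NULL,
    payload VARCHAR(255)
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 4;
```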

| username: heiwandou | Original post link

It is very likely a write hotspot.

| username: TiDBer_小阿飞 | Original post link

Are these two graphs normal?
  • TiKV_write_stall
  • TiKV_raft_log_lag

| username: fengchao723 | Original post link

The Apply wait duration under Raft propose is quite high, over 10 seconds. The commit log and append log durations are normal. Write stall is normal, all zeros.

| username: 像风一样的男子 | Original post link

Did you use AUTO_RANDOM when creating the table?
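For reference, a minimal sketch of a table defined with AUTO_RANDOM (table and column names are placeholders); it scatters inserts across Regions instead of appending to one hot Region, at the cost of non-sequential IDs, and it requires a BIGINT primary key:

```sql
-- Hypothetical table using AUTO_RANDOM to avoid a sequential-write hotspot.
CREATE TABLE events (
    id      BIGINT AUTO_RANDOM PRIMARY KEY,
    payload VARCHAR(255)
);
```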