Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: sysbench insert触发流控
[TiDB Usage Environment] v6.1.0
8 cores, 32GB RAM, Disk IOPS 15,000
[Overview] Scenario + Problem Overview
Using sysbench for stress testing, preparing stress test data (Test time: 2022-08-21 19:00~2022-08-22 07:00)
sysbench --config-file=sysbench.cfg oltp_point_select --tables=32 --table-size=10000000 prepare
In 12 hours, fewer than 200 million rows were written, and the write speed kept getting slower and slower. No locks were observed in tikv.log.
[Phenomenon] Business and Database Phenomenon
[Problem] Current Issues Encountered
Slow write speed
[TiDB Version]
v6.1.0
[Attachments] Relevant Logs and Configuration Information
- TiUP Cluster Display Information
- TiUP Cluster Edit Config Information
Data is first appended to the in-memory memtable.
write_buffer_size controls how much data a memtable holds; once this threshold is exceeded, the memtable is switched out.
When a memtable is full, it becomes an immutable memtable, and once there is at least one immutable memtable, flushing to disk begins.
If data is written too quickly and there are more than 5 immutable memtables, the LSM tree applies flow control (write stall) to incoming writes.
TiKV itself has a flow control mechanism.
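To see where those thresholds sit on a given cluster, the write-stall and flow-control settings can be read back with SHOW CONFIG from a TiDB session. This is only a minimal sketch, assuming the defaultcf column family and the v6.1 parameter names; verify the names against your version's documentation:

-- Memtable-related write-stall thresholds of the default column family
SHOW CONFIG WHERE type = 'tikv' AND name = 'rocksdb.defaultcf.write-buffer-size';
SHOW CONFIG WHERE type = 'tikv' AND name = 'rocksdb.defaultcf.max-write-buffer-number';
SHOW CONFIG WHERE type = 'tikv' AND name = 'rocksdb.defaultcf.level0-slowdown-writes-trigger';
-- TiKV's own flow control layer (throttles at the scheduler before RocksDB stalls)
SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'storage.flow-control%';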
Are there any optimization methods?
The scheduler worker CPU usage is not high between 20:00 and 07:00; the cluster looks fairly idle.
It looks like the IO has reached its bottleneck.
IO doesn't seem to be the bottleneck; flow control is happening, yet the write stall metrics look normal. What exactly triggered the flow control? Is TiDB really this fragile?
https://metricstool.pingcap.com/#backup-with-dev-tools Click here to export the monitoring data for overview, pd, tidb, and tikv. Make sure to expand all panels and wait for the data to load completely before exporting.
These few images show that there is no disk bottleneck.
Experts, please help analyze this!
Try increasing these two settings:
set config tikv rocksdb.max-background-jobs = <value>
set config tikv rocksdb.max-sub-compactions = <value>
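The concrete values were not preserved above. As a rough sketch only, the statements would look like the following; 10 and 3 are illustrative values, not the original poster's settings, and should be sized to the machine's spare CPU:

-- Illustrative values only: give RocksDB more background threads for flush/compaction
set config tikv `rocksdb.max-background-jobs` = 10;
-- Illustrative value: allow a single compaction job to be split into more concurrent sub-tasks
set config tikv `rocksdb.max-sub-compactions` = 3;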
The prepare phase involves creating tables, loading data, and adding indexes; it is all done by sysbench, presumably with 8 threads. The prepare started around 7 PM on the 21st and was checked at 7 AM the next day, by which point data had been written up to the 18th table (each table holds 10 million rows). Tables sbtest1-8 were written in one batch and sbtest9-16 in another; those first two batches were already complete, so the estimated write volume is between 160 million and 200 million rows.
The main issue is that I don't understand why flow control is being triggered.
After the execution is completed, check admin show ddl jobs. From the monitoring, there is no write stall.
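For reference, a sketch of checking whether the index-building DDL jobs from the prepare phase are still running (the 20 is just a row limit):

-- List the 20 most recent DDL jobs and their states
ADMIN SHOW DDL JOBS 20;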
Do the tables have indexes?
During the process of creating tables, building indexes, and writing data, it is generally recommended to insert the data first and then create the indexes.
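As a sketch of the load-first-then-index approach with the standard sysbench sbtest schema (assuming the usual k_N secondary index on column k; recent sysbench builds can typically skip it at prepare time with --create_secondary=off), the index can be added after the bulk load finishes:

-- Add the secondary index after the data load instead of during prepare
ALTER TABLE sbtest1 ADD INDEX k_1 (k);
ALTER TABLE sbtest2 ADD INDEX k_2 (k);
-- ...and so on for the remaining sbtest tables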