Error Occurred During Sysbench Write Test

translator_bot · June 22, 2024, 4:53pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Sysbench写测试时报错

| username: Leox

After setting up the cluster and using Sysbench for testing, the read test (point_select) can proceed normally, but the write test (index_update) encounters an error.

It seems like there is a consistency issue. I would like to ask the experts how to solve this problem.

Hardware configuration (3 machines): ARM processors (2 × 40 cores, 2.5 GHz); memory: 32 GB × 8; disks: 500 GB SSD / DapuStor 3.2 TB

Cluster topology	10.10.12.6	10.10.12.78	10.10.12.9
NUMA 0	TiKV, PD	TiKV, PD	TiKV, PD
NUMA 1	TiDB	TiDB	HAProxy, Sysbench

Cluster configuration (TiDB and TiKV):

server_configs:
  tidb:
    log.level: error
    mem-quota-query: 34359738368
    performance.server-memory-quota: 34359738368
    performance.txn-total-size-limit: 10485760000
    prepared-plan-cache.enabled: true
    token-limit: 3001
  tikv:
    coprocessor.split-region-on-table: false
    log-level: error
    raftdb.max-background-jobs: 12
    raftstore.apply-max-batch-size: 1024
    raftstore.apply-pool-size: 8
    raftstore.hibernate-regions: true
    raftstore.raft-max-inflight-msgs: 1024
    raftstore.store-max-batch-size: 1024
    raftstore.store-pool-size: 4
    rocksdb.compaction-readahead-size: 2MB
    rocksdb.defaultcf.max-write-buffer-number: 32
    rocksdb.writecf.max-write-buffer-number: 32
    server.grpc-concurrency: 8
    server.max-grpc-send-msg-len: 5242880
    storage.block-cache.capacity: 64G
    storage.scheduler-worker-pool-size: 8
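
Several of the values above are raw byte counts, which are hard to read at a glance. The short sketch below only converts the numbers copied from the posted config into binary units; the helper name `humanize` is made up for illustration and is not part of any TiDB tooling.

```python
# Convert the raw byte values from the posted config into binary units.
GIB = 2**30
MIB = 2**20

# Values copied verbatim from the server_configs section above.
config_bytes = {
    "mem-quota-query": 34359738368,
    "performance.server-memory-quota": 34359738368,
    "performance.txn-total-size-limit": 10485760000,
    "server.max-grpc-send-msg-len": 5242880,
}


def humanize(n: int) -> str:
    """Render a byte count in GiB if it is at least 1 GiB, otherwise in MiB."""
    if n >= GIB:
        return f"{n / GIB:.2f} GiB"
    return f"{n / MIB:.2f} MiB"


for key, value in config_bytes.items():
    print(f"{key}: {humanize(value)}")
```

This shows that both memory quotas are 32 GiB, the transaction size limit is a little under 10 GiB, and the gRPC send limit is 5 MiB.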

| username: ffeenn

Check the TiKV logs and the TiDB logs. TiDB has clearly stalled. Try reducing the data size: start with 1 million rows and see whether that works.
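
As a rough illustration of rerunning the write test with a smaller data set, the sketch below assembles a sysbench 1.0 command line with `--table-size` set to 1 million. Everything except the table size is an assumption, since the thread does not show the original command: the script name `oltp_update_index` (the bundled sysbench script that updates the indexed column), the port, user, database name, table count, thread count, and duration are all placeholders.

```python
import shlex


def sysbench_cmd(test, table_size, threads, host="10.10.12.9", port=4000,
                 tables=16, seconds=300, action="run"):
    """Build a sysbench 1.0 argv list. Host, port, user, db, tables,
    threads and duration are placeholders, not values from the thread."""
    return [
        "sysbench", test,
        "--db-driver=mysql",
        f"--mysql-host={host}",        # assumed: the HAProxy node
        f"--mysql-port={port}",        # assumed port
        "--mysql-user=root",           # assumed user
        "--mysql-db=sbtest",           # assumed database name
        f"--tables={tables}",
        f"--table-size={table_size}",  # rows per table
        f"--threads={threads}",
        f"--time={seconds}",
        "--report-interval=10",        # print intermediate stats every 10 s
        action,                        # "prepare", "run" or "cleanup"
    ]


cmd = sysbench_cmd("oltp_update_index", table_size=1_000_000, threads=64)
print(shlex.join(cmd))
```

The same helper with `action="prepare"` would produce the command that loads the smaller data set before the run.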

| username: Leox

With 1 million rows there was indeed no error, but there is a fluctuation like this roughly every 200 s, which also seems abnormal.
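
One way to check whether such dips really recur at a fixed period is to parse the intermediate lines that sysbench prints when `--report-interval` is set and look at the spacing between low-throughput samples. The sketch below assumes the sysbench 1.0 report line layout; the sample lines are synthetic, generated only to exercise the parser, and are not measurements from this cluster. The function names and the 50% threshold are arbitrary choices.

```python
import re
from statistics import median

# Matches lines such as: "[ 10s ] thds: 64 tps: 5000.00 qps: ..."
LINE_RE = re.compile(r"\[\s*(\d+)s\s*\]\s+thds:\s+\d+\s+tps:\s+([\d.]+)")


def parse_tps(lines):
    """Return (elapsed_seconds, tps) pairs from sysbench report lines."""
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples


def find_dips(samples, ratio=0.5):
    """Timestamps whose tps falls below `ratio` times the median tps."""
    if not samples:
        return []
    baseline = median(tps for _, tps in samples)
    return [t for t, tps in samples if tps < ratio * baseline]


def dip_gaps(dips):
    """Seconds between consecutive dips; equal gaps suggest a periodic stall."""
    return [b - a for a, b in zip(dips, dips[1:])]


# Synthetic report lines (made up): a dip every 200 s.
fake_log = [
    f"[ {t}s ] thds: 64 tps: {tps:.2f} qps: {tps:.2f} "
    f"(r/w/o: 0.00/{tps:.2f}/0.00) lat (ms,95%): 20.00 err/s: 0.00 reconn/s: 0.00"
    for t in range(10, 410, 10)
    for tps in [300.0 if t % 200 == 0 else 5000.0]
]

dips = find_dips(parse_tps(fake_log))
print(dips, dip_gaps(dips))
```

If the gaps come out roughly constant on real output, the dip timestamps can be lined up against the TiKV monitoring to see what recurs on the same schedule.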

| username: WalterWj

Check the pending metrics in the TiKV monitoring panels.

| username: Leox

There does seem to be a problem there.

| username: ffeenn

Run sysbench from a different machine, then post screenshots of the TiDB Statement OPS panel in the Overview dashboard and of the sysinfo panels.

| username: Leox

Results when applying load from a machine outside the cluster:

Sysinfo: (screenshot not preserved)

TiDB Statement OPS: (screenshot not preserved)

| username: ffeenn

Your disk I/O utilization is already very high, so fluctuations and latency spikes are inevitable. What IOPS did you measure when testing the SSD? The disk appears to have reached its limit. Try reducing the number of threads and the amount of data, and keep testing to find where the bottleneck actually is.
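
The idea of varying the thread count to locate the bottleneck can be sketched as a small sweep analysis: run the same test at several thread counts, record the throughput, and look for the point where adding threads stops paying off. The numbers below are invented purely to demonstrate the function and are not results from this cluster; the 10% gain threshold is likewise an arbitrary assumption.

```python
def saturation_point(results, min_gain=0.10):
    """Given (threads, tps) pairs, return the thread count after which the
    next step up improves tps by less than `min_gain` (relative gain).
    Returns None if throughput keeps scaling across the whole sweep."""
    ordered = sorted(results)
    for (t0, tps0), (_, tps1) in zip(ordered, ordered[1:]):
        if tps0 > 0 and (tps1 - tps0) / tps0 < min_gain:
            return t0
    return None


# Hypothetical sweep results (made up for illustration only).
sweep = [(8, 4000.0), (16, 7600.0), (32, 13000.0), (64, 13800.0), (128, 13500.0)]

print(saturation_point(sweep))
```

With this made-up data, throughput nearly doubles up to 32 threads and then flattens, so 32 is reported as the saturation point. On a real system the disk utilization at that point would indicate whether storage is the limiting resource.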

| username: Leox

These are the nominal values for the J5310 3.2T; our actual test results are slightly better than those in the table.

| username: ffeenn

For production use, it is recommended to deploy TiKV separately from the other components. This disk is only average and not as good as those offered by cloud providers, so giving TiKV a dedicated deployment lets it make the most of the disk.

| username: Leox

Got it, understood! Thank you!

| username: Minorli-PingCAP

The sysbench load has exceeded the hardware’s capacity.