Performance Testing a TiDB Cluster with Sysbench Results in "FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest9(k, c, pad) VALUES(..." Error

translator_bot · June 21, 2024, 2:22pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Stress testing a TiDB cluster with sysbench reports the error "FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest9(k, c, pad) VALUES(…"

| username: TiDBer_X3DgmgrB

[TiDB Usage Environment] Test Environment

PD: 3 nodes, 16 CPU cores / 32 GB RAM each
TiDB: 3 nodes, 16 CPU cores / 32 GB RAM each
TiFlash: 3 nodes, 16 CPU cores / 32 GB RAM each
TiKV: 20 nodes, 16 CPU cores / 32 GB RAM each

[TiDB Version] v6.1.0

[Reproduction Path] Perform a stress test on the TiDB database using sysbench

sysbench --db-driver=mysql --time=300 --threads=100 --report-interval=1 --mysql-host= --mysql-port=4000 --mysql-user= --mysql-password= --mysql-db=test --tables=10 --table_size=1000000 oltp_read_write prepare

[Encountered Issue: Phenomenon and Impact]

Background: To evaluate cluster performance, we stress-tested the TiDB cluster with sysbench, preparing 10 tables of 100 million rows each.
Phenomenon: The prepare phase failed with the errors below and was aborted:
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'INSERT INTO sbtest9(k, c, pad) VALUES(50300166, '16357275439-41970985209-34833281730-07150732211-32256237037-94842996031-08714086735-83899234046-58786808990-64628079874', '79244697413-69968263748-31322533223-94195053462-84736177096') (...)
FATAL: 'sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:230: db_bulk_insert_next() failed
Investigation Results:
- Running select count(*) from test.sbtest<id> against each test table at that point showed roughly 4.5 million to 5 million rows inserted per table, less than half of the expected row count.
- Using show variables like '%timeout' yielded the following results:

| username: WalterWj

Have you set up resource isolation? Make sure the nodes are not running out of memory (OOM).

| username: 像风一样的男子

The connection was lost. Did TiDB restart?
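One way to check this from the TiDB side is to look for more than one startup banner in the tidb-server log during the test window. The sketch below assumes that tidb-server writes a line containing "Welcome to TiDB" every time it starts and that each log line begins with a bracketed timestamp; verify both against your own tidb.log. The function name and the sample lines are hypothetical.

```python
import re

# Assumption: tidb-server logs a line containing this text on every startup.
STARTUP_MARKER = "Welcome to TiDB"
# Assumption: log lines start with a timestamp like [2024/06/21 14:22:00.123 +08:00].
TIMESTAMP_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2})\]"
)


def find_startups(log_text: str) -> list[str]:
    """Return the timestamp of every startup banner found in tidb.log text."""
    startups = []
    for line in log_text.splitlines():
        if STARTUP_MARKER in line:
            match = TIMESTAMP_RE.match(line)
            # Fall back to the whole line if the timestamp format differs.
            startups.append(match.group(1) if match else line.strip())
    return startups


if __name__ == "__main__":
    # Synthetic sample lines, not real TiDB output.
    sample = "\n".join([
        '[2024/06/21 09:00:00.001 +08:00] [INFO] ["Welcome to TiDB."]',
        '[2024/06/21 10:15:42.500 +08:00] [INFO] ["new connection"]',
        '[2024/06/21 13:58:07.250 +08:00] [INFO] ["Welcome to TiDB."]',
    ])
    for ts in find_startups(sample):
        print("tidb-server started at", ts)
```

A startup timestamp that falls inside the sysbench run would point to a restart rather than a network-level disconnect.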

| username: TiDBer_X3DgmgrB

Apart from the operating system, the servers run only TiDB cluster components, each node on its own virtual machine. We have not seen any OOM-related errors, and the cluster is running nothing but the OLTP workload. How can I confirm whether an OOM has occurred?
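On Linux, the kernel log (for example the output of `dmesg -T` or `journalctl -k`) records each process the OOM killer terminates, so it is a reasonable first place to look. The sketch below scans that text for OOM-killer entries; the regular expression assumes the common "Out of memory: Killed process <pid> (<name>)" wording, which varies slightly between kernel versions (older kernels print "Kill process"). The function name and the script name `find_oom.py` are hypothetical.

```python
import re
import sys

# Matches the older "Kill process" and newer "Killed process" wording of the
# Linux OOM killer; cgroup OOM lines ("Memory cgroup out of memory: ...") match too.
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer entry in the log text."""
    kills = []
    for line in kernel_log.splitlines():
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    # Usage (assumed): dmesg -T > kernel.log && python3 find_oom.py kernel.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for pid, name in find_oom_kills(f.read()):
                print(f"OOM killer terminated {name} (pid {pid})")
    else:
        print("usage: python3 find_oom.py <kernel-log-file>")
```

If tidb-server or tikv-server appears in the result around the time sysbench failed, an OOM kill is a likely explanation for the lost connection; an empty result on every node makes OOM less likely.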

| username: TiDBer_X3DgmgrB

TiDB did not restart. I can connect and query normally using Navicat, but the sysbench prepare task was interrupted.

| username: 有猫万事足

Are you using a load balancer, or are all connections going to a single TiDB node?
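The difference matters because, with `--threads=100` and a single `--mysql-host`, every connection lands on one TiDB node. sysbench's MySQL driver accepts a comma-separated list in `--mysql-host` (check `sysbench --help` for your build), and the sketch below models the assumed round-robin assignment of connections to hosts. It does not call sysbench; the function name and host addresses are hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(hosts: list[str], connections: int) -> Counter:
    """Hand each new connection to the next host in turn; count connections per host."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_cycle = cycle(hosts)
    return Counter(next(host_cycle) for _ in range(connections))


if __name__ == "__main__":
    # Hypothetical addresses; the real hosts are blanked out in the post.
    tidb_nodes = ["tidb-1:4000", "tidb-2:4000", "tidb-3:4000"]
    # --threads=100 against a single node: all 100 connections land on it.
    print(assign_round_robin(tidb_nodes[:1], 100))
    # The same 100 connections spread over all three TiDB nodes.
    print(assign_round_robin(tidb_nodes, 100))
```

Spreading connections this way (or through a proxy such as a load balancer in front of the TiDB nodes) keeps one tidb-server from carrying the whole bulk-insert load.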

The connection dropped halfway through data preparation. If it wasn't caused by hitting the open-files limit ("too many open files"), the TiDB server may have crashed. Check the corresponding TiDB logs.
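To rule out the open-files limit, you can read the limit actually applied to the running tidb-server process from `/proc/<pid>/limits` on Linux; the soft limit is the one a process hits when it gets "too many open files". The sketch below parses the "Max open files" row of that file. The function name is hypothetical, and the PID is assumed to come from something like `pidof tidb-server`.

```python
import sys

LIMIT_NAME = "Max open files"


def parse_max_open_files(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(LIMIT_NAME):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(LIMIT_NAME):].split()[:2]
            return soft, hard
    raise ValueError(f"no '{LIMIT_NAME}' line found")


if __name__ == "__main__":
    # Pass the tidb-server PID on the command line; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/limits") as f:
            soft, hard = parse_max_open_files(f.read())
        print(f"{LIMIT_NAME} for {pid}: soft={soft}, hard={hard}")
    except FileNotFoundError:
        print("/proc/<pid>/limits not found (Linux only, or wrong PID)")
```

A low soft limit on tidb-server, together with "too many open files" entries in tidb.log, would support that explanation over a crash.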

| username: tidb菜鸟一只

Run tiup cluster display <cluster-name> to check whether the TiDB server has restarted because of an OOM (out-of-memory) issue.