How to Optimize TiDB TPS When It Cannot Be Increased

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb TPS无法提高 如何调优,

| username: 初来乍到

[TiDB Usage Environment] Production Environment
[TiDB Version]
Host Configuration: 32C64G, three machines with SSD hard drives
Cluster information as follows:

[Encountered Issue: Problem Phenomenon and Impact]
During the process of using DM to synchronize data from MySQL and RDS to TiDB, the data writing speed to TiDB is too slow.

  1. DM Information: DM status is sync

  2. TiDB Information


[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
From the monitoring information, it can be seen that the DM transaction/DDL query execution latency is too high.
From the TiDB monitoring information, the TPS has not been able to increase, and many pending synchronization data are stuck in the sync process, causing the delay to continue increasing, with no sign of catching up with the data.
[Attachments: Screenshots/Logs/Monitoring]


Regardless of whether DM is started or not, this metric has not been able to improve.

| username: Billmay表妹 | Original post link

How is the network situation?

| username: tidb狂热爱好者 | Original post link

Please consider the points I mentioned. For cross-node migration, I suggest exporting the full data using tiup dumping and importing it with tiup lighting. For subsequent incremental data, synchronize it daily.

| username: TiDBer_aKu9dgpb | Original post link

You can check the logs and system tables to see if there are any locks.

| username: TiDBer_Lee | Original post link

My understanding is that it can be seen as MySQL’s master-slave replication. If the master database has a high write traffic, the slave database will experience significant latency. You can consider first looking into DM tuning, and if that doesn’t work, you can consider creating multiple synchronization tasks, with each task synchronizing a few tables, and isolating the high-traffic large tables.

| username: tidb菜鸟一只 | Original post link

If the data volume is larger than 1T, it is not recommended to use DM’s full + incremental mode. It is recommended to use Dumpling + Lightning for the full data transfer first, and then use DM for the incremental data transfer.

| username: 初来乍到 | Original post link

I reduced one TiDB instance. Now there is one TiKV instance without a TiDB node. Will this show any effect? From my observation, there is no improvement in DM synchronization, and there is still a lot of delay.

| username: h5n1 | Original post link

  1. Are the 3 TiDB instances using load balancing?
  2. Check the dashboard for slow SQL queries.
| username: zhanggame1 | Original post link

Check the dashboard for SQL execution latency, the load on each component, and IO pressure.

| username: zhanggame1 | Original post link

Looking at the TiDB load, it’s not high. You should check other aspects, including the network layer.

| username: wangkk2024 | Original post link

It should be the network layer.

| username: dba远航 | Original post link

Network, IO, and CPU can all affect TPS. Check which one is the bottleneck.

| username: zhang_2023 | Original post link

It looks like network communication is dragging down performance.