TiDB server memory and CPU usage are increasing, but overall write QPS is decreasing

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb server 内存和cpu 占用越来越高,但整体写 qps 越来越低

| username: 数据库菜鸡

[TiDB Usage Environment] Testing
[TiDB Version] v7.1.0
[Reproduction Path] Initially, data insertion is fast, and TiDB server memory and CPU usage are low. However, both CPU and memory usage gradually increase, while the write QPS decreases.
[Encountered Problem: Phenomenon and Impact]
Initially, data insertion is fast, and TiDB server memory and CPU usage are low. However, both CPU and memory usage gradually increase, while the write QPS decreases.
[Resource Configuration]
6 cores, 16GB
[Attachments: Screenshots/Logs/Monitoring]


| username: 数据库菜鸡

Please look at the data after 3 PM.

| username: 像风一样的男子

Are you performing a stress test with continuous data insertion?

| username: tidb狂热爱好者

Conduct another stress test.

| username: 数据库菜鸡

Yes. At 16:32 I reloaded the TiDB servers one by one (the client connects to them through HAProxy), and the QPS then increased significantly.

| username: 数据库菜鸡

At 16:32 I only reloaded the TiDB servers; nothing else was changed.

| username: 像风一样的男子

If it's a stress test, the curve looks perfectly normal.

| username: 数据库菜鸡

I am running a stress test and suspect that tidb-server might have a resource leak.

| username: 数据库菜鸡

Is this normal? The memory and CPU usage of the TiDB server keep increasing, but the QPS keeps decreasing.

| username: 数据库菜鸡

There is already a lot of data in it.

| username: 有猫万事足

If data is being imported continuously, it's a bit strange that disk writes end up below 9 MiB/s.

| username: 数据库菜鸡

Look from 15:30 onward; ignore the earlier part.

| username: 数据库菜鸡

Each record is quite small, around 100 to 200 bytes.
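As a rough sanity check on that record size, the logical write volume can be estimated as QPS times bytes per record. The sketch below is only an illustration: the function names are made up, the 2000 QPS input is a placeholder rather than a measured TiDB figure, and the estimate assumes one record per write while ignoring replication, Raft logs, indexes, and compaction, all of which make real disk traffic higher.

```python
def logical_write_rate_bytes(qps: float, record_bytes: float) -> float:
    """Estimate logical bytes written per second, assuming one record per write."""
    return qps * record_bytes


def to_mib(num_bytes: float) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    placeholder_qps = 2000  # hypothetical input, not a measurement from this cluster
    for record_bytes in (100, 200):  # record sizes quoted in the post above
        rate = logical_write_rate_bytes(placeholder_qps, record_bytes)
        print(f"{record_bytes} B/record at {placeholder_qps} QPS: {to_mib(rate):.2f} MiB/s")
```

With records this small, even a few thousand writes per second amount to well under 1 MiB/s of logical data, so a low disk write rate is not surprising on its own.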

| username: zhanggame1

I don’t think there’s any problem. Resource consumption should increase as the data volume grows.

| username: 有猫万事足

So CPU and memory usage keep increasing, but disk I/O never reaches 9 MiB/s. That is definitely abnormal. Have you done anything to address write hotspots on the table? If CPU and memory usage were increasing and disk I/O were rising in step, stabilizing at 20 MiB/s or more, I would consider that quite normal. Right now, though, the disk write speed clearly isn't keeping up. Hover over the graph to see the exact write speed; I suspect it could be as low as 1-2 MiB/s. Even the worst disk shouldn't be that slow.
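The comparison described here, whether disk I/O rises in step with CPU, can be sketched as a simple trend check. Everything in this snippet is hypothetical: the sample values, the helper names, and the 10% threshold are invented for illustration, and real numbers would have to be read off the monitoring panels.

```python
from statistics import mean


def relative_change(samples):
    """Relative change between the mean of the first half and the second half."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    half = len(samples) // 2
    first, second = mean(samples[:half]), mean(samples[half:])
    if first == 0:
        raise ValueError("first-half mean is zero; relative change is undefined")
    return (second - first) / first


def io_lags_behind(cpu, disk_mib_s, threshold=0.10):
    """True when CPU grows by more than `threshold` while disk throughput does not."""
    return relative_change(cpu) > threshold and relative_change(disk_mib_s) <= threshold


# Hypothetical samples taken at equal intervals during the stress test.
cpu_percent = [20, 25, 35, 50, 65, 80]
disk_mib_s = [1.0, 1.1, 0.9, 1.0, 1.0, 1.1]

print(io_lags_behind(cpu_percent, disk_mib_s))
```

A result of `True` on real data would match the pattern called abnormal in this reply: rising CPU with nearly flat disk writes.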

| username: 数据库菜鸡

Restarting the TiDB node restores performance, but then it degrades again. We can't keep restarting the TiDB server every so often.

| username: 数据库菜鸡

Three of them are at about 1 MiB/s, and the other three are at around 700 to 800 KiB/s.

| username: 有猫万事足

Indeed, it’s too low. Look at your CPU, memory, and disk I/O together. The disk I/O changes the least. It’s definitely not normal.

| username: 数据库菜鸡

A Postgres write test also only reaches 2000 QPS. The drive is a Samsung 980, so it shouldn't be this bad.

| username: Hacker007

It starts fast, possibly because it’s writing to the cache. Once the cache is full, it slows down. Writing to the disk can’t keep up with the speed of writing to the cache.