TiDB Frequently Experiences Full Memory and CPU Usage

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB经常出现内存和CUP跑满的情况

| username: TiDBer_RSav8Cg3

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact]
While looking at slow queries, there is a database with about 4000 tables. I am not sure if this will affect server performance.
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]



| username: TiDBer_QYr0vohO | Original post link

It might be a disk issue. You can check the disk monitoring.

| username: 像风一样的男子 | Original post link

Sort the slow queries by total execution time and take a look.

| username: TiDBer_RSav8Cg3 | Original post link

By total execution time

| username: TiDBer_RSav8Cg3 | Original post link

There is no problem with the disk. It’s just that the CPU and memory get fully utilized all of a sudden.

| username: 像风一样的男子 | Original post link

It’s been over ten minutes for a single SQL query, and all the SQL queries are piling up there. Check which one is stuck.

| username: TiDBer_RSav8Cg3 | Original post link

How do you analyze the execution plan? I also don’t know how to optimize these slow SQL queries.

| username: 像风一样的男子 | Original post link

Is there a DBA? If not, find whoever wrote the code.

| username: TiDBer_RSav8Cg3 | Original post link

I am a MySQL DBA, but the execution plan is different from TiDB’s.

| username: tidb菜鸟一只 | Original post link

How many TiDB instances do you have, and are they all experiencing this issue?

| username: TiDBer_RSav8Cg3 | Original post link

This situation has only occurred in this cluster; it hasn’t happened in other clusters. I noticed that the execution times of the slow SQL queries are quite similar. I’m not sure if it’s a resource consumption issue or if there’s a lock wait occurring.

| username: tidb菜鸟一只 | Original post link

How does your business side connect to multiple TiDB nodes? Through HAProxy? It feels like there are too many loads on this node, overwhelming it.

| username: TiDBer_RSav8Cg3 | Original post link

No, it’s just 1 TiDB, 3 PDs, 3 TiKVs, and Grafana. All nodes are deployed in a mixed manner.

| username: tidb菜鸟一只 | Original post link

Can you show your topology? Or are all the nodes placed on a single host?

| username: TiDBer_RSav8Cg3 | Original post link

They are all 16C32G.

| username: h5n1 | Original post link

Post the monitoring overview → TiDB, TiKV detail → thread CPU

| username: TiDBer_RSav8Cg3 | Original post link

Is it possible that tidb-server is occupying the memory? It hasn’t even executed any SQL yet, and it’s already using 85% of the memory.

| username: redgame | Original post link

For the situation where there are a large number of tables in the database, theoretically it will not directly affect server performance. Optimize that SQL query.

| username: tidb菜鸟一只 | Original post link

You can deploy TiDB and PD on one machine, but deploying them on the same machine as TiKV will cause resource contention. Have you done NUMA resource isolation? If not, run SHOW config WHERE NAME LIKE '%storage.block-cache.capacity%'; to check the memory configuration parameters of TiKV. It is possible that TiKV is using too much memory, squeezing out TiDB resources. Additionally, it is recommended to have at least two TiDB instances. If one encounters an issue, the other can be used. It is also best to configure HAProxy for load balancing.

| username: TiDBer_RSav8Cg3 | Original post link