TiDB Frequent Interruptions and Log Alerts: What Could Be the Issue?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb频繁中断,日志告警,请问什么问题 (TiDB frequently interrupted, log alerts; what is the problem?)

| username: wenyi

```
[2023/04/21 14:34:25.069 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.860438519s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.309973486s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.289187957s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.216492095s]
[2023/04/21 14:34:25.376 +08:00] [INFO] [coprocessor.go:1109] ["[TIME_COP_WAIT] resp_time:349.669223ms txnStartTS:440941620226686978 region_id:377911 store_addr:10.201.14.5:20160 kv_process_ms:0 kv_wait_ms:0 kv_read_ms:0 processed_versions:29 total_versions:86 rocksdb_delete_skipped_count:0 rocksdb_key_skipped_count:85 rocksdb_cache_hit_count:17 rocksdb_read_count:1 rocksdb_read_byte:16303"]
[2023/04/21 14:34:25.411 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=203.569255ms]
[2023/04/21 14:34:25.873 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=235.976268ms]
[2023/04/21 14:34:25.873 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941553969790984] [newTTL=345500]
[2023/04/21 14:34:26.326 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=190.976232ms]
[2023/04/21 14:34:26.326 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941553039179784] [newTTL=349500]
```

```
[2023/04/21 14:38:55.662 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457843] ["mutations count"=205371]
[2023/04/21 14:38:55.664 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457839] ["mutations count"=166580]
[2023/04/21 14:38:55.667 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=122896]
[2023/04/21 14:38:55.675 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=115353]
[2023/04/21 14:38:55.689 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457847] ["mutations count"=184994]
[2023/04/21 14:38:55.713 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457831] ["mutations count"=274089]
[2023/04/21 14:38:55.746 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457851] ["mutations count"=197124]
[2023/04/21 14:38:55.772 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=456779] ["mutations count"=223606]
[2023/04/21 14:38:55.800 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457839] ["mutations count"=168292]
[2023/04/21 14:38:55.821 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=125067]
[2023/04/21 14:39:04.426 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705161342984] [newTTL=47400]
[2023/04/21 14:39:04.586 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705187557384] [newTTL=47500]
[2023/04/21 14:39:04.923 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705266462729] [newTTL=47499]
[2023/04/21 14:39:05.257 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705292414984] [newTTL=47750]
[2023/04/21 14:39:05.425 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705318629385] [newTTL=47800]
```

| username: 裤衩儿飞上天 | Original post link

Check the machine load on the TiDB and PD nodes, PD's disk I/O, and the network between TiDB and PD.
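
Since "get timestamp too slow" sits on the TSO path between tidb-server and PD, one quick check is the raw network round trip to PD. A minimal Python sketch, run from the TiDB host (the PD address is a placeholder; 2379 is PD's default client port, and the log above only shows a TiKV store address):

```python
# Minimal sketch: measure TCP connect latency from the TiDB host to PD.
# PD_HOST/PD_PORT are placeholders -- substitute your PD client endpoint.
import socket
import time

PD_HOST = "10.201.14.5"  # placeholder PD host
PD_PORT = 2379           # default PD client port

def connect_latency_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time a plain TCP connect, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

for _ in range(10):
    try:
        print(f"{PD_HOST}:{PD_PORT} connect: {connect_latency_ms(PD_HOST, PD_PORT):.2f} ms")
    except OSError as exc:
        print(f"connect failed: {exc}")
    time.sleep(1)
```

Consistently high or spiky connect times here would point at the network rather than PD itself.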

| username: Jellybean | Original post link

Are you referring to the TiDB server instance frequently restarting, or is there an issue with the executed SQL?

| username: TiDBer_pkQ5q1l0 | Original post link

Check the resource load of the entire cluster to see whether the frequent interruptions are due to TiDB experiencing OOM (out-of-memory) issues.
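
One way to confirm an OS-level OOM kill is to scan the kernel log for OOM-killer events. A minimal Python sketch, assuming a Linux host where you can run dmesg:

```python
# Minimal sketch: look for OOM-killer events in the kernel log.
# Assumes Linux; dmesg may require root. "-T" prints readable timestamps.
import subprocess

out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True)
for line in out.stdout.splitlines():
    if "out of memory" in line.lower() or "oom" in line.lower():
        print(line)
```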

| username: wenyi | Original post link

There is an out-of-memory error at the system level.

| username: TiDBer_pkQ5q1l0 | Original post link

Then you need to check whether slow SQL is causing the OOM.

| username: TiDBer_pkQ5q1l0 | Original post link

Frequent OOM (Out of Memory) issues are often caused by certain large SQL queries.
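
To find candidates, you could sort the slow query log by peak memory. A sketch assuming TiDB's INFORMATION_SCHEMA.SLOW_QUERY table and its Mem_max column (bytes), as documented for recent TiDB versions; connection parameters are placeholders, and it uses the third-party pymysql driver:

```python
# Sketch: list the statements with the highest peak memory from TiDB's
# slow query log. Host/port/user below are placeholders.
import pymysql  # third-party driver: pip install pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
try:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT Time, Query_time, Mem_max, LEFT(Query, 120)
            FROM information_schema.slow_query
            ORDER BY Mem_max DESC
            LIMIT 10
            """
        )
        for ts, query_time, mem_max, query_head in cur.fetchall():
            print(f"{ts}  {float(query_time):8.3f}s  {int(mem_max):>12} B  {query_head}")
finally:
    conn.close()
```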

| username: wenyi | Original post link

I am loading a CSV to migrate data into TiDB.

| username: TiDBer_pkQ5q1l0 | Original post link

Does the interruption still happen when you are not loading data?

| username: TiDBer_pkQ5q1l0 | Original post link

You can observe the memory growth during the load process.
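
For example, a small Python sketch that polls the resident memory of the tidb-server process using the third-party psutil package (the exact process name is an assumption about the deployment):

```python
# Sketch: poll tidb-server's resident memory during the load.
# Requires psutil (pip install psutil); stop with Ctrl-C.
import time
import psutil

proc = next(
    (p for p in psutil.process_iter(["name"]) if p.info["name"] == "tidb-server"),
    None,
)
if proc is None:
    raise SystemExit("tidb-server process not found on this host")

try:
    while True:
        rss_mib = proc.memory_info().rss / (1024 * 1024)
        print(f"{time.strftime('%H:%M:%S')}  tidb-server RSS: {rss_mib:.0f} MiB")
        time.sleep(5)
except psutil.NoSuchProcess:
    print("tidb-server exited -- possibly OOM-killed; check dmesg")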

| username: wenyi | Original post link

If you don’t load data, there definitely won’t be any interruptions.

| username: TiDBer_pkQ5q1l0 | Original post link

Could the memory on the tidb-server be too small? How about splitting the CSV into multiple files for import?
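
A minimal Python sketch of that idea, splitting one large CSV into fixed-size chunks so each import runs as a smaller transaction (the input file name and chunk size are illustrative; assumes the CSV has a header row):

```python
# Sketch: split a large CSV into smaller files for batched imports.
import csv

def split_csv(path: str, rows_per_chunk: int = 100_000) -> None:
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes a header row
        part, chunk = 0, []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= rows_per_chunk:
                write_chunk(path, part, header, chunk)
                part, chunk = part + 1, []
        if chunk:
            write_chunk(path, part, header, chunk)

def write_chunk(path: str, part: int, header, rows) -> None:
    out_path = f"{path}.part{part:04d}.csv"
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    print(f"wrote {out_path} ({len(rows)} rows)")

split_csv("big_table.csv")  # hypothetical input file
```

Smaller files also keep each transaction's mutation count down, which lines up with the "2PC detect large amount of mutations on a single region" messages in the log above.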

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.