Brief Downtime

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 短暂宕机 (Brief Downtime)

| username: jboracle1981

【TiDB Usage Environment】Test Environment / Test / PoC
【TiDB Version】v7.1.1
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Symptoms and Impact】
Brief downtime; clients could not connect. TiFlash logged numerous errors.
【Resource Configuration】Navigate to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】

| username: jboracle1981 | Original post link

Could you please help analyze the cause?

| username: 托马斯滑板鞋 | Original post link

Two TiKV nodes, three TiDB servers, and one TiFlash node? What is the machine configuration?

| username: jboracle1981 | Original post link

3 virtual machines, each with 8 cores and 32 GB of RAM.

| username: 像风一样的男子 | Original post link

How many replicas do you have with only two TiKV nodes? This setup has no high availability: if one TiKV node becomes unstable, the entire cluster becomes unavailable.
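To make the replica point concrete: Raft needs a majority of a Region's replicas to stay available. Below is a minimal sketch of that arithmetic. It assumes PD places at most one replica of a Region per TiKV store and that max-replicas is left at its default of 3. The function names are hypothetical helpers, not a TiDB API.

```python
def raft_quorum(replicas: int) -> int:
    """Votes needed for a Raft majority among `replicas` peers."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many peers can be lost while a majority is still reachable."""
    return replicas - raft_quorum(replicas)


def max_placeable_replicas(replicas: int, tikv_stores: int) -> int:
    # Assumption: PD never puts two replicas of the same Region on one store,
    # so the effective replica count is capped by the number of TiKV stores.
    return min(replicas, tikv_stores)


if __name__ == "__main__":
    for stores in (1, 2, 3):
        n = max_placeable_replicas(3, stores)  # 3 = assumed default max-replicas
        print(f"{stores} TiKV store(s): {n} replica(s) per Region, "
              f"quorum {raft_quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

With two stores, each Region ends up with at most two replicas, and both are needed for a majority, so losing either TiKV node stalls those Regions. Only at three stores does the cluster survive one node failure.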

| username: jboracle1981 | Original post link

There is only one TiFlash node.

| username: 托马斯滑板鞋 | Original post link

I have only tried deployments with one TiKV node or three TiKV nodes; I haven't tried two TiKV nodes, so I don't know whether that is stable. The problem seems to be on host 114. You could try removing the TiDB server from 114 and then checking the host's system logs.

| username: jboracle1981 | Original post link

The problem is on 114, where TiFlash is installed. Should we stop the TiDB server on 114, or add another TiFlash node?

| username: 托马斯滑板鞋 | Original post link

The screenshot clearly shows that TiFlash is CPU-bound. What I mean is: remove the TiDB server from 114 and leave only TiFlash running on that host.
When you test again, run dstat 1 to watch the host's load.
Also, I didn't see any crash or restart records in the TiFlash logs. Could you send them if it's convenient?
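dstat 1 prints host resource usage once per second. If dstat isn't available on the VM, a rough stand-in for its CPU column can be sampled from /proc/stat. This is a Linux-only sketch, not dstat itself. It counts iowait as idle, so "busy" here means user + nice + system + irq + softirq + steal.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return (idle, total) jiffies from the aggregate "cpu" line of /proc/stat (Linux only)."""
    with open(path) as f:
        fields = f.readline().split()  # "cpu  user nice system idle iowait irq softirq steal ..."
    # Keep user..steal only; guest/guest_nice are already counted in user/nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    return idle, sum(values)


def busy_percent(prev, curr):
    """CPU busy percentage between two (idle, total) samples."""
    idle_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def watch_cpu(samples=5, interval=1.0):
    """Print host-wide CPU busy % once per interval, similar in spirit to `dstat 1`."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval)
        curr = read_cpu_times()
        print(f"cpu busy: {busy_percent(prev, curr):5.1f}%")
        prev = curr


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        watch_cpu()
    else:
        print("/proc/stat not found; this sketch only works on Linux")
```

A sustained reading near 100% on host 114 while TiFlash and a TiDB server share it would be consistent with the CPU bottleneck in the screenshot.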

| username: 托马斯滑板鞋 | Original post link

:joy: From your screenshot, it looks like TiFlash is constantly restarting.

| username: jboracle1981 | Original post link

:joy: That's embarrassing.

| username: jboracle1981 | Original post link

As far as I know, those were restarts I did manually :joy:

| username: jboracle1981 | Original post link

The log is over 900 MB, which is a bit large.

| username: 托马斯滑板鞋 | Original post link

Trim it down and check whether it contains keywords such as start, out of, shutdown, or abort.
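One way to do the trimming without loading a 900 MB file into memory is to stream it line by line. Below is a minimal sketch using the keywords from this reply; the script name in the usage line is hypothetical. A plain case-insensitive substring match on "start" also hits "restart" and "started", which is usually what you want here.

```python
import re
import sys

# Keywords suggested in the thread; "out of" is meant to catch messages like "out of memory".
KEYWORDS = ("start", "out of", "shutdown", "abort")
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def grep_keywords(lines, pattern=PATTERN):
    """Yield (line_number, line) for every line containing one of the keywords."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def main(argv):
    if len(argv) < 2:
        print("usage: grep_tiflash.py <path-to-tiflash-log>")
        return
    # Stream the file so the whole log never has to fit in memory.
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        for number, line in grep_keywords(log):
            print(f"{number}: {line}")


if __name__ == "__main__":
    main(sys.argv)
```

Running it as python grep_tiflash.py /path/to/tiflash.log > start.txt produces a much smaller file that is easy to attach.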

| username: jboracle1981 | Original post link

Okay :grinning:

| username: jboracle1981 | Original post link

[start](upload://x3TmT4BBWQaONulkxmcQgcJK0Ji.rar) (13.7 MB)

| username: jboracle1981 | Original post link

Thanks a lot :handshake:

| username: 托马斯滑板鞋 | Original post link

I don't see any problems, apart from log truncation and TiFlash startup records :upside_down_face:
Is your client connecting to only one TiDB server? If so, could you send that TiDB server's log and the PD log covering the period when the problem occurred?
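To pull out just the problematic period from the TiDB and PD logs, a sketch like the one below may help. It assumes each entry starts with a timestamp such as [2023/08/29 14:03:21.123 +08:00] (the unified log format TiDB and PD use; adjust the regex if yours differs). It ignores the timezone offset and keeps untimestamped continuation lines, such as stack traces, with the entry before them. The script name is hypothetical.

```python
import re
import sys
from datetime import datetime

# Assumption: entries begin with "[YYYY/MM/DD HH:MM:SS.mmm +TZ]"; only the seconds part is parsed.
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
TS_FMT = "%Y/%m/%d %H:%M:%S"


def parse_ts(line):
    """Return the entry's timestamp as a naive datetime, or None for continuation lines."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), TS_FMT) if m else None


def slice_window(lines, start, end):
    """Yield lines whose entry timestamp falls within [start, end]."""
    inside = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            inside = start <= ts <= end
        if inside:  # continuation lines inherit the previous entry's decision
            yield line


def main(argv):
    if len(argv) < 4:
        print('usage: slice_log.py <log> "YYYY/MM/DD HH:MM:SS" "YYYY/MM/DD HH:MM:SS"')
        return
    start = datetime.strptime(argv[2], TS_FMT)
    end = datetime.strptime(argv[3], TS_FMT)
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        sys.stdout.writelines(slice_window(log, start, end))


if __name__ == "__main__":
    main(sys.argv)
```

Running it once on the TiDB server log and once on the PD log with the same window keeps both excerpts aligned in time.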

| username: jboracle1981 | Original post link

I'll check the application's configuration file.

| username: tidb菜鸟一只 | Original post link

Could you share the resource topology diagram?