The tidb-server node inexplicably has TiKV data

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-server节点也莫名有了tikv的数据

| username: 扬仔_tidb

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.4.0
【Reproduction Path】Operations performed that led to the issue
【Encountered Issue: Issue Phenomenon and Impact】


  1. This cluster was left behind by a former colleague. Today I received an alert for one of the tidb-servers (IP ending in 175):

    The /data data disk on that machine was raising alarms. On inspection, I found files under /data/tidb_data/tikv-20160 on this tidb-server, a directory that would normally belong to a TiKV node. I then checked the other tidb-server in this cluster and found no /data/tidb_data/tikv-20160 directory there.
  2. Running tiup cluster edit-config xxxx shows that this tidb-server carries only the following four roles, with no tikv role:

Temporary Solution: Expand the /data disk on the tidb-1 machine

Help Needed:

Could the experts in the group please advise what might be the possible reason for this?

| username: zhanggame1 | Original post link

ps -ef | grep tikv to see if there are any tikv processes.
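A slightly fuller version of that check, as a sketch: it looks at both the process table and TiKV's default service port 20160 (run it on the alerting tidb-server host).

```shell
# Sketch: check for a running tikv-server process and for a listener on
# TiKV's default port 20160. The [t] bracket trick keeps grep from
# matching its own command line; guards keep the sketch safe if ss is absent.
tikv_procs=$(ps -ef | grep -c '[t]ikv-server' || true)
echo "tikv-server processes: $tikv_procs"
listeners=$(ss -lnt 2>/dev/null | grep -c ':20160 ' || true)
echo "listeners on port 20160: $listeners"
```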

| username: 扬仔_tidb | Original post link

I didn’t find any tikv process, and port 20160 is not listening.

| username: Kongdom | Original post link

One possibility is that there are TiKV nodes from other clusters on this server.

If it’s just a disk mount, wouldn’t it be impossible to find the TiKV process?

| username: tidb菜鸟一只 | Original post link

When you execute tiup cluster list on the control machine, is there only one cluster?

| username: 像风一样的男子 | Original post link

You can run lsof on the filename to see which process has it open.
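For a directory, lsof can also be handed the path directly. A sketch using the path from the original post (lsof may need to be installed first):

```shell
# Sketch: list processes holding the suspicious path open.
# +D descends the directory tree; plain `lsof <path>` checks only that inode.
lsof +D /data/tidb_data/tikv-20160 2>/dev/null \
  || echo "nothing holds the path open (or lsof is not installed)"
```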

| username: h5n1 | Original post link

You must have done a scale-in --force back then, right? Check with pd-ctl store or information_schema.tikv_store_status.
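To see the stores as PD records them, including any left in Offline or Tombstone state by a forced scale-in, something like this sketch works (the PD endpoint here is a placeholder):

```shell
# Sketch: list all stores PD knows about; a scale-in --force can leave
# a store behind in Offline or Tombstone state.
pd_host="127.0.0.1"   # placeholder: substitute one of your PD endpoints
tiup ctl:v5.4.0 pd -u "http://${pd_host}:2379" store 2>/dev/null \
  || echo "tiup not available on this host"
```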

| username: Kongdom | Original post link

That’s right, that could be the reason. At first I didn’t read the question carefully and thought the file itself was growing until the disk ran out of space. :joy:

| username: 扬仔_tidb | Original post link

[The original reply contained only an image, which could not be translated.]

| username: 扬仔_tidb | Original post link

The colleague who set up this cluster did perform a scale-in operation on the tidb-server back then, and that machine has already been taken offline.
Information_schema.tikv_store_status only shows the currently normal 3 tikv nodes.
The dashboard also shows the normal 3 tikv nodes.

lsof output:

raft-stre 26482 26655 1002 3w REG 253,16 148301388 1311533 /data/tidb_data/tikv-20160/

ps -ef | grep 26482 returns nothing.

| username: 扬仔_tidb | Original post link

Further investigation revealed that this tidb-server host starts a tikv process roughly every 5 seconds, but that process's PD address points to another cluster. I suspect the colleague who originally set this up made a mistake in a configuration file, leaving this tidb-server's IP behind in the tikv role of another cluster.

If my suspicion is correct, where can I find this residual IP, and how can I decommission this abnormal TiKV?
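The respawning itself usually comes from a leftover systemd unit: on tiup-deployed hosts, the unit's ExecStart points at a scripts/run_tikv.sh whose --pd flag names the owning cluster's PD. A sketch to locate it, assuming tiup's default layout:

```shell
# Sketch: find the leftover systemd unit that keeps respawning TiKV.
grep -h 'ExecStart' /etc/systemd/system/tikv-*.service 2>/dev/null \
  || echo "no tikv unit under /etc/systemd/system"
# If a unit exists, its ExecStart names a scripts/run_tikv.sh; that
# script's --pd flag identifies the cluster that still owns this host.
```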

| username: MrSylar | Original post link

Delete tikv-20160.service under /etc/systemd/system.
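Cleaning that up in order, as a sketch: stop and disable first so systemd stops respawning the process before the unit file is removed. All steps need root; each is guarded so the sketch can be dry-run safely.

```shell
# Sketch: stop the respawn loop, then remove the stray unit.
# Guards make each step non-fatal when run without root or systemd.
for step in "systemctl stop tikv-20160" \
            "systemctl disable tikv-20160" \
            "rm -f /etc/systemd/system/tikv-20160.service" \
            "systemctl daemon-reload"; do
  echo "+ $step"
  $step 2>/dev/null || echo "  (failed: run as root on the affected host)"
done
```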

| username: Kongdom | Original post link

There should be a configuration file in the tidb_deploy directory, which will record the IP.
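On the control machine, each cluster's topology is also recorded in tiup's metadata under ~/.tiup/storage/cluster/clusters/<name>/meta.yaml; grepping there shows which cluster still lists the host. A sketch, with a placeholder standing in for the real address ending in .175:

```shell
# Sketch: find which cluster's topology still lists the stray host.
ip="10.0.0.175"   # placeholder: substitute the real address ending in .175
grep -rl "$ip" ~/.tiup/storage/cluster/clusters/*/meta.yaml 2>/dev/null \
  || echo "IP not found in any cluster meta on this control machine"
```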

| username: 扬仔_tidb | Original post link

I’ll try removing it. This cluster turned out to be quite messy: the machines were migrated from another cloud back in the day, so a mistake could easily have crept in. I want to find where this misconfigured tidb-server machine appears in the other cluster’s configuration; otherwise, where is the startup call coming from?

| username: MrSylar | Original post link

Are you referring to the automatic restarts? That is systemd’s restart mechanism.