After Expansion, TiKV Remains in Offline State

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 扩容后tikv一直处于offline状态

| username: 月明星稀

【TiDB Usage Environment】Production Environment
【TiDB Version】
【Reproduction Path】Executed the scale-out using tiup
【Encountered Problem: Problem Phenomenon and Impact】
TiKV remains in the Offline state; the logs are as follows:
[2023/11/09 00:46:14.608 +08:00] [FATAL] [server.rs:1099] ["failed to start node: Other(\"[components/pd_client/src/util.rs:878]: duplicated store address: id:1005 address:\"1.1.1.210:20160\" version:\"6.5.0\" peer_address:\"1.1.1.210:20160\" status_address:\"1.1.1.210:20180\" git_hash:\"47b81680f75adc4b7200480cea5dbe46ae07c4b5\" start_timestamp:1699461974 deploy_path:\"/usr/local/tikvtest/tikv-20160/bin\" , already registered by id:7 address:\"1.1.1.210:20160\" state:Offline version:\"6.5.0\" peer_address:\"1.1.1.210:20160\" status_address:\"1.1.1.210:20180\" git_hash:\"47b81680f75adc4b7200480cea5dbe46ae07c4b5\" start_timestamp:1695720782 deploy_path:\"/usr/local/tikvtest/tikv-20160/bin\" last_heartbeat:1699438910562858261 node_state:Removing \")"]

Before scaling out, I confirmed that the deployment path and data directory did not exist, so there should be no conflict. Could someone help me figure out the cause?

| username: 像风一样的男子 | Original post link

Use pd-ctl to list all the stores and check whether 1.1.1.210:20160 appears twice. It is very likely that the old TiKV node has not been fully decommissioned.
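
A minimal way to run that check (a sketch; it assumes pd-ctl is invoked through tiup, and `<pd-ip>` is a placeholder for one of your PD endpoints):

```shell
# List all stores and look for the address that is registered more than once
tiup ctl:v6.5.0 pd -u http://<pd-ip>:2379 store | grep -B 2 -A 3 '"address": "1.1.1.210:20160"'
```

If two store entries share that address, the older one (still in Offline/Removing state) is the leftover from the previous scale-in.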

| username: Kongdom | Original post link

The address is duplicated: 1.1.1.210:20160 is already registered as store id 7.

duplicated store address

| username: Jolyne | Original post link

You can use pd-ctl to check the status of the store: pd-ctl -u http://pd-ip:2379 store.

| username: Kongdom | Original post link

I suspect you may have scaled out the same node more than once?

duplicated store address: (the duplicate address below)
id:1005 address:"1.1.1.210:20160" version:"6.5.0" deploy_path:"/usr/local/tikvtest/tikv-20160/bin",
already registered by (the address already registered below)
id:7 address:"1.1.1.210:20160" state:Offline version:"6.5.0" deploy_path:"/usr/local/tikvtest/tikv-20160/bin"

| username: 月明星稀 | Original post link

I checked with /pd/api/v1/stores, and there was indeed an existing store with that address. Could it be because I used --force when scaling in?
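
For reference, that API check can look like this (a sketch; `<pd-ip>` is a placeholder for a PD endpoint):

```shell
# Query PD's stores API and show the entries registered for this address
curl -s http://<pd-ip>:2379/pd/api/v1/stores | grep -B 3 -A 3 '1.1.1.210:20160'
```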

| username: 像风一样的男子 | Original post link

Yes, you can refer to the way I handled it before:

| username: 像风一样的男子 | Original post link

Or you can change the TiKV port and start it on a different one.
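
For example, a scale-out that avoids the ports of the old, still-registered store might look like this (a sketch only; the cluster name, ports, and directories below are hypothetical):

```shell
# Write a scale-out topology that uses different ports and dirs from the
# old store that is still registered in PD, then scale out with it.
cat > scale-out.yaml <<'EOF'
tikv_servers:
  - host: 1.1.1.210
    port: 20161            # instead of 20160
    status_port: 20181     # instead of 20180
    deploy_dir: /usr/local/tikvtest/tikv-20161
    data_dir: /usr/local/tikvtest/tikv-20161/data
EOF
tiup cluster scale-out <cluster-name> scale-out.yaml
```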

| username: Fly-bird | Original post link

Multiple expansions on the same IP?

| username: oceanzhang | Original post link

Did you configure it incorrectly?

| username: heiwandou | Original post link

Check the configuration file.

| username: andone | Original post link

It feels like the configuration file is written incorrectly.

| username: hey-hoho | Original post link

The previous TiKV store on port 20160 hasn't finished scaling in yet, and you scaled it out again on the same address, which caused the duplicate store registration.
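
If that is the case, one common cleanup approach is to let the old store finish going offline and remove it once it becomes Tombstone (a sketch, not necessarily how the posters above resolved it; store id 7 is taken from the error log, `<pd-ip>` and `<cluster-name>` are placeholders):

```shell
# Check the current state of the leftover store (id 7 in the error message)
tiup ctl:v6.5.0 pd -u http://<pd-ip>:2379 store 7

# Ask PD to continue taking the store offline (it was already state:Offline / Removing)
tiup ctl:v6.5.0 pd -u http://<pd-ip>:2379 store delete 7

# After it reaches Tombstone, clear the tombstone record and prune the topology
tiup ctl:v6.5.0 pd -u http://<pd-ip>:2379 store remove-tombstone
tiup cluster prune <cluster-name>
```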

| username: Kongdom | Original post link

Yes, I feel like this is the issue :thinking:

| username: oceanzhang | Original post link

The address is duplicated.

| username: swino | Original post link

Check the configuration file.