Tiup prune has no effect

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiup prune无效果

| username: wakaka

【TiDB Usage Environment】Production environment
【TiDB Version】5.0.6
【Encountered Problem】After scaling in a TiFlash node, it shows as tombstone, and tiup prune has no effect
【Reproduction Path】
tiup cluster scale-in NAME111 --node X.X.X.X:9000
tiup cluster prune NAME111
【Problem Phenomenon and Impact】
After pruning, the node status still shows Tombstone. Checking the stores in PD shows the node is no longer there, but attempting to scale out onto this node still reports that the port is occupied (the node still appears in the topology shown by tiup edit-config and cannot be edited out).
【Attachments】Related logs and monitoring (https://metricstool.pingcap.com/)


If the question is related to performance optimization or fault troubleshooting, please download and run the diagnostic script, then select all of the terminal output and copy-paste it here.

| username: wakaka | Original post link

The scale-out reports: Error: port conflict for ‘8234’ between ‘tiflash_servers:X.X.X.X.metrics_port’ and ‘tiflash_servers:X.X.X.X.metrics_port’
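
The conflict is presumably between the stale tiflash_servers entry still kept in the TiUP meta and the same host/ports declared in the new scale-out topology, since both carry metrics_port 8234. A minimal way to confirm the stale entry (a sketch, reusing the cluster name from the post):

tiup cluster display NAME111       # the removed TiFlash node should still be listed (Tombstone / N/A)
tiup cluster edit-config NAME111   # look for the old X.X.X.X entry under tiflash_servers, including metrics_port 8234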

| username: wakaka | Original post link

After manually going into pd-ctl and executing store remove-tombstone, tiup prune now reports “Error: no store matching address ‘X.X.X.X:3930’ found”.
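
For reference, that tombstone cleanup goes through pd-ctl; a sketch of the commands, assuming a tiup version that supports the ctl:&lt;version&gt; form and with the PD address as a placeholder:

tiup ctl:v5.0.6 pd -u http://&lt;pd-ip&gt;:2379 store                    # list stores and their states
tiup ctl:v5.0.6 pd -u http://&lt;pd-ip&gt;:2379 store remove-tombstone   # clear tombstone stores from PD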

| username: xfworld | Original post link

I don’t know your operating environment and background, but generally, you need to resolve the replica issue first, and then consider scaling down.

You can refer to the official documentation for details:

If you need to force a scale-down, you can also refer to the above content for operation.
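
To see whether any tables still carry TiFlash replicas before scaling in, one option is the information_schema.tiflash_replica table (a sketch; run it against the cluster in question):

 -- list tables that still have TiFlash replicas and their sync progress
 SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
 FROM information_schema.tiflash_replica;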

| username: hey-hoho | Original post link

Have you removed the TiFlash replicas first?
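
If any tables still have TiFlash replicas, they would normally be dropped before the scale-in; a sketch with placeholder database and table names:

 -- remove the TiFlash replica for a table before scaling in TiFlash
 ALTER TABLE &lt;db_name&gt;.&lt;table_name&gt; SET TIFLASH REPLICA 0;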

| username: wakaka | Original post link

There are 14 TiFlash nodes in total, and the node in question has been down for a long time. I checked that all TiFlash replica counts are 2, which is less than the number of nodes.
The steps were as follows:

  1. Scale in the node.
  2. At this point it becomes Tombstone, then run prune.
  3. Scale out again; it gets stuck and fails.
  4. Query the node: still Tombstone. Try prune again: no effect.
  5. Try scale-in again; prune still has no effect, status still Tombstone.
  6. Go into pd-ctl and run store remove-tombstone; now the status shows N/A.
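
Roughly the same sequence as commands (a sketch; the scale-out topology file name and PD address are placeholders):

tiup cluster scale-in NAME111 --node X.X.X.X:9000     # step 1
tiup cluster prune NAME111                            # step 2: node stays Tombstone
tiup cluster scale-out NAME111 scale-out.yaml         # step 3: stuck, port conflict
tiup cluster display NAME111                          # step 4: still Tombstone
tiup cluster prune NAME111                            # steps 4-5: still no effect
tiup ctl:v5.0.6 pd -u http://&lt;pd-ip&gt;:2379 store remove-tombstone   # step 6: status becomes N/A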
| username: hey-hoho | Original post link

Try manually scaling down using this document:
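
The manual procedure from the scaling documentation is roughly the following (a sketch; the store ID and PD address are placeholders, and any remaining TiFlash replicas should be handled first):

tiup ctl:v5.0.6 pd -u http://&lt;pd-ip&gt;:2379 store                     # find the store_id of the TiFlash node
tiup ctl:v5.0.6 pd -u http://&lt;pd-ip&gt;:2379 store delete &lt;store_id&gt;   # ask PD to take the store offline
tiup cluster scale-in NAME111 --node X.X.X.X:9000 --force           # force-remove the node from the TiUP topology
tiup cluster display NAME111                                        # confirm the node is gone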

| username: xfworld | Original post link

It’s not just about having fewer replicas than the number of nodes; you need to look at which nodes the replica data is on. If the relationship is not resolved, there will still be issues.

| username: wakaka | Original post link

The node crashed. My understanding is that the replicas will migrate on their own, right? So what's the best way to handle it now?

| username: wakaka | Original post link

The store has already been deleted in PD and can no longer be found there, but tiup cluster display still shows the node with N/A status.

| username: wakaka | Original post link

Manually editing the config (tiup cluster edit-config) to remove that TiFlash node doesn't work; the change cannot be saved.

| username: xfworld | Original post link

Follow these steps:

  1. First, sort out the replication rules (see the sketch after this list).
  2. Clean up as needed, then proceed with the manual deletion steps.
  3. If step 2 doesn't work, I suggest turning off all synchronization rules, scaling in all TiFlash nodes, and then scaling them out again.
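
For step 1, the TiFlash replication rules kept in PD can be listed and, if stale rules are left over, deleted through the PD API; a sketch with placeholder PD address and rule ID, along the lines of the troubleshooting section of the scaling docs:

curl http://&lt;pd-ip&gt;:2379/pd/api/v1/config/rules/group/tiflash                # list the TiFlash placement rules
curl -X DELETE http://&lt;pd-ip&gt;:2379/pd/api/v1/config/rule/tiflash/&lt;rule_id&gt;   # delete a stale rule, e.g. table-45-r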
| username: wakaka | Original post link

You mean taking all the TiFlash nodes offline, right? I have 14 TiFlash nodes here and only one of them is down. If all of them go offline and then come back online, that could break the business.

| username: hey-hoho | Original post link

You can check whether the TiFlash node still has any replicas like this. If not, you can use --force to scale it in forcibly.

 -- run in the information_schema database: list Region peers still located on the TiFlash store
 SELECT p.region_id, p.peer_id, p.store_id, m.ADDRESS
 FROM TIKV_REGION_STATUS s
 JOIN TIKV_REGION_PEERS p ON s.region_id = p.region_id
 JOIN TIKV_STORE_STATUS m ON m.store_id = p.store_id
 WHERE m.LABEL LIKE '%tiflash%' AND m.ADDRESS LIKE '%tiflash_ip%';
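
If the query comes back empty, the forced scale-in referred to above would be (reusing the cluster name and node address from the thread):

tiup cluster scale-in NAME111 --node X.X.X.X:9000 --force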
| username: wakaka | Original post link

TIKV_STORE_STATUS no longer contains the TiFlash store, and the query result is empty. After forcibly scaling in with --force, it is now resolved. Thank you!

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.