Issue with removing TiCDC

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 移除ticdc问题

| username: TiDBer_Y2d2kiJh

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0 (2 TiDB, 3 PD, 3 TiKV, 1 TiCDC)
[Reproduction Path] The data directory on the TiCDC server was damaged, and it looks like the data will have to be wiped in order to restore the storage mount. The TiCDC node's status is currently Down. My question: can I still remove this TiCDC node with scale-in, or is there a better way to handle the component?
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]

| username: songxuecheng | Original post link

With only a single CDC node, your option is to scale it in and then scale it out again. You will need to assess data recovery on your side: if the downstream can tolerate duplicate data, you can restart the changefeed from a specified start time.
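
For reference, a rough sketch of what that looks like with tiup and the cdc cli on v5.4 (the cluster name, addresses, sink URI, and TSO below are placeholders, not values from this thread):

# remove the broken CDC node, then add it back
tiup cluster scale-in <cluster-name> --node <cdc-host>:8300
tiup cluster scale-out <cluster-name> scale-out-cdc.yml

# recreate the changefeed from a chosen start point; the downstream may then see duplicate writes
tiup cdc cli changefeed create --pd=http://<pd-host>:2379 \
  --sink-uri="mysql://user:password@<downstream-host>:3306/" \
  --start-ts=<tso>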

| username: TiDBer_嘎嘣脆 | Original post link

If the CDC is replicating between two clusters, you can deploy the CDC node on a machine in the other cluster. The down CDC node in cluster A is already marked as down in PD, so registering a new CDC node from a cluster B machine against cluster A's PD will not be affected.
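
If you go that route, the essential part is just that the new cdc server registers with cluster A's PD, along these lines (hosts and directories are placeholders; in practice you would normally add the node through tiup scale-out instead so the cluster metadata stays consistent):

cdc server --pd=http://<clusterA-pd>:2379 \
  --addr=0.0.0.0:8300 \
  --advertise-addr=<new-cdc-host>:8300 \
  --data-dir=/tidb-data/cdc-8300 \
  --log-file=/tidb-deploy/cdc-8300/log/cdc.log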

| username: ffeenn | Original post link

Force scale-in and then scale out again; that's the fastest way.
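
The forced variant is just the normal scale-in with --force added (cluster name and node address are placeholders):

tiup cluster scale-in <cluster-name> --node <cdc-host>:8300 --force
tiup cluster scale-out <cluster-name> scale-out-cdc.yml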

| username: TiDBer_Y2d2kiJh | Original post link

The downstream has not started synchronizing data yet.

| username: songxuecheng | Original post link

Scale in and then scale out again.

| username: zhanggame1 | Original post link

Scaling it in is indeed the simplest.

| username: redgame | Original post link

You can scale in and then scale out again.

| username: TiDBer_Y2d2kiJh | Original post link

Since I haven’t done it before, I’m not sure if it will be successful in this situation.

| username: zhanggame1 | Original post link

You can set up a test machine and try it out there first (see the sketch below). It's not complicated; a single-machine deployment is enough. If the files are gone, you can add the --force parameter to remove the node.
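
A minimal single-machine test topology for tiup cluster deploy could look roughly like this (hosts and paths are placeholders, not taken from this thread):

global:
  user: "tidb"
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 127.0.0.1
tidb_servers:
  - host: 127.0.0.1
tikv_servers:
  - host: 127.0.0.1
cdc_servers:
  - host: 127.0.0.1

Deploy it with tiup cluster deploy <name> <version> topology.yml -u root -p, start it with tiup cluster start <name>, and then practice the scale-in/scale-out steps there.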

| username: zhanggame1 | Original post link

I tested it on a single machine with version 7.1.0:
First, I scaled out and added a CDC node.
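
The scale-out-cdc.yml itself isn't shown here; based on the topology confirmation printed below, it would be roughly:

cdc_servers:
  - host: 127.0.1.1
    port: 8300
    deploy_dir: "/tidb-deploy/cdc-8300"
    data_dir: "/tidb-data/cdc-8300"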

root@tidb:~# tiup cluster scale-out tidb-test scale-out-cdc.yml -u root -p
tiup is checking updates for component cluster ...
Starting component `cluster`: /root/.tiup/components/cluster/v1.12.5/tiup-cluster scale-out tidb-test scale-out-cdc.yml -u root -p
Input SSH password:

+ Detect CPU Arch Name
  - Detecting node 127.0.1.1 Arch info ... Done

+ Detect CPU OS Name
  - Detecting node 127.0.1.1 OS info ... Done
Please confirm your topology:
Cluster type:    tidb
Cluster name:    tidb-test
Cluster version: v7.1.0
Role  Host       Ports  OS/Arch       Directories
----  ----       -----  -------       -----------
cdc   127.0.1.1  8300   linux/x86_64  /tidb-deploy/cdc-8300,/tidb-data/cdc-8300
Attention:
    1. If the topology is not what you expected, check your yaml file.
    2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: (default=N) y
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=127.0.1.1
+ [Parallel] - UserSSH: user=tidb, host=127.0.1.1
+ [Parallel] - UserSSH: user=tidb, host=127.0.1.1
+ [Parallel] - UserSSH: user=tidb, host=127.0.1.1
+ [Parallel] - UserSSH: user=tidb, host=127.0.1.1
+ Download TiDB components
  - Download cdc:v7.1.0 (linux/amd64) ... Done
+ Initialize target host environments
+ Deploy TiDB instance
  - Deploy instance cdc -> 127.0.1.1:8300 ... Done
+ Copy certificate to remote host
+ Generate scale-out config
  - Generate scale-out config cdc -> 127.0.1.1:8300 ... Done
+ Init monitor config
Enabling component cdc
        Enabling instance 127.0.1.1:8300
        Enable instance 127.0.1.1:8300 success
Enabling component node_exporter
        Enabling instance 127.0.1.1
        Enable 127.0.1.1 success
Enabling component blackbox_exporter
        Enabling instance 127.0.1.1
        Enable 127.0.1.1 success
+ [ Serial ] - Save meta
+ [ Serial ] - Start new instances
Starting component cdc
        Starting instance 127.0.1.1:8300
        Start instance 127.0.1.1:8300 success
Starting component node_exporter
        Starting instance 127.0.1.1
        Start 127.0.1.1 success
Starting component blackbox_exporter
        Starting instance 127.0.1.1
        Start 127.0.1.1 success
+ Refresh components conifgs
  - Generate config pd -> 127.0.1.1:2379 ... Done
  - Generate config tikv -> 127.0.1.1:20160 ... Done
  - Generate config tidb -> 127.0.1.1:4000 ... Done
  - Generate config cdc -> 127.0.1.1:8300 ... Done
  - Generate config prometheus -> 127.0.1.1:9090 ... Done
  - Generate config grafana -> 127.0.1.1:3000 ... Done
+ Reload prometheus and grafana
  - Reload prometheus -> 127.0.1.1:9090 ... Done
  - Reload grafana -> 127.0.1.1:3000 ... Done
+ [ Serial ] - UpdateTopology: cluster=tidb-test
Scaled cluster `tidb-test` out successfully

Then I simulated a CDC node failure by deleting the CDC directories with rm.

rm -rf /tidb-deploy/cdc-8300
rm -rf /tidb-data/cdc-8300

After restarting the cluster, I saw that the CDC was offline.


Then I successfully removed the node with tiup cluster scale-in tidb-test --node 127.0.1.1:8300. If that doesn't work, you can add the --force parameter.
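
To verify, tiup cluster display shows the cdc node as Down before the scale-in and no longer lists it once the removal completes:

tiup cluster display tidb-test
# only if the normal scale-in fails:
tiup cluster scale-in tidb-test --node 127.0.1.1:8300 --force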

| username: TiDBer_Y2d2kiJh | Original post link

Noted!

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.