Error Adding Node: Failed to execute operation: No such file or directory

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 扩容节点报错,提示Failed to execute operation: No such file or directory。

| username: TiDBer_F5dSCuOb

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version] v7.1.1
[Reproduction Path] Scale-out node, tiup cluster scale-out tidb-test scale-out.yml -p
[Encountered Issue: Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Log/Monitoring]

Error message as follows:
Error: failed to enable: 10.0.4.2 node_exporter-9100.service, please check the instance's log() for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.0.4.2:22' {ssh_stderr: Failed to execute operation: No such file or directory, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl enable node_exporter-9100.service"}, cause: Process exited with status 1

The directories referenced in the command's PATH all exist on the system (/bin, /sbin, /usr/bin, /usr/sbin), and /usr/bin/sudo is present.

Password-free login test screenshot:

Directory screenshot on 10.0.4.2:

Error screenshot:

| username: 像风一样的男子 | Original post link

Is the new server password the same as the old one? Can you manually SSH into tidb@10.0.4.2?

| username: 大飞哥online | Original post link

The error indicates that the SSH connection cannot be established. Check the password and access settings.

| username: TiDBer_F5dSCuOb | Original post link

It can be logged in. Passwordless login has also been set up.

| username: TiDBer_F5dSCuOb | Original post link

SSH connection is possible, and passwordless login is also set up. But I don’t know why this error is occurring.

| username: 大飞哥online | Original post link

How about not using the password-free method and using the interactive password method to get it started first?

| username: TiDBer_F5dSCuOb | Original post link

The current error indicates that the file or directory does not exist! The corresponding file permissions have also been assigned, but this error is still reported. It doesn’t seem to be a password issue!
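One way to rule out authentication entirely is to run the exact command tiup executes, manually over SSH (host and service name are taken from the error message above):

```shell
# Run the same command tiup runs during scale-out, directly over SSH.
# If this also prints "Failed to execute operation: No such file or directory",
# the problem is on the target host (most likely a missing systemd unit file),
# not in SSH authentication.
ssh tidb@10.0.4.2 \
  'sudo systemctl daemon-reload && sudo systemctl enable node_exporter-9100.service'
```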

| username: TiDBer_F5dSCuOb | Original post link

Has no one else encountered a similar issue?

| username: Jasper | Original post link

Can you check the current cluster status with tiup cluster display to see if it is normal?
Is the newly expanded machine a new server or one that was previously scaled down?

| username: TiDBer_F5dSCuOb | Original post link

The cluster status is normal. The machine had its system reinstalled after a previous scale-in, and then it was scaled out again. That is the situation.

Cluster status screenshot:

| username: TiDBer_F5dSCuOb | Original post link

I simulated a crash by taking down two nodes; after reinstalling the system, I tried to scale them out again, which led to this problem.

| username: 像风一样的男子 | Original post link

Check in pd-ctl to see if there are two stores with the same IP and port.
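A sketch of that check via pd-ctl (the PD address is an assumption; the cluster version matches the v7.1.1 mentioned above):

```shell
# List all stores and inspect their addresses and states.
# A store from the crashed node left in "Offline" or "Tombstone" state
# can conflict with a newly scaled-out instance on the same IP.
tiup ctl:v7.1.1 pd -u http://127.0.0.1:2379 store \
  | grep -E '"address"|"state_name"'
```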

| username: TiDBer_F5dSCuOb | Original post link

It’s the same IP but different ports. I changed the port.

| username: TiDBer_F5dSCuOb | Original post link

This error is consistently reproducible. I used the 10.0.4.16 server in the cluster, reinstalled the system, and re-expanded, but the same error occurred.

| username: TiDBer_F5dSCuOb | Original post link

Screenshot of the scale-out file content:

| username: 像风一样的男子 | Original post link

What is the current status of the two stores on the same server?

| username: ti-tiger | Original post link

• The node_exporter-9100.service is not installed or started on the 10.0.4.2 node. This is the Prometheus exporter used to collect and expose hardware and kernel-related metrics.

• The scale-out.yml file does not correctly configure the path or parameters for node_exporter-9100.service on the 10.0.4.2 node.

• Check whether the node_exporter binaries have been downloaded and extracted on the 10.0.4.2 node and whether they have execute permission.

• Check whether the scale-out.yml file specifies the deployment directory and port for node_exporter on the 10.0.4.2 node, for example:

monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /tidb-deploy/monitored-9100
  data_dir: /tidb-data/monitored-9100
  log_dir: /tidb-deploy/monitored-9100/log

• Re-execute the scale-out command, or manually start node_exporter-9100.service on the 10.0.4.2 node:

sudo systemctl daemon-reload && sudo systemctl enable node_exporter-9100.service && sudo systemctl start node_exporter-9100.service

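If enabling still fails with "No such file or directory", the systemd unit file itself is likely missing on the target host, a plausible leftover state after reinstalling the OS on a previously scaled-in node. A minimal sketch of that check, assuming tiup's usual unit location under /etc/systemd/system:

```shell
# Sketch: check whether the systemd unit tiup should have written exists.
# "systemctl enable" fails with "No such file or directory" when the
# unit file is absent, which matches the error in this thread.
unit=/etc/systemd/system/node_exporter-9100.service
if [ -f "$unit" ]; then
  echo "unit present: $unit"
else
  echo "unit missing: $unit"
fi
```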
| username: Jasper | Original post link

You simulated a crash, but the corresponding tikv-server was not scaled in. If it’s a test environment, you can first scale in the corresponding tikv, and then scale it out again to return to normal.
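A hedged sketch of that sequence (the cluster name tidb-test comes from the original command; the TiKV port 20160 and the use of --force are assumptions for a node that is already dead in a test environment):

```shell
# Scale in the dead TiKV instance first. --force skips data migration,
# which is acceptable here only because the node is already gone
# (test environment, simulated crash).
tiup cluster scale-in tidb-test -N 10.0.4.2:20160 --force

# Then scale the node out again from the topology file.
tiup cluster scale-out tidb-test scale-out.yml -p
```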

| username: 像风一样的男子 | Original post link

It seems that node_exporter-9100 was not completely scaled in; there must be some remnants.

| username: Jasper | Original post link

It doesn’t look like it wasn’t scaled down cleanly; from the image above, it seems it wasn’t scaled down at all…