Error When Deploying TiKV v5.4.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: V5.4.0 部署TiKV出现错误

| username: Boomballa

[TiDB Usage Environment] Test Environment
[TiDB Version] v5.4.0
[Resource Configuration] Physical Machine + SSD Separate Data Partition
[Encountered Issue] Error when deploying TiKV on physical machines
Of three physical machines with identical configurations, two report the same error.


The third machine passes the tikv toml file check normally.


Error Description:




| username: xfworld | Original post link

What error does the TiKV node report when you start the cluster with tiup?

| username: Boomballa | Original post link

The error occurs during cluster deployment, specifically in the deploy phase, so the cluster has not been created yet.

| username: xfworld | Original post link

Did the tiup check pass? If not, refer to the detailed deployment configuration documentation and adjust the environment accordingly.

Skipping the failed checks may cause the cluster to fail to start or to behave abnormally.

| username: Boomballa | Original post link

I ran the check. The failures were for some less important options. When I run the check again now, the only errors are about directories that already exist.

| username: h5n1 | Original post link

Try adding --user xxx -p when deploying, and if that doesn’t work, try adding --ssh=system.

| username: xfworld | Original post link

Which options do you consider less important?

| username: Boomballa | Original post link

10.196.3.188 disk Fail mount point /data0 does not have 'nodelalloc' option set
10.196.3.188 disk Fail multiple components tikv:/data0/dba/tidb_elephant_test/tikv21281v,tiflash:/data0/dba/tidb_elephant_test/tiflash9132v are using the same partition 10.196.3.188:/data0 as data dir
10.196.3.188 limits Fail soft limit of 'nofile' for user 'dba' is not set or too low
10.196.3.188 limits Fail hard limit of 'nofile' for user 'dba' is not set or too low
10.196.3.188 limits Fail soft limit of 'stack' for user 'dba' is not set or too low
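
For comparing the three hosts by hand, the mount-option and limits failures above can be reproduced with a small script. This is only a minimal sketch, not what tiup itself runs: the function names are made up for illustration, it assumes Linux with /proc/mounts available, and the limits it reports are those of the process running the script, so it should be started as the deploy user (dba here) in a fresh login session.

```python
import resource


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Returns None when the mount point is not listed.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match, since a later mount shadows an earlier one.
            found = fields[3].split(",")
    return found


def has_option(mounts_text, mount_point, option="nodelalloc"):
    """True if mount_point is listed and carries the given mount option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and option in opts


def current_limits():
    """(soft, hard) limits of the current process for nofile and stack."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "stack": resource.getrlimit(resource.RLIMIT_STACK),
    }
```

On each host, pass the contents of /proc/mounts to has_option(text, "/data0") and print current_limits(). If the output differs between 190 and the two failing machines, that difference is a starting point; the script does not know which thresholds tiup applies.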

| username: Boomballa | Original post link

Thank you for your help. I did add --user when deploying. All other components deploy without any issues; only two of the TiKV instances have problems, so it doesn't seem to be related to SSH.

| username: xfworld | Original post link

Are the check results for 191 and 188 the same?

What about 190?

| username: Boomballa | Original post link

The check on 190 passes normally. I compared the tikv toml files, and they are all identical.

| username: xfworld | Original post link

How about cleaning up the environment and trying to redeploy?

| username: Boomballa | Original post link

We took a closer look, and these two machines have cgroup v2 enabled. Deploying v6.x works fine, but v5.4 reports an error. Even after we cleared the cgroup v2 configuration, it still doesn't work. Does v5.4 require cgroup v1?
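
To confirm which cgroup hierarchy each machine actually has mounted after the configuration change, the filesystem types in /proc/mounts can be inspected. The sketch below is an illustration only; cgroup_mode is a made-up helper name, and it relies on the fact that v1 hierarchies are mounted with type cgroup and the v2 unified hierarchy with type cgroup2. It says nothing about whether v5.4 needs cgroup v1; it only makes the three hosts comparable.

```python
def cgroup_mode(mounts_text):
    """Classify cgroup mounts found in /proc/mounts-style text.

    Returns "v1", "v2", "hybrid" (both types mounted), or "none".
    """
    fs_types = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Third column of /proc/mounts is the filesystem type.
        if len(fields) >= 3:
            fs_types.add(fields[2])
    has_v1 = "cgroup" in fs_types
    has_v2 = "cgroup2" in fs_types
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"
```

Reading /proc/mounts on 188, 190, and 191 and passing the text to cgroup_mode shows whether the two failing machines still differ from the one that passes. A result of "v2" or "hybrid" after the cleanup would suggest the change did not take effect, for example because the machine was not rebooted.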

| username: liuis | Original post link

I think completely cleaning and reinstalling should solve the problem.

| username: ffeenn | Original post link

Are these servers running the same operating system? The production environment system should have been optimized, right?