TiFlash Fails to Start When Upgrading from Version 6.2 to Version 6.3

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 从V6.2升级到V6.3版本时,TIFLASH启动不成功

| username: yuchangfa

Upgrading from V6.2 to V6.3 went through without issues in the test environment. However, the same upgrade failed for TiFlash in the production environment.

Manually executing systemctl start tiflash-9000.service did not report any errors, but nothing was listening on port 9000 afterwards, and the TiFlash process did not start.

Comparing the production and test environments, the installation and deployment configuration parameters are the same (all using default parameters, no custom parameters). The only difference is that the test environment was gradually upgraded from V4.* to V5.*, while the production environment was gradually upgraded from V5.* to V6.*.

Could you please advise on how I should proceed? Thank you!!

| username: yuchangfa | Original post link

For now, the only thing that works is to manually replace the contents of the bin directory with the original V6.2 binaries and then run systemctl start tiflash-9000.service again. After that, TiFlash starts successfully, and both port 9000 and the process status are normal.

I don’t know what went wrong with the upgrade to 6.3.
Please advise. Thank you!

| username: yuchangfa | Original post link

[2022/10/11 00:06:36.103 +08:00] [WARN] [StorageConfigParser.cpp:241] ["Application: The configuration "path" is deprecated. Check [storage] section for new style."] [thread_id=1]
[2022/10/11 00:07:13.504 +08:00] [WARN] [StorageConfigParser.cpp:241] ["Application: The configuration "path" is deprecated. Check [storage] section for new style."] [thread_id=1]
[2022/10/11 00:07:23.699 +08:00] [ERROR] [] ["Application: null context when constructing CivetServer. Possible problem binding to port."] [thread_id=1]
[2022/10/11 00:07:39.357 +08:00] [WARN] [StorageConfigParser.cpp:241] ["Application: The configuration "path" is deprecated. Check [storage] section for new style."] [thread_id=1]
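
Since the repeated WARN lines tend to bury the single ERROR line, a small filter can help when reading a longer log. The following is only a sketch: it assumes every line follows the bracketed layout seen in the excerpt above ([time] [LEVEL] [source] followed by the message), and the function names parse_line and errors_only are made up for this illustration.

```python
import re

# Matches the layout seen in the excerpt above:
# [timestamp] [LEVEL] [source-or-empty] rest-of-line
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]*)\] (?P<rest>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines, levels=("ERROR",)):
    """Keep only the parsed lines whose level is in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry["level"] in levels]


if __name__ == "__main__":
    import sys

    # Usage (the log path is whatever your deployment uses):
    #   python filter_log.py < tiflash.log
    for entry in errors_only(sys.stdin):
        print(entry["time"], entry["source"], entry["rest"])
```

Applied to the four lines above, this would leave only the CivetServer entry, whose message itself points at a port binding problem.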

| username: tidb狂热爱好者 | Original post link

Are you running 6.3 in production? That release is a test version.

| username: zhouzeru | Original post link

I recommend the stable version. The 6.3 version doesn’t even have a dashboard.

| username: tidb狂热爱好者 | Original post link

Quoting zhouzeru: "I recommend the stable version. The 6.3 version doesn’t even have a dashboard."

It does have a dashboard, and you can check it out. But for production, use the stable version and don’t be a guinea pig.

| username: Lloyd-Pottiger | Original post link

Check if the Prometheus port is being occupied.
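
One way to test this from the TiFlash host is to try binding the port yourself. This is a minimal sketch, not an official diagnostic: port_is_free is a hypothetical helper, and the port numbers to check are assumptions that depend on your deployment (9000 is the service port mentioned in this thread; the metrics port is whatever your topology configures).

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now.

    A failed bind usually means another process already holds the port,
    although it can also mean the current user lacks permission.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Example ports only; replace them with the ports from your own topology.
    for port in (9000,):
        state = "free" if port_is_free(port) else "in use or not bindable"
        print(f"port {port}: {state}")
```

Run it while TiFlash is stopped. If a port reports as in use, some other process is holding it, which would match the "Possible problem binding to port" message in the log.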

| username: tidb狂热爱好者 | Original post link

Switch to the tidb account on that machine and execute the command to see the returned error.

| username: yuchangfa | Original post link

That’s impossible. Nothing is listening on the port at all.
It must be a bug.

| username: yuchangfa | Original post link

Hearing that gave me quite a scare. Every time a new version comes out, I first upgrade the test environment, run it for a few days without any issues, and then upgrade production as well. :rofl:

| username: Billmay表妹 | Original post link

I will report it to the relevant engineers to see whether it is a bug.

| username: Billmay表妹 | Original post link

Hahaha~ There probably isn’t any version without bugs!
Without bugs, there would be no DBAs~

| username: qizheng | Original post link

The error above looks like a configuration issue. Could you please provide the complete log after the startup failure?

| username: yuchangfa | Original post link

The configuration cannot be wrong, because production is currently running version 6.2 with it. Both environments use the same default parameters: upgrading the test machine from 6.2 to 6.3 worked fine, but upgrading the production server from 6.2 to 6.3 failed with an error saying that TiFlash could not start. Nothing was listening on port 9000, and starting TiFlash manually did not work either.

Later, I manually rolled the TiFlash binaries back to the original 6.2 version, and a manual restart of TiFlash then succeeded. Therefore, I suspect there is a bug in the 6.3 version of TiFlash.

Regardless, I don’t want to risk it in the production environment. Currently, the test machine is running version 6.3, while the production environment has been manually rolled back to version 6.2, and no issues have been found so far.

| username: Lucien-卢西恩 | Original post link

Hello, could you please send the complete log after the startup failure?

| username: tidb狂热爱好者 | Original post link

How do I roll back the version?

| username: 数据小黑 | Original post link

I recall there were compatibility issues with TiFlash that caused upgrade failures. The workaround is to set the number of TiFlash replicas to 0 for every table, wait until synchronization is complete (all replicas are deleted), and then try upgrading again. I’m not sure if your production environment allows this operation.
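
As a rough illustration of that workaround, the statements could be generated as follows. The SQL forms used (ALTER TABLE ... SET TIFLASH REPLICA 0 and the information_schema.tiflash_replica table) are standard TiDB syntax, but the helper name and the schema and table names are hypothetical, and the script only builds the statements; it does not run them.

```python
# Query to list the tables that currently have TiFlash replicas.
LIST_REPLICAS_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT "
    "FROM information_schema.tiflash_replica;"
)


def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def replica_zero_statements(tables):
    """Build one ALTER TABLE statement per (schema, table) pair."""
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} SET TIFLASH REPLICA 0;"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical table list; in practice, take it from LIST_REPLICAS_SQL.
    for statement in replica_zero_statements([("test", "orders")]):
        print(statement)
```

After the statements have been applied, rerunning the query in LIST_REPLICAS_SQL should return no rows before the upgrade is retried.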

| username: Lucien-卢西恩 | Original post link

It is not recommended to roll back; you can refer to the suggestions from @数据小黑.

| username: yuchangfa | Original post link

TiFlash has not been used at all, and no replicas were ever created. It was only deployed and started during installation and has never been put to real use.

| username: yuchangfa | Original post link

I have now tried upgrading to version 6.5, and the upgrade fails in the same way. This time, no error messages are reported.