Issue with TiFlash Component Not Functioning Properly After Upgrading TiDB Cluster Version from v5.4.0 to v6.5.0

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群版本从v5.4.0升级到v6.5.0,出现tiflash组件无法正常工作的问题

| username: vcdog

[TiDB Usage Environment] Production Environment
[TiDB Version] Upgraded from v5.4.0 to v6.5.0; all components in the TiDB cluster are now at v6.5.0
[Reproduction Path] After upgrading the TiDB cluster version from v5.4.0 to v6.5.0, the TiFlash component is not working properly.
[Encountered Problem: Symptoms and Impact]

tiup cluster display my_cluster_name

Following the official documentation, I then scaled the TiFlash component in and back out, after which its status returned to normal.
However, as soon as a few small tables are loaded into TiFlash, the node shows as disconnected again.
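
For reference, the scale-in and scale-out round trip described above can be sketched as a dry run that only prints the commands. The cluster name comes from the post; the node address `10.0.1.4:9000` and the topology file `scale-out.yml` are hypothetical placeholders. As far as I know, the official scale-in procedure also expects table replica counts to be lowered first so that they do not exceed the number of remaining TiFlash nodes.

```shell
#!/usr/bin/env bash
# Dry run: prints the tiup commands for a TiFlash scale-in/scale-out round trip.
# Nothing is executed against a cluster.
print_tiflash_rescale_plan() {
  local cluster="$1"   # cluster name, e.g. my_cluster_name
  local node="$2"      # hypothetical TiFlash node address (host:port)
  local topology="$3"  # hypothetical topology file describing the new TiFlash node
  echo "tiup cluster scale-in ${cluster} --node ${node}"
  echo "tiup cluster display ${cluster}   # wait until the node shows Tombstone"
  echo "tiup cluster prune ${cluster}"
  echo "tiup cluster scale-out ${cluster} ${topology}"
}

print_tiflash_rescale_plan my_cluster_name 10.0.1.4:9000 scale-out.yml
```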

At the same time, after logging in to the remote TiFlash server, I found that a large number of core files had been generated in the deployment path, as follows:

Each core file is 1 GB in size, and together they total more than 70 GB.
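
A minimal sketch for counting the core files and adding up their size. The file name pattern `core*` is an assumption (the real names depend on the host's `kernel.core_pattern` setting), and the directory passed in is whatever the TiFlash deployment path is on your server.

```shell
#!/usr/bin/env bash
# Prints "<count> <total_bytes>" for files named core* directly under a directory.
# The core* pattern is an assumption; adjust it to match kernel.core_pattern.
summarize_core_files() {
  local dir="$1" count=0 total=0 size f
  for f in "$dir"/core*; do
    [ -f "$f" ] || continue        # skip the unexpanded glob and non-regular files
    size=$(wc -c < "$f")           # file size in bytes
    count=$((count + 1))
    total=$((total + size))
  done
  echo "${count} ${total}"
}

# Usage (path is hypothetical):
# summarize_core_files /path/to/tiflash/deploy
```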

The error logs report the following errors:

[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: vcdog

The above is the error log for TiFlash.

| username: tidb狂热爱好者

Deleting and rebuilding will solve the problem.

| username: vcdog

What is very strange is that we have two production clusters, both upgraded from v5.4.0 to v6.5.0. TiFlash works normally in one of them but not in the other.

| username: wzf0072

The errors are all related to the table with physical_table_id=1130. Should we disable the TIFLASH REPLICA for this table first and then enable it again?
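
A sketch of how this could be tried: first map the physical table ID to a table (or partition) name, then drop and re-add the TiFlash replica. The functions only print SQL. `lookup_table_sql` and `toggle_replica_sql` are hypothetical helper names, and a replica count of 1 is an assumption since the post does not state the original count.

```shell
#!/usr/bin/env bash
# Prints SQL that maps a physical table ID to a table or partition name.
lookup_table_sql() {
  local id="$1"
  # Accept digits only so the ID can be embedded in the SQL text safely.
  case "$id" in
    ''|*[!0-9]*) echo "invalid table id: $id" >&2; return 1 ;;
  esac
  cat <<EOF
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tables WHERE TIDB_TABLE_ID = ${id};
SELECT TABLE_SCHEMA, TABLE_NAME, PARTITION_NAME FROM information_schema.partitions WHERE TIDB_PARTITION_ID = ${id};
EOF
}

# Prints SQL that removes and then re-adds one TiFlash replica for a table.
toggle_replica_sql() {
  local schema="$1" table="$2"
  printf 'ALTER TABLE `%s`.`%s` SET TIFLASH REPLICA 0;\n' "$schema" "$table"
  printf 'ALTER TABLE `%s`.`%s` SET TIFLASH REPLICA 1;\n' "$schema" "$table"
}

lookup_table_sql 1130
```

The output can be piped to any MySQL-compatible client connected to TiDB, for example `lookup_table_sql 1130 | mysql -h 127.0.0.1 -P 4000 -u root -p` (connection details are hypothetical).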

| username: vcdog

I have already set the TiFlash replica count of all tables to 0 and restarted TiFlash, but the same error still occurs, and a large number of core files are still being generated.
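
One way to double-check that no replicas remain is to query `information_schema.tiflash_replica`. The sketch below only prints the SQL: the first statement lists the remaining replicas, and the second generates a `SET TIFLASH REPLICA 0` statement for each table that still has one. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Prints SQL to list tables that still have TiFlash replicas and to generate
# the matching "SET TIFLASH REPLICA 0" statements. Nothing is executed.
print_replica_cleanup_sql() {
  # Quoted heredoc so the backticks are printed literally.
  cat <<'EOF'
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica;
SELECT CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` SET TIFLASH REPLICA 0;') AS stmt
FROM information_schema.tiflash_replica
WHERE REPLICA_COUNT > 0;
EOF
}

print_replica_cleanup_sql
```

An empty result from the first query would mean TiDB no longer expects any table to be replicated to TiFlash.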

| username: tidb菜鸟一只

Can you check which table physical_table_id=1130 refers to?

| username: vcdog

I think I’ve roughly found the cause of the problem. I’ll verify it again today, and if it is indeed the cause, I’ll post the verification results.

| username: tidb狂热爱好者

What is the reason?

| username: 裤衩儿飞上天

Is it an issue with AVX2 instructions?
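
To check the AVX2 question on the TiFlash host, a small sketch like this looks for the `avx2` flag in `/proc/cpuinfo`. As far as I know, TiFlash on Linux AMD64 requires AVX2 starting from v6.3.0, so a host that ran v5.4.0 fine could fail after the upgrade. The helper name `has_avx2` is hypothetical.

```shell
#!/usr/bin/env bash
# Returns 0 if the CPU flags file lists avx2 as a whole word, non-zero otherwise.
# The file defaults to /proc/cpuinfo (Linux only); a path can be passed for testing.
has_avx2() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  grep -qw 'avx2' "$cpuinfo" 2>/dev/null
}

if has_avx2; then
  echo "AVX2 flag found"
else
  echo "AVX2 flag not found"
fi
```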

| username: vcdog

My hypothesis did not hold up, so I had to destroy and rebuild the entire replica cluster:

  1. Stop the TiCDC synchronization task from the primary cluster to the replica cluster.
  2. Destroy the replica cluster.
  3. Back up and export the primary cluster data.
  4. Import the data into the replica cluster.
  5. Load the tables into TiFlash on the replica cluster.
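
The five steps above could look roughly like the dry run below, which prints one candidate command per step. The post does not say which tools were used, so Dumpling for the export, TiDB Lightning for the import, pausing (rather than removing) the changefeed, port 4000, the `root` user, and the file name `tidb-lightning.toml` are all assumptions. Redeploying the replica cluster between steps 2 and 4 is implied but not described in the post, so it is left out here.

```shell
#!/usr/bin/env bash
# Dry run: prints one candidate command per rebuild step; nothing is executed.
# All arguments are hypothetical placeholders, and the tool choices are assumptions.
print_rebuild_plan() {
  local cdc_server="$1"    # TiCDC server URL
  local changefeed="$2"    # changefeed ID of the primary-to-replica task
  local replica="$3"       # replica cluster name as known to tiup
  local primary_host="$4"  # TiDB host of the primary cluster
  local dump_dir="$5"      # directory for the exported data
  echo "tiup cdc cli changefeed pause --server=${cdc_server} --changefeed-id=${changefeed}"
  echo "tiup cluster destroy ${replica}"
  echo "tiup dumpling -h ${primary_host} -P 4000 -u root -o ${dump_dir}"
  echo "tiup tidb-lightning -config tidb-lightning.toml"
  echo "ALTER TABLE <db>.<table> SET TIFLASH REPLICA 1;"
}

print_rebuild_plan http://10.0.1.5:8300 my-changefeed replica_cluster 10.0.1.1 /data/export
```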

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.