Newly Added Tables Won't Automatically Sync After Building TiFlash Replicas by Database?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 按库构建 TiFlash 副本后新增的表不会自动同步?

| username: TiDBer_Di6thIjv

[TiDB Usage Environment] POC

[TiDB Version] 6.1.0

[Problem Encountered] New tables are not automatically synchronized when constructing TiFlash replicas by database.

[Reproduction Steps]

  1. Construct TiFlash replicas by database
ALTER DATABASE test SET TIFLASH REPLICA 1;
  1. Create a new table in the database and insert data
  2. Use
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test';

to check the synchronization progress, and find that there is no progress for the new table.

| username: songxuecheng | Original post link

Check this accordingly.

| username: TiDBer_Di6thIjv | Original post link

I previously did another ALTER DATABASE test SET TIFLASH REPLICA 1 and it worked. I’m not sure if it’s a bug or expected behavior.

After building by database, should new tables be automatically synchronized as expected?

| username: songxuecheng | Original post link

TiFlash is built according to tables. It doesn’t work for databases.

| username: lilinghai | Original post link

The statement ALTER DATABASE db_name SET TIFLASH REPLICA count; (supported in versions after 6.0) is equivalent to TiDB executing a series of DDL operations to add TiFlash replicas for the tables in that database. This statement will only add TiFlash replicas to all current tables in the database, and the behavior described in the post is as expected.

| username: Mark | Original post link

Do newly added tables in the database still need to be manually operated? This can be brought up with the official team.
Common issues with TiFlash:

| username: TiDBer_wTKU9jv6 | Original post link

This is essentially equivalent to batch execution of the operation to add TiFlash by table, saving the manual effort of adding each table individually. Manually adding TiFlash to new tables in the future would also require re-executing the add TiFlash operation. Perhaps this operation of adding TiFlash to the database can be recorded somewhere, so that when new tables are added in the future, it can check whether this operation exists for the database, and if so, perform the add operation for the new table.

| username: 数据小黑 | Original post link

Refer to: 专栏 - tiflash 6.0 on K8s 扩容与新特性实践 | TiDB 社区
It mentions:

  • From the execution of the command to build TiFlash replicas by database until all tables in the database are synchronized, it is not recommended to execute TiFlash replica count settings or other DDL operations related to the database, otherwise, the final state may be unexpected. Unexpected scenarios include:
    • Setting the TiFlash replica count to 2 first, and then setting the TiFlash replica count to 1 before all tables in the database are synchronized, cannot guarantee that the final TiFlash replica count of all tables will be either 1 or 2.
    • During the execution of the command, if a table is created under the database, TiFlash replicas may be created for these newly added tables.
    • During the execution of the command, if an index is added to a table under the database, the command may be stuck waiting until the index is added.
  • The command to build TiFlash replicas by database will skip system tables, views, temporary tables, and tables containing character sets not supported by TiFlash.

In actual operations, the above tips should be considered. Specifically, during the execution of the command to build TiFlash replicas by database, if tables are created under the target database, TiFlash replicas may be created for these newly added tables, which is uncertain; after all synchronization is completed, creating tables will definitely create TiFlash replicas for these new tables. (It has been verified that only in the case of create table like, TiFlash replicas are automatically created.)

According to test results, if the DDL has not finished executing in the background when building replicas by database, the results are unpredictable. To check if it has finished executing, use:

SELECT * FROM INFORMATION_SCHEMA.TIFLASH_REPLICA WHERE TABLE_SCHEMA = 'TESTDB';
| username: flow-PingCAP | Original post link

The reason is that the SQL statement alter database xxx set tiflash replica xx; does not guarantee atomicity in its internal execution process. It completes the work through multiple DDLs that set TiFlash table replicas, so it is uncertain whether the corresponding DDL has been executed for newly added tables during the SQL execution process. Additionally, after the SQL execution is completed, TiFlash replicas will not be automatically set for subsequently added tables.

Reference documentation: 构建 TiFlash 副本 | PingCAP 文档中心

| username: 数据小黑 | Original post link

I tested it again. After creating the database, if you execute the operation create table test1 like test;, you can see the newly created table using SELECT * FROM INFORMATION_SCHEMA.TIFLASH_REPLICA. However, if you directly create a table, you cannot see it. I just checked the documentation, and it clearly states that create table like will create a TiFlash replica. So, it is confirmed that after building by database, newly created tables will not automatically create TiFlash replicas.

| username: 数据小黑 | Original post link

Is this DDL operation a PD scheduling task? Besides checking the completion status of the TiFlash replica (SELECT * FROM INFORMATION_SCHEMA.TIFLASH_REPLICA WHERE TABLE_SCHEMA = ‘TESTDB’;), where else can we monitor whether it has been completed?

| username: ShawnYan | Original post link

If it is in the form of create table like, due to the mechanism of TiFlash itself, a TiFlash replica will be automatically created.

This statement is not entirely accurate. If it is create table like other_schema.table_xxx, a TiFlash replica will not be automatically created.

| username: 数据小黑 | Original post link

Sure, take another look?

| username: ShawnYan | Original post link

Yes, it can only be done for tables under the current schema. This behavior is quite strange; shouldn’t the create table like command automatically create a TiFlash replica?

| username: 数据小黑 | Original post link

The automatic creation of replicas with “create table like” is specifically mentioned in the official documentation, and this is how it currently works.

| username: ShawnYan | Original post link

(root@127.1) [test] 17:38:04> create table test1_2 like test2.test1;
Query OK, 0 rows affected (0.54 sec)

(root@127.1) [test] 17:51:15> select * from information_schema.tiflash_replica where table_schema like 'test%';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | test       |      128 |             1 |                 |         1 |        1 |
| test         | test1      |      115 |             1 |                 |         1 |        1 |
| test         | test2      |      117 |             1 |                 |         1 |        1 |
| test         | test1_1    |      158 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
4 rows in set (0.00 sec)
| username: 数据小黑 | Original post link

You added a “where table_schema like ‘test%’;” at the end, won’t that limit your query scope?

| username: ShawnYan | Original post link

:sweat_smile: This is to demonstrate schema test/test2, querying them separately yields the same result.

(root@127.1) [test] 17:51:24> select * from information_schema.tiflash_replica where table_schema = 'test';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| test         | test       |      128 |             1 |                 |         1 |        1 |
| test         | test1      |      115 |             1 |                 |         1 |        1 |
| test         | test2      |      117 |             1 |                 |         1 |        1 |
| test         | test1_1    |      158 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
4 rows in set (0.01 sec)

(root@127.1) [test] 17:54:34> select * from information_schema.tiflash_replica where table_schema = 'test2';
Empty set (0.00 sec)
| username: 数据小黑 | Original post link

Can I cross schemas? Is it because of different versions? :sweat_smile:

| username: flow-PingCAP | Original post link

DDL is advanced by TiDB’s ddl master. You can use admin show ddl; to check, but you can only see the DDL of each table…