Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: DataX writes to TiDB slowly, dropping from 5000 rows/s to 200 rows/s
【TiDB Usage Environment】Production Environment
【TiDB Version】6.5.2
【Reproduction Path】jdbc, channel, batchsize
【Encountered Problem: Phenomenon and Impact】Sudden decrease in write performance
【Attachment: Screenshot/Log/Monitoring】
reader: hdfs reader
channel: 10
batchsize: 1024
"jdbcUrl": "jdbc:mysql://xxxx:4000/ads_global?useUnicode=true&characterEncoding=utf-8&useSSL=false&autoReconnect=true&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=1000&prepStmtCacheSqlLimit=2048000&useConfigs=maxPerformance&rewriteBatchedStatements=true&allowMultiQueries=true&useCursorFetch=true&netTimeoutForStreamingResults=28800&sessionVariables=wait_timeout=123456"
Is the downstream pressure high?
Could you please share the schema of the table you are writing into?
Check the load status of each node in Grafana.
CREATE TABLE `ods_crm_biz_t_integral_account_log` (
  `id` bigint(64) NOT NULL AUTO_INCREMENT,
  `member_id` bigint(64) NOT NULL COMMENT 'Member ID',
  `integral_type` varchar(50) NOT NULL COMMENT 'Integral Type',
  `integral_value` decimal(12,2) DEFAULT NULL COMMENT 'Integral Value',
  `tran_id` bigint(64) DEFAULT NULL COMMENT 'Transaction ID',
  `tran_no` varchar(50) DEFAULT NULL COMMENT 'Transaction Number',
  `update_type` int(2) DEFAULT NULL COMMENT 'Change Type (1: Increase, 2: Decrease)',
  `accruals_id` bigint(64) DEFAULT NULL COMMENT 'Accruals ID',
  `oper_src` int(2) DEFAULT '1' COMMENT 'Source (1: Increase, 2: Decrease, 4: Initialize Integral, 5: Manual Adjustment, 6: Integral Reset, 7: Integral to Coupon, 8: Integral to Voucher)',
  `repay_id` varchar(10000) DEFAULT NULL COMMENT 'Repayment ID',
  `tenant_id` varchar(10) DEFAULT NULL COMMENT 'Tenant ID',
  `create_dept` bigint(64) DEFAULT NULL COMMENT 'Creating Department',
  `create_position` bigint(64) DEFAULT NULL COMMENT 'Creating Position',
  `create_user` bigint(64) DEFAULT NULL COMMENT 'Creator',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation Time',
  `update_user` bigint(64) DEFAULT NULL COMMENT 'Modifier',
  `update_time` datetime DEFAULT NULL COMMENT 'Modification Time',
  `status` int(2) DEFAULT NULL COMMENT 'Status',
  `is_deleted` int(2) DEFAULT '0' COMMENT 'Is Deleted (0: Normal, 1: Deleted)',
  `original_member_id` bigint(64) DEFAULT NULL COMMENT 'Original Member ID',
  `original_card_no` varchar(100) DEFAULT NULL COMMENT 'Original Member Card Number',
  `original_member_comments` varchar(2000) DEFAULT NULL COMMENT 'Original Member Comments',
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `t_integral_account_log_index1` (`member_id`,`integral_type`),
  KEY `t_integral_account_log_index2` (`tran_id`),
  KEY `t_integral_account_log_index3` (`tran_no`),
  KEY `t_integral_account_log_index4` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=1594605659184571699 COMMENT='Integral Account Change Log'
Have you considered hotspot issues? We previously tested adding a scatter setting to the table, but it didn't work very well.
Other tables didn’t encounter this issue during synchronization.
Currently, there is no pressure.
It should be a hotspot issue. Check the hotspot situation of this table during the import in the Dashboard. My suggestion is to change AUTO_INCREMENT to AUTO_RANDOM in the table creation statement.
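For reference, a minimal sketch of the conversion, assuming your TiDB version supports changing AUTO_INCREMENT to AUTO_RANDOM in place on a clustered integer primary key (otherwise the table has to be recreated with AUTO_RANDOM and the data reloaded):

-- Hedged sketch: in-place AUTO_INCREMENT -> AUTO_RANDOM conversion; verify support in your TiDB version first.
ALTER TABLE `ods_crm_biz_t_integral_account_log`
  MODIFY COLUMN `id` bigint(64) NOT NULL AUTO_RANDOM(5);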
Other tasks are currently writing at around 1000 rows per second, but this task is only at 200 rows per second. Also, no hotspot for this table shows up.
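If the Dashboard shows nothing, you can also cross-check from SQL. A minimal sketch, assuming INFORMATION_SCHEMA.TIDB_HOT_REGIONS is available in your version and that ads_global is the target schema:

-- Hedged cross-check: list current hot regions belonging to this table (write hotspots have TYPE = 'write').
SELECT db_name, table_name, index_name, type, max_hot_degree, flow_bytes
FROM information_schema.tidb_hot_regions
WHERE db_name = 'ads_global'
  AND table_name = 'ods_crm_biz_t_integral_account_log';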
Looking at your table structure, the current AUTO_INCREMENT value is extremely large, which seems problematic; it is worth checking.
Additionally, I would like to ask: did the id field carry an explicit value during your data migration?
Personally, I suggest supplying the value, and the batch size could also be reduced a bit.
Lastly, I still think this is caused by the AUTO_INCREMENT hotspot: the auto-increment means these inserts are executed on only one TiDB node, while the other tasks can execute their inserts on different TiDB nodes.
With values supplied and 200 rows per batch, there is not much difference. Currently the maximum is 1300 rows per second.
If you supply values, try to make them as unordered as possible; it should be much better.
Making the values out of order probably isn't feasible. I'm synchronizing the data from upstream, and that's just how it arrives.
If it is confirmed that data will always be written to TiDB synchronously in the future, you can consider removing the AUTO_INCREMENT attribute, changing the primary key to (member_id, integral_type, id), and deleting the secondary index on (member_id, integral_type).
The benefit is that the discrete nature of the business member_id is used to distribute the data.
However, note that when data queries need to filter by id, try to change the other secondary indexes to (tran_id, id), etc., to accelerate query performance and reduce the latency caused by table lookups.
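A minimal sketch of that suggestion as DDL, shown here as a new table for clarity; the _new suffix, the abridged column list, and the exact secondary-index set are assumptions to be adapted to your own queries:

-- Hedged sketch: AUTO_INCREMENT removed, composite clustered primary key led by member_id,
-- and id appended to the secondary indexes that still need to filter by it.
CREATE TABLE `ods_crm_biz_t_integral_account_log_new` (
  `id` bigint(64) NOT NULL COMMENT 'Value supplied by the upstream sync, no AUTO_INCREMENT',
  `member_id` bigint(64) NOT NULL COMMENT 'Member ID',
  `integral_type` varchar(50) NOT NULL COMMENT 'Integral Type',
  `tran_id` bigint(64) DEFAULT NULL COMMENT 'Transaction ID',
  `tran_no` varchar(50) DEFAULT NULL COMMENT 'Transaction Number',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation Time',
  -- ... remaining columns unchanged from the original table ...
  PRIMARY KEY (`member_id`,`integral_type`,`id`) /*T![clustered_index] CLUSTERED */,
  -- the old index1 on (member_id, integral_type) is dropped: it is now the prefix of the primary key
  KEY `t_integral_account_log_index2` (`tran_id`,`id`),
  KEY `t_integral_account_log_index3` (`tran_no`,`id`),
  KEY `t_integral_account_log_index4` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='Integral Account Change Log';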
This has dropped too much. Is the environment normal?
Some operation must have affected it, right? Or switch the target to MySQL to verify whether it is a TiDB issue.