Using TiSpark for Batch Writing Causes Cluster Unavailability

translator_bot · June 23, 2024, 3:32am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用TiSpark批量写入，集群不可用

| username: zaker

Description: When running batch jobs with TiSpark, cluster latency rises sharply during the write phase, to the point that the cluster becomes unavailable.
Version: TiSpark 2.3.13, TiDB 5.2.4, Spark 2.4.7
Explanation: Latency is slightly elevated while Spark is loading the data, but it increases significantly once the write starts.

What could be the reason? Is it a parameter configuration issue or a TiSpark version issue?

| username: 数据小黑

What do TiKV's CPU, memory, and I/O usage look like?

| username: zaker

There are 6 physical machines with 160 cores each, and CPU usage is high on 3 of the instances.

| username: yilong

Refer to the slow-write troubleshooting article below and check exactly which stage of the write is slow.

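TiDB Q&A Community – 19 Jan 21

TiDB Slow Write Troubleshooting Series (1): Preface

Background: While using and testing TiDB, many users run into poor write performance in their TiDB cluster. Because TiDB includes not only the three core components (tidb server, tikv server, pd server) but also surrounding ecosystem components such as Binlog, TiCDC, and TiFlash, the overall architecture is relatively complex and problems are hard to troubleshoot. On that basis, the article lays out the flow of a TiDB write operation and the monitoring relevant to each stage, in the hope that the document can help users troubleshoot and locate TiDB...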

| username: 数据小黑

It might be a write hotspot. Using auto_random to scatter the writes might help.
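For reference, a minimal sketch of what the auto_random suggestion could look like in a table definition. The table and column names (`demo_events`, `id`, `payload`) are hypothetical, and the helper only builds the DDL string; it does not connect to a cluster. AUTO_RANDOM applies to a BIGINT primary key column, so it only helps a table that has (or can be given) such a key.

```python
from typing import Optional


def auto_random_ddl(table: str, shard_bits: Optional[int] = None) -> str:
    """Build a CREATE TABLE statement whose BIGINT primary key uses AUTO_RANDOM.

    The column names below are hypothetical placeholders.
    """
    # AUTO_RANDOM(n) sets the number of shard bits explicitly;
    # plain AUTO_RANDOM leaves the choice to TiDB's default.
    attr = "AUTO_RANDOM" if shard_bits is None else f"AUTO_RANDOM({shard_bits})"
    return (
        f"CREATE TABLE {table} (\n"
        f"  id BIGINT PRIMARY KEY {attr},\n"
        f"  payload VARCHAR(255)\n"
        f")"
    )


if __name__ == "__main__":
    print(auto_random_ddl("demo_events"))
```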

| username: zaker

Three instances receive more writes than the others. The table has no primary key, and SHARD_ROW_ID_BITS=6 is already set. Are there other ways to distribute the load? Writing to other tables this way also seems to drive cluster latency up sharply, leaving the whole cluster unusable with queries stuck waiting. Is there a way to limit the resources that this kind of TiSpark write consumes?
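To put the SHARD_ROW_ID_BITS=6 setting in numbers: n shard bits give 2^n shards, so 6 bits let implicit row IDs fall into up to 64 separate ranges. The sketch below only illustrates that arithmetic. The bit layout in `row_id_with_shard` (shard bits placed directly below the sign bit of a 64-bit ID) is a simplifying assumption for illustration, not TiDB's actual code.

```python
def shard_count(shard_row_id_bits: int) -> int:
    """Number of distinct shard prefixes for a given SHARD_ROW_ID_BITS value."""
    if shard_row_id_bits < 0:
        raise ValueError("shard_row_id_bits must be non-negative")
    return 1 << shard_row_id_bits


def row_id_with_shard(shard: int, seq: int, shard_row_id_bits: int) -> int:
    """Compose an illustrative 64-bit row ID: [sign bit | shard bits | sequence].

    Assumed layout for illustration only.
    """
    shift = 63 - shard_row_id_bits  # bits left for the sequence part
    if not 0 <= shard < shard_count(shard_row_id_bits):
        raise ValueError("shard out of range")
    if not 0 <= seq < (1 << shift):
        raise ValueError("seq out of range")
    return (shard << shift) | seq


if __name__ == "__main__":
    print(shard_count(6))
    # The same sequence number lands far apart in key space under different shards.
    print(row_id_with_shard(0, 1, 6), row_id_with_shard(1, 1, 6))
```

Spreading row IDs over 64 ranges only spreads the load across stores if those ranges end up in different Regions, which may be worth checking when 3 instances still take most of the writes.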

| username: 数据小黑

As far as I remember, there is no parameter to control the write concurrency, because TiSpark writes directly to TiKV. On a small cluster, the interference with other workloads is relatively large.
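Independently of any TiSpark parameter, Spark itself caps how many tasks run at once, which gives a coarse, indirect handle on how many writers hit TiKV in parallel. The sketch below computes that cap from the standard Spark settings spark.executor.instances, spark.executor.cores, and spark.task.cpus, assuming static allocation. The example numbers are made up, and whether lowering the cap is enough to protect a given cluster is not something this thread confirms.

```python
def max_concurrent_tasks(executor_instances: int,
                         executor_cores: int,
                         task_cpus: int = 1) -> int:
    """Upper bound on simultaneously running Spark tasks (static allocation).

    Each executor offers executor_cores // task_cpus task slots.
    """
    if executor_instances < 1 or executor_cores < 1 or task_cpus < 1:
        raise ValueError("all arguments must be >= 1")
    return executor_instances * (executor_cores // task_cpus)


if __name__ == "__main__":
    # Hypothetical job sizes: shrinking the executor footprint shrinks
    # the number of tasks that can write at the same time.
    print(max_concurrent_tasks(executor_instances=10, executor_cores=4))
    print(max_concurrent_tasks(executor_instances=4, executor_cores=2))
```

This bounds the number of tasks, not the load each task generates, so it is at best a rough throttle.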