Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Fetching the catalog with TiKV client Java keeps failing on a table with billions of rows, and gRPC keeps disconnecting. How should this be handled?
[TiDB Usage Environment] Production Environment
[TiDB Version] 4.0.16
[Reproduction Path] A TiKV client Java application reads a table with billions of rows and many regions. On the first attempt, fetching the catalog returns the tableID, and the CDC gRPC stream is established and pushes CDC data. However, when an error event is received and the CDC connection has to be re-established, fetching the catalog again to get the tableID fails.
[Encountered Problem: Phenomenon and Impact]
2023-10-24 09:57:44.892 [tidb-extractor-6-thd-0] WARN org.tikv.common.region.AbstractRegionStoreClient - leader for region[12] is not found, it is possible that network partition occurred
2023-10-24 09:57:44.892 [tidb-extractor-6-thd-0] INFO org.tikv.common.region.AbstractRegionStoreClient - try grpc forward: region[12]
2023-10-24 09:57:45.197 [tidb-extractor-6-thd-0] WARN org.tikv.common.region.AbstractRegionStoreClient - No store available, retry: region[12]
2023-10-24 09:57:45.197 [tidb-extractor-6-thd-0] WARN org.tikv.common.operation.RegionErrorHandler - request failed because of: DEADLINE_EXCEEDED: deadline exceeded after 19.999203340s. [closed=, open=[[remote_addr=/172.21.77.94:20160]]]
Note: The network connection itself is working.
The log shows two warnings: the leader for the region was not found, and no store was available. When the TiKV client cannot find a region's leader or has no available store, requests can fail. Possible causes include a network partition, a TiKV node failure, or other faults.
To resolve this issue, you can try the following steps:
- Check network connection: Ensure that the network connection between TiKV and TiDB is normal and that there are no network failures or partitions.
- Check TiKV cluster status: Use TiDB Dashboard or TiKV monitoring tools (such as Prometheus and Grafana) to check the status of the TiKV cluster. Ensure that all TiKV nodes are running normally and that there are no anomalies.
- Check TiKV configuration: Review the TiKV configuration files to ensure they are correct and that there are no configuration issues causing network connection problems or store unavailability.
- Adjust TiKV configuration: Based on TiKV’s load and network conditions, you can try adjusting TiKV configuration parameters such as grpc-concurrency, grpc-keepalive-time, and grpc-keepalive-timeout to optimize network connections and store availability.
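For the network check in the first step, a minimal sketch like the one below can confirm from the host running the Java client that a TiKV store port accepts TCP connections. The class and method names are hypothetical and use only the Java standard library, not the TiKV client API. A successful TCP connect only shows the port is reachable; it does not prove that gRPC requests complete or that the region leader can be found.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of TiKV client Java.
class StoreReachability {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, connect timeouts, and unresolved hosts.
            return false;
        }
    }
}
```

For example, calling `StoreReachability.isReachable("172.21.77.94", 20160, 3000)` checks the store address that appears in the DEADLINE_EXCEEDED log line. Repeating the check for every store address rules out a partition that affects only some nodes.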
Are you concerned about having too many regions? Is there an error?
There are thousands of regions; I'm not sure whether that is causing the error. In the error message above, region[12] contains all keys starting with mDB, which should be related to the table’s catalog.
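While the root cause is being investigated, one possible mitigation on the client side is to wrap the catalog lookup in a bounded retry with exponential backoff, so that a single DEADLINE_EXCEEDED during reconnection does not end the CDC pipeline. The sketch below is generic: `CatalogRetry` and its methods are hypothetical names, and the `Callable` stands in for whatever code fetches the tableID. It does not call the TiKV client API and will not help if region[12] stays unavailable.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of TiKV client Java.
class CatalogRetry {
    /** Delay before retrying after the given 0-based attempt: base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /** Runs the task up to maxAttempts times, sleeping with exponential backoff between failures. */
    static <T> T callWithRetry(Callable<T> task, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

The cap on the delay keeps the reconnect loop from waiting indefinitely, and the attempt limit ensures a persistent failure is still reported rather than hidden.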