DM Error 38008

translator_bot · June 22, 2024, 3:45pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm错误38008

| username: Jjjjayson_zeng

Here it comes, here it comes, a new issue has appeared again.

| username: liuis

This is a gRPC communication error between DM components: error.DM-dm-master-38008

| username: Jjjjayson_zeng

How should I handle this kind of error? The task the customer needs is running on this worker.

| username: liuis

Are you sure there is no problem with the upstream and downstream networks?

| username: Jjjjayson_zeng

Yes, of course. The network is fine.

| username: Jjjjayson_zeng

An important point: when this error is reported, it does not say which machine is affected. How can I tell which one has the problem? Right now I can only check each machine one by one.

| username: Jjjjayson_zeng

I have already resolved this issue.

| username: Jjjjayson_zeng

However, there is still no reasonable long-term solution.

| username: liuis

What was the cause?

| username: Jjjjayson_zeng

DM cannot effectively limit its memory usage during synchronization, so the memory usage of a single worker climbs very high and the worker eventually ends up in a half-dead state. I had to manually kill the worker process and restart the worker.

| username: liuis

Sure enough, it’s a node communication issue… Restarting works wonders.

| username: Jjjjayson_zeng

But the key issue is that, apart from the error itself, I have no way of knowing which node is affected. Do you see what I mean?

| username: Jjjjayson_zeng

It’s exhausting to check each one individually…
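
The one-by-one check can at least be scripted. Below is a minimal sketch, not an official DM tool: it tries a plain TCP connection to each worker address and reports the ones that do not answer. The function names and the example addresses are hypothetical, and port 8262 is assumed to be the DM-worker port in your deployment. Note that a half-dead worker may still accept TCP connections, so this only rules out nodes that are completely unreachable.

```python
import socket


def is_reachable(addr: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within timeout."""
    host, _, port = addr.rpartition(":")
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def find_unreachable(addrs, timeout: float = 3.0):
    """Return the addresses that did not accept a TCP connection."""
    return [a for a in addrs if not is_reachable(a, timeout)]
```

For example, `find_unreachable(["10.0.0.11:8262", "10.0.0.12:8262"])` (made-up addresses) returns the subset that could not be reached, which narrows down where to look first.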
