Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Does exchSenderExec account for backpressure caused by slow consumption?
As the title says: after the producer and consumer establish a connection, the producer starts a goroutine that enters EstablishMPPConnectionWithStoreID, where the logic for sending data is as follows:
var sendError error
for sendError == nil {
    // Retrieve data from data ch
    chunk, err := tunnel.RecvChunk()
    ...
    if chunk == nil {
        // All data has been retrieved
        break
    }
    res := tipb.SelectResponse{
        Chunks: []tipb.Chunk{*chunk},
    }
    raw, err := res.Marshal()
    ...
    sendError = server.Send(&mpp.MPPDataPacket{Data: raw})
}
Here, data is read from the data channel in a loop and sent to the receiver as soon as it is available. If the producer is fast and the consumer is slow, data will pile up on the consumer side, and with a large enough data volume, could this cause an OOM (Out of Memory)?
How is this handled here, and does TiFlash have any other mechanism to control the sending rate?
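To make the question concrete, here is a minimal Go sketch of the kind of backpressure being asked about. It is not taken from the TiDB or TiFlash source: runPipeline, dataCh, and the chunk size are hypothetical names and values chosen for illustration, and the thread does not say whether the real tunnel channel is bounded. The sketch only shows that when a producer writes into a bounded channel, it blocks once the buffer is full, so the number of chunks held in memory stays capped regardless of how slow the consumer is.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runPipeline pushes `total` chunks through a channel with buffer size
// `capacity` and returns the largest number of chunks that were ever
// in flight (created by the producer but not yet released by the consumer).
// All names here are hypothetical; this is not TiDB/TiFlash code.
func runPipeline(total, capacity int) int {
	dataCh := make(chan []byte, capacity) // bounded stand-in for a data channel

	var mu sync.Mutex
	inFlight, maxInFlight := 0, 0

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // deliberately slow consumer
		defer wg.Done()
		for range dataCh {
			time.Sleep(time.Millisecond) // simulate slow processing
			mu.Lock()
			inFlight--
			mu.Unlock()
		}
	}()

	for i := 0; i < total; i++ { // fast producer
		mu.Lock()
		inFlight++
		if inFlight > maxInFlight {
			maxInFlight = inFlight
		}
		mu.Unlock()
		// Blocks while the buffer is full: this is the backpressure.
		dataCh <- make([]byte, 1024)
	}
	close(dataCh)
	wg.Wait()
	return maxInFlight
}

func main() {
	fmt.Println("max chunks in flight:", runPipeline(50, 4))
}
```

In this sketch the in-flight count can never exceed capacity + 2: at most capacity chunks sit in the buffer, one is held by the consumer, and one is held by the producer while it waits to send. With an unbounded queue in place of dataCh there would be no such cap, which is the OOM scenario described above.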
The documentation lists settings for speeding up replica synchronization. I think you can try lowering them to see if that solves the issue.
-- The default values for these two parameters are both 100MiB, meaning the maximum disk bandwidth occupied by snapshots for replica synchronization does not exceed 100MiB/s.
SET CONFIG tikv `server.snap-io-max-bytes-per-sec` = '300MiB';
SET CONFIG tiflash `raftstore-proxy.server.snap-max-write-bytes-per-sec` = '300MiB';
Especially the two parameters above.
Thank you, but replica synchronization is not the same scenario as my issue. exchSenderExec is used to send to downstream operators in the exchange operator.
Alright 
I thought it was the same as another issue where TiFlash crashed during data import.
Set the max-server-memory parameter to limit the memory usage of the TiFlash instance.
Can this issue only be solved through operational means? Is it considered a kernel defect?
First of all, this “slow processing” will inevitably lead to the OOM problem, right?
Secondly, the best way to avoid this issue is, of course, to speed up the slow side; if that is not possible, operational measures might be the only solution.