Tidb-server Out of Memory (OOM)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-server OOM

| username: EDG-给我冲

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.7
[Encountered Problem: Problem Phenomenon and Impact]

     Dear experts in the forum, the server where tidb-server is located experienced OOM (10000% sure it is OOM).
     How should I troubleshoot whether it is a single SQL occupying too much memory or multiple SQLs concurrently occupying memory?

After repeated troubleshooting, it was found that the following SQL caused the OOM. I would like to ask why this SQL caused the OOM. The extra parameters have been removed, only the structure is retained, the key point is len(483672).

[2024/05/29 20:17:01.284 +08:00] [WARN] [expensivequery.go:196]
[expensive_query] [cost_time=60.050779608s]
[stats=es_charge_settle_accounts_detail:pseudo,es_charge_owner_fee:450095105724907523]
[conn_id=7351167617542984059] [user=prod] [database=maindb] 
[txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"]
[sql="SELECT DISTINCT
	t.* 
FROM
	(
		SELECT
		ef.fld_guid AS id,
		ef.fld_left_total AS arrears,
		ef.fld_object_name AS resourceName,
		0 AS type,
		CONCAT( ef.fld_guid, '_0' ) AS uid 
	FROM
		`es_charge_owner_fee` ef 
	WHERE
		ef.fld_area_guid = 'f095e3e1-5b60-36b5-580e-2779c3c933ac' 
		AND ef.fld_adjust_guid = '' 
		AND ef.fld_allot_date >= '2019-05-10' 
		AND ef.fld_allot_date <= '2023-12-31' 
		AND ef.fld_owner_guid IN (
			'f4d12a5be4ebc3198f60fc7ffffc6298',
			'6363d38405b642e8ad120d87f44f1c00',
			'fc86384751dd40d8b8a620d2e7b7e542',
			'315f92df775d4adfb6d32cd13383a5f2',
			'084deaaf5fdd4e278b2a11edd556caf4'
		)
len(483672)"]

[Environment Information] tidb-server 3 nodes, each node with 32G memory, mixed deployment with PD. Single SQL memory usage limit is 2G. tidb_mem_oom_action is set to CANCEL. In the TiDB dashboard slow query, the maximum SQL memory usage during the OOM period is more than 300M. In the record file under /tmp/1000_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage/record, no SQLs were found to occupy a large amount of memory.

| username: zhanggame1 | Original post link

Check the TiDB logs for “expensive_query”.

| username: EDG-给我冲 | Original post link

There are many, but the recorded mem_max are all 0 Bytes. What should I do?

| username: EDG-给我冲 | Original post link

[mem_max=“0 Bytes (0 Bytes)”] [sql=“use maindb”] All records are like this.

| username: Jack-li | Original post link

Take a look at the tidb_slow_query.log log file on the TiDB node.

| username: zhh_912 | Original post link

Check the logs of the operating system and the database.

| username: Kongdom | Original post link

In the slow query log, it should record the SQL that has been executed. If the limit for a single query is 2G and the TiDB server experiences an OOM (Out of Memory) issue, it is likely not a problem with a single SQL query. Based on the description, it doesn’t seem to be a concurrency issue either. You might want to check if other services on the server are causing the OOM.

| username: 友利奈绪 | Original post link

Are there any related errors in the corresponding monitoring and logs?

| username: EDG-给我冲 | Original post link

It’s not other services, it’s a dedicated database server. After investigation, it was found that the following SQL caused the issue. I want to know why the memory limit was set, but OOM still occurred.

[2024/05/29 20:17:01.284 +08:00] [WARN] [expensivequery.go:196] [expensive_query] [cost_time=60.050779608s] [stats=es_charge_settle_accounts_detail:pseudo,es_charge_owner_fee:450095105724907523] [conn_id=7351167617542984059] [user=prod] [database=maindb] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="SELECT DISTINCT t.* FROM (\n        SELECT ef.fld_guid AS id,\n        ef.fld_left_total AS arrears,\n        ef.fld_object_name as resourceName,\n        0 AS type,\n        CONCAT(ef.fld_guid,'_0') AS uid\n        FROM\n        `es_charge_owner_fee` ef\n         WHERE ef.fld_area_guid='f095e3e1-5b60-36b5-580e-2779c3c933ac'\n            AND ef.fld_adjust_guid=''\n            \n                AND ef.fld_allot_date >= '2019-05-10'\n            \n            \n                AND ef.fld_allot_date <= '2023-12-31'\n            \n            \n                AND ef.fld_owner_guid IN\n                (\n                    'f4d12a5be4ebc3198f60fc7ffffc6298'\n                ,\n                    '6363d38405b642e8ad120d87f44f1c00'\n                ,\n                    '3deaa21461014f0188c502052522f31e'\n                ,\n                    '61bd77e6c024af4e9018607fef82c605'\n                ,\n                    'bdfd9f8bd6f04c5b90c2f2fe5b353371'\n                ,\n                    'cc53a19c807754c6ca1b3858e47b7a7b'\n                ,\n                    '061d32baa86dd97db01e3ee21284a0ca'\n                ,\n                    '029427e8ab2467ee652471c5a25c2524'\n                ,\n                    '6d61ad7441924617b5d194d75750ecbe'\n                ,\n                    'bbe1bda657574c8cb1b341f8837bf2ab'\n                ,\n                    '26fd8a3ebecd02613a13db9cbb591e2c'\n                ,\n                    'f3684529d0e54a18912d014a8ee9fc40'\n                ,\n                    'fe24caeb51cb4e218f0c96be3e64e17e'\n                ,\n                    'e4a557e4983c45d3a549940e9a87e45a'\n                ,\n                    'd35fbf3b93d451eba8b1455fd2867eca'\n                ,\n                    '0f769f39-c0a3-8a34-0ec9-5d8d6edf6e75'\n                ,\n                    'bd2285df20ca421f8fa202e7baf5b1d1'\n                ,\n                    'd617a9cc38914029a95448ea1cf0f191'\n                ,\n                    '6f21dfba1da831610913c0091f0f8129'\n                ,\n                    '9f1fcaa8bb0541c2a3609a79e9030490'\n                ,\n                    'fb8e97cf330147d9b088ead7d77db680'\n                ,\n                    '9cf921fc6bfe423aa04bc799875ab93d'\n                ,\n                    '3a6e8b4bbad145a5bf7213879bb17c08'\n                ,\n                    '0003464da535425f9b9319cd0224b090'\n                ,\n                    '2677f1f61d314214b938ae3818256121'\n                ,\n                    'f0b3ee05e7c21e25a051293ab1896c37'\n                ,\n                    'fe6229fc7b582ff05d1dccdbb3aceb5e'\n                ,\n                    '17e26b54b39526989e3dc560dcbb26c0'\n                ,\n                    '36933e31afac1680c1f2f14a9cf40ef7'\n                ,\n                    '6a3b4d02b78f415981e6ab515f6e96f5'\n                ,\n                    '38d5101aa21429d223f276b7c16498b3'\n                ,\n                    '24b9ac8fb0cb7b9dc467d0c3db37f5a4'\n                ,\n                    '3cc37c787f58469e90ab4a117b65d795'\n                ,\n                    '4f43c0230669f66544d6e30c430d168e'\n                ,\n                    '0da6c6e5c0a8474e97605eb3d4c6db8e'\n                ,\n                    '0c3b27a02397bd714b05a7102c6f2bee'\n                ,\n                    'acbc845b79e640d688f43bccf3a4aa06'\n                ,\n                    'de141adb6bfc46f69b22572ec0ece074'\n                ,\n                    '23f1c1a94a552e08baed8e8b6b8f9686'\n                ,\n                    'e0fe2f08973b4c3288fd2f5fcb6caa7d'\n                ,\n                    '196fe68fe90ec67f323744e36fd1b431'\n                ,\n                    'f21ba526df65be51e5bf958ac8d19889'\n                ,\n                    'a39a7045d9928bb9872e6621babf9712'\n                ,\n                    'a0d685149780421c81f2e36eccc796bf'\n                ,\n                    '070e4c4d9ba34c3786f20716fc097691'\n                ,\n                    '1ab37ff236a245df90e7519baaf22e82'\n                ,\n                    '99fdd267f8d891279dee72d6e5a763e0'\n                ,\n                    '7710fe7856dba6a4791e97bc9262cc61'\n                ,\n                    '7b6e349d6783455d9212a85db34d90b6'\n                ,\n                    '8cc65b785165741bd803189f18cc588f'\n                ,\n                    'dfc1be32f94249698705350af14f7937'\n                ,\n                    '50c8c73cd8cc0660076755f03cf17895'\n                ,\n                    '9840434ed2f5494398dc45fcec5e5478'\n                ,\n                    '75612508905d5550bbb15bd6e605945f'\n                ,\n                    '8326a460167867301edf21811b13c163'\n                ,\n                    'dd2f7e97808745089bf8669491257bd1'\n                ,\n                    'fecaf0e84a4f8682a739a1aa3ed44b3a'\n                ,\n                    '820c92a23b279980983bf5a689cef32c'\n                ,\n                    'ead5905541b9bf6bd7c1b758754af06e'\n                ,\n                    'cf463f598f9329665e5c3f6729afb9b2'\n                ,\n                    'f9fef52b59bd49cb24d9d3569632879e'\n                ,\n                    '777c07f6af99456980150055a1f76b53'\n                ,\n                    'e3440741075fa865ddb9ca46f609f7c3'\n                ,\n                    '21975584d19e434aba0be688b22d2de3'\n                ,\n                    '3389615e6c17bd149cdfd7b60283d218'\n                ,\n                    '7055ba0e66d71c83a554c0118ba94995'\n                ,\n                    '71937e7375ac44ee81167070cd43833e'\n                ,\n                    'ffd91a34155666bc10034f125614c00c'\n                ,\n                    '33ca1bb35936424b81d274462cde0503'\n                ,\n                    '8e9a428a54c996915701bb1ab1449f6c'\n                ,\n                    'cfd49e3e131f409205b0db783749a9b6'\n                ,\n                    '1f16583c4ebe4214be4af8ca968a8f98'\n                ,\n                    'cbd52277ee4cc7e0e696dd396256da03'\n                ,\n                    '9282520d-0d26-a4aa-2825-c2a005318aa3'\n                ,\n                    '07a8d89203993e79cbc35ac00f697c1f'\n                ,\n                    '8ba31dcb7f9fa60476dbe5e0f7bc86a4'\n                ,\n                    '550849f3a4ff2e3120bdbb411e1eb5db'\n                ,\n                    'dccd7f7db7c074198d5f1c34d3d57eab'\n                ,\n                    'dadedd96ceab41bab366d505be0fa056'\n                ,\n                    '1b970bd5d582481e846fb87ad0f0f291'\n                ,\n                    'c109128acad50d5c1718bccc2ef03b2f'\n                ,\n                    '9f759af0ac75f86f8ff03ae28927c815'\n                ,\n                    '7e45b34d792f4f7b1fe2551e90f78aa6'\n                ,\n                    'ed2b70828d0845f6b07e3e962424a272'\n                ,\n                    '79d22a7f3ecbb82875d4d0edcd7c9cde'\n                ,\n                    'b8fe72d3db339646c9772f65901c58f6'\n                ,\n                    '7347b2396f84b0d43d6f7e14226c53ec'\n                ,\n                    '8edb42f0719077cea7b4556d899d95b9'\n                ,\n                    'a499fbc708574226bdf9b2ab70c7bcf5'\n                ,\n                    'b38f4d1b94194f7cb4d5f849cf210786'\n                ,\n                    '98537639b0c9458ba4e95f68408d0f47'\n                ,\n                    'cf40f7279bf44219862dd64fb4a33962'\n                ,\n                    'b8ab8a62448b4dbf9d25126431dc77af'\n                ,\n                    '096920393cab4049897ab4adb7ada2fe'\n                ,\n                    '6927c3d07cdc4bad9ce30e2a891c3449'\n                ,\n                    '7ee08bdcfc784f3ab2314911a6cda35e'\n                ,\n                    '9b00cba691fc4624905b0ada38e5ed40'\n                ,\n                    '59ff2e353d4ece9539b07ece0d95c339'\n                ,\n                    'a4e8bc7db2cb41c7a07c794b6bd227ac'\n                ,\n                    '53f92d76c9334020b6d4c382731ef029'\n                ,\n                    '9fd0b18bd15444f1815dc5ed2e4a731f'\n                ,\n                    'fc86384751dd40d8b8a620d2e7b7e542'\n                ,\n                    '315f92df775d4adfb6d32cd13383a5f2'\n                ,\n                    '084deaaf5fdd4e278b2a11edd556caf4'\n                ,\n    len(483672)"]
| username: Kongdom | Original post link

:thinking: Why are you certain that this is the cause and not concurrency? Is there only a TiDB server node on this server? Are there any PD or TiKV nodes?

| username: EDG-给我冲 | Original post link

There are PD nodes. Our business has no concurrency and is not used at night. Multiple tests of this business scenario always result in OOM.

| username: 小龙虾爱大龙虾 | Original post link

This reminds me of a previous bug where, when there were a lot of items inside an IN () clause, TiDB would cause an OOM while constructing the execution plan. I can’t find the link now, but if you suspect this is the issue, you can capture the memory usage just before the OOM and check on GitHub.

| username: tidb菜鸟一只 | Original post link

Will there still be an error if there is less data in the “in” clause?

| username: Hacker007 | Original post link

Set a limit on the memory usage of a single SQL statement, and then observe.

| username: 柴米油酱 | Original post link

select … in 480,000 parameters. This kind of SQL should be restricted under database specifications, otherwise, you will have to take the blame sooner or later.

| username: EDG-给我冲 | Original post link

It has already been limited to 2 GB.

| username: EDG-给我冲 | Original post link

That’s also the developers’ responsibility, not the operations’. :grin:

| username: EDG-给我冲 | Original post link

No, it won’t.

| username: tidb菜鸟一只 | Original post link

I remember that the tidb_mem_quota_query parameter limits memory quite well in version 6.1.7. If it doesn’t work, you might need to upgrade to version 6.5, which might have better control.

| username: Kamner | Original post link

The IN condition allows developers to execute in batches. Limiting a single query to use 2GB of memory is correct, but if this SQL uses more than 2GB of memory, it will still result in an OOM.

It’s like setting the JVM Xmx to 2GB; when the Java process exceeds this maximum value, it will report an OOM.