Posted to commits@pinot.apache.org by "tibrewalpratik17 (via GitHub)" <gi...@apache.org> on 2023/03/23 22:22:15 UTC

[GitHub] [pinot] tibrewalpratik17 opened a new issue, #10467: _tmp folder not getting properly cleaned up leading to high disk usage

tibrewalpratik17 opened a new issue, #10467:
URL: https://github.com/apache/pinot/issues/10467

   Recently we saw very high disk usage on some of our hosts. On investigating, we found directories like this on our servers for one table:
   `_tmp/tmp-<segment_name>-<timestamp>/tmp-<uuid>`
   The segment named in this path no longer exists for that table (it was deleted by retention). The contents of the directory look like this:
   0	col1.sv.sorted.fwd
   0	col2.mv.fwd
   0	col3.sv.sorted.fwd
   0	col4.sv.sorted.fwd
   0	col5.sv.sorted.fwd
   0	col6.sv.sorted.fwd
   4.0K	col1.dict
   4.0K	col2.dict
   4.0K	col3.dict
   26G	col4.dict
   132G	col5.dict
   148G	col6.dict
   Any idea what this `_tmp` folder is for and why these directories are being created?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] snleee closed issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee closed issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage
URL: https://github.com/apache/pinot/issues/10467




Re: [I] _tmp folder not getting properly cleaned up leading to high disk usage [pinot]

Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1747635286

   #10815 




[GitHub] [pinot] tibrewalpratik17 commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1484215704

   @Jackie-Jiang Yes, recently I have seen a lot of folders under `_tmp` for some tables. One of those tables had a complicated nested structure that we were trying to ingest, but ingestion did not work properly. This left a lot of folders inside the `_tmp` directory for that table.




[GitHub] [pinot] tibrewalpratik17 commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1511651452

   @Jackie-Jiang We saw a case today where the `_tmp` folder grew huge (~60% of the disk) and total disk usage went up to 100%. The server crashed, and because the disk was full it could not come back up on the restart attempts. As a result, the `_tmp` folder was not deleted by this change, and we had to delete the folder manually to bring the server back up.
   Any idea why we are seeing such huge `_tmp` folders? I don't see anything suspicious in the logs. Also, how can we make sure the `_tmp` folder is cleaned up properly if the segment seal fails for some reason?
   PS: The `_tmp` folder is not included in the size reported on the Pinot controller UI, so finding the table responsible for high disk usage is tedious.
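
Until the `_tmp` size is reported somewhere, a small script run directly on the server can show which table's `_tmp` folder is taking the space. This is only a rough sketch: it assumes each table has its own directory under the server data directory with a `_tmp` folder inside it, and the function names and the data directory path are made up for illustration.

```python
import os
from pathlib import Path
from typing import Dict


def dir_size_bytes(root: Path) -> int:
    """Sum the apparent sizes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += (Path(dirpath) / name).lstat().st_size
            except OSError:
                # The file disappeared while we were walking; skip it.
                pass
    return total


def tmp_usage_by_table(data_dir: Path) -> Dict[str, int]:
    """Map each table directory name to the bytes under its _tmp folder.

    Assumption: the layout is <data_dir>/<table>/_tmp/...
    """
    usage = {}
    for table_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        tmp_dir = table_dir / "_tmp"
        if tmp_dir.is_dir():
            usage[table_dir.name] = dir_size_bytes(tmp_dir)
    return usage


if __name__ == "__main__":
    # Hypothetical path; replace with the server's configured data directory.
    data_dir = Path("/var/pinot/server/data")
    if data_dir.is_dir():
        for table, size in sorted(tmp_usage_by_table(data_dir).items(),
                                  key=lambda item: item[1], reverse=True):
            print(f"{size / 2**30:10.1f} GiB  {table}")
```

Note that `st_size` is the apparent file size, so for sparse files the numbers can be larger than what `du` reports.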




[GitHub] [pinot] Jackie-Jiang commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1513734499

   @tibrewalpratik17 Can you check what content is stored within the `_tmp` folder? You'll need to log on to the server directly. The `_tmp` folder should contain the temporary mmap files for the consuming segment, so it is quite surprising that it can take 60% of the disk.




[GitHub] [pinot] Jackie-Jiang commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1483896425

   Thanks for reporting the issue! The files under this folder are the temp files for the real-time consuming segments. It seems we only clean them up when the segment is properly sealed. This can leave orphan files when the segment is not sealed (e.g. server crash, segment deleted before sealing).
   Since these files are just for consuming segments, we can consider scanning all the data folders for realtime tables and removing the `_tmp` folder during server start.
   
   @tibrewalpratik17 Do you see multiple folders under the `_tmp` folder?
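
The start-up cleanup suggested above could look roughly like the sketch below. It is written as a standalone Python script rather than Pinot server code, and it is not the actual fix. It assumes realtime table directories can be recognised by a `_REALTIME` suffix and that `_tmp` sits directly under each table directory; `remove_tmp_dirs` is a made-up name.

```python
import shutil
from pathlib import Path
from typing import List


def remove_tmp_dirs(data_dir: Path, realtime_suffix: str = "_REALTIME") -> List[Path]:
    """Delete <data_dir>/<table>/_tmp for every realtime table directory.

    Assumption: nothing is consuming yet (the server is stopped or still
    starting), so everything under _tmp is an orphan from a previous run.
    """
    removed = []
    for table_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        if not table_dir.name.endswith(realtime_suffix):
            continue  # leave non-realtime tables untouched
        tmp_dir = table_dir / "_tmp"
        if tmp_dir.is_dir():
            shutil.rmtree(tmp_dir)
            removed.append(tmp_dir)
    return removed
```

Running this while segments are consuming would delete files that are still in use, which is why the cleanup only makes sense at server start.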




[GitHub] [pinot] tibrewalpratik17 commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1513740743

   @Jackie-Jiang For the table I am talking about, the content of the `_tmp` folder looked like this:
   
   ```
   922G	col1.dict
   0	col1.sv.unsorted.fwd
   27G	col2.dict
   0	col2.sv.sorted.fwd
   ```
   
   I had to manually delete this folder to get the server started again. Here `col1` and `col2` are just `STRING` columns.




[GitHub] [pinot] Jackie-Jiang commented on issue #10467: _tmp folder not getting properly cleaned up leading to high disk usage

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10467:
URL: https://github.com/apache/pinot/issues/10467#issuecomment-1513818128

   Based on the file names, these are the dictionary files of the columns. I have no idea why they are so large. Do you know the content of `col1`? Is it expected to be this size? The dictionary contains the unique values of the column.
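
To make "the dictionary contains the unique values" concrete, here is a toy illustration of dictionary encoding. It is not Pinot's on-disk format or its implementation, and both function names are invented. It only shows that the dictionary grows with the number and length of distinct values, not with the number of rows.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, forward_index) for a column of strings.

    The dictionary holds each distinct value once, in first-seen order;
    the forward index holds one dictionary id per row.
    """
    ids: Dict[str, int] = {}
    dictionary: List[str] = []
    forward_index: List[int] = []
    for value in values:
        if value not in ids:
            ids[value] = len(dictionary)
            dictionary.append(value)
        forward_index.append(ids[value])
    return dictionary, forward_index


def dictionary_bytes(dictionary: List[str]) -> int:
    """UTF-8 bytes needed to store every distinct value once."""
    return sum(len(v.encode("utf-8")) for v in dictionary)
```

For example, a column with a million rows but only three distinct country codes needs a three-entry dictionary, while a column where every row carries a unique long string ends up with a dictionary about as large as the raw data. So a very large `.dict` file would be consistent with very high cardinality or very long values in that column.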

