Posted to issues@drill.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/03/30 16:32:00 UTC

[jira] [Commented] (DRILL-8416) Memory leak when the async Parquet reader skips empty pages

    [ https://issues.apache.org/jira/browse/DRILL-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706953#comment-17706953 ] 

ASF GitHub Bot commented on DRILL-8416:
---------------------------------------

jnturton opened a new pull request, #2784:
URL: https://github.com/apache/drill/pull/2784

   # [DRILL-8416](https://issues.apache.org/jira/browse/DRILL-8416): Memory leak when the async Parquet reader skips empty pages
   
   ## Description
   
   A regression introduced by the Parquet reader clean-up released in Drill 1.20 means that buffers are not freed when they hold (non-empty) compressed data for _empty_ dictionary or data pages that the reader skips. Because empty pages are uncommon in real data, this bug went undetected for a long time.
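
To make the failure mode described above concrete, here is a minimal toy sketch in Python. It is not Drill's code: `ToyAllocator`, `read_pages`, the `free_skipped` flag, and the page sizes are all hypothetical names and values invented for illustration. It only models the general pattern of a reader that allocates a buffer for each page's compressed bytes and forgets to release it on the skip path.

```python
class ToyAllocator:
    """Hypothetical allocator that tracks outstanding bytes (not Drill's API)."""

    def __init__(self):
        self.allocated = 0

    def buffer(self, size):
        self.allocated += size
        return bytearray(size)

    def release(self, buf):
        self.allocated -= len(buf)

    def close(self):
        # Complain if anything is still outstanding when the operator closes.
        if self.allocated != 0:
            raise RuntimeError(f"Memory leaked: ({self.allocated})")


def read_pages(pages, allocator, free_skipped):
    """Read (compressed_size, value_count) pages and return the values read."""
    values_read = 0
    for compressed_size, value_count in pages:
        buf = allocator.buffer(compressed_size)  # holds the compressed page bytes
        if value_count == 0:
            # Empty page: nothing to decode, so the reader skips it.
            if free_skipped:
                allocator.release(buf)  # fixed behaviour: free before skipping
            continue
        values_read += value_count
        allocator.release(buf)  # normal path: free after decoding
    return values_read


# Made-up pages: the middle one has compressed bytes but zero values.
pages = [(128, 10), (64, 0), (256, 20)]

leaky = ToyAllocator()
read_pages(pages, leaky, free_skipped=False)   # leaves 64 bytes outstanding

fixed = ToyAllocator()
read_pages(pages, fixed, free_skipped=True)
fixed.close()                                  # no outstanding bytes, no error
```

In this sketch both variants read the same values; the only difference is whether the skipped page's buffer is still outstanding when the allocator is closed.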
   
   ## Documentation
   N/A
   
   ## Testing
   New unit test.
   




> Memory leak when the async Parquet reader skips empty pages
> -----------------------------------------------------------
>
>                 Key: DRILL-8416
>                 URL: https://issues.apache.org/jira/browse/DRILL-8416
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.21.0
>            Reporter: Matthias Rosenthaler
>            Assignee: James Turton
>            Priority: Major
>             Fix For: 1.21.1
>
>         Attachments: example.parquet, meta_steps.parquet
>
>
> When I query the following Parquet file, which is stored on the Hadoop file system,
> {code:java}
> SELECT * FROM `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> I get the following error:
> {code:java}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (64) Allocator(op:0:0:1:ParquetRowGroupScan) 1000000/64/34688/10000000000 (res/actual/peak/limit){code}
> Everything works fine with Drill version 1.19.
> If I select only columns without NULL values, the query also works in 1.21.0:
> {code:java}
> SELECT `name`,`type` FROM `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> I generated a new example.parquet with pyarrow 8.0.0 containing a float column with NULL values, and the same error occurred.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)