Posted to user@drill.apache.org by sreeparna bhabani <bh...@gmail.com> on 2020/05/09 08:22:07 UTC

Error while creating Parquet from database : External Sort encountered error while spilling to disk

Hi Team,

I am facing an issue while creating a Parquet file in Drill from a database.
I have created a Jira ticket with the log details:
https://issues.apache.org/jira/browse/DRILL-7737

*Summary-*

I am creating a Parquet file from a database table using CTAS, but I get the
error "*External Sort encountered an error while spilling to disk*" when the
statement includes a PARTITION BY clause.

*Version of Apache Drill* -

1.17

*Memory config-*

DRILL_HEAP=16G
DRILL_MAX_DIRECT_MEMORY=32G

*Configs which I tried-*

store.parquet.reader.pagereader.async=true
store.parquet.reader.pagereader.bufferedread=false
planner.memory.max_query_memory_per_node=31147483648
drill.exec.memory.operator.output_batch_size=4194304
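
For reference, options like these are normally applied in Drill with ALTER SESSION (or ALTER SYSTEM). The sketch below assumes session scope, which is an assumption here since the scope actually used is not recorded above. The option names and values are the ones listed.

```sql
-- Assumption: session scope; ALTER SYSTEM would apply the same options cluster-wide.
ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;
ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;
-- Upper bound on memory a single query may use on each node, in bytes.
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 31147483648;
-- Target size of operator output batches, in bytes (4 MiB).
ALTER SESSION SET `drill.exec.memory.operator.output_batch_size` = 4194304;
```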

*Details of volume-*

The CTAS with PARTITION BY runs over 14424482 rows and 145 columns.

There are 3 columns in the PARTITION BY clause.

For comparison, a Parquet file generated from the same dataset with Python,
using SNAPPY compression, is smaller than 1 GB.

FYI - I am able to create the Parquet file in Drill using CTAS *without*
PARTITION BY.

*CTAS script-*

CREATE TABLE dfs.root.<Table_name>
PARTITION BY (<Column1>,<Column2>,<Column3>)
AS SELECT *
FROM db.<Table>;

Please suggest how we can fix this.

Thanks and Regards,
*Sreeparna Bhabani*