Posted to user@drill.apache.org by sreeparna bhabani <bh...@gmail.com> on 2020/05/09 08:22:07 UTC
Error while creating Parquet from database : External Sort encountered error while spilling to disk
Hi Team,
I am facing an issue while creating a Parquet file in Drill from a database. I
have created a Jira ticket with the log details:
https://issues.apache.org/jira/browse/DRILL-7737
*Summary-*
I am creating a Parquet file from a database using CTAS, but I get the error
"*External Sort encountered an error while spilling to disk*" when I include
a PARTITION BY clause.
*Version of Apache Drill* -
1.17
*Memory config-*
DRILL_HEAP=16G
DRILL_MAX_DIRECT_MEMORY=32G
*Configs which I tried-*
store.parquet.reader.pagereader.async=true;
store.parquet.reader.pagereader.bufferedread=false;
planner.memory.max_query_memory_per_node=31147483648
drill.exec.memory.operator.output_batch_size=4194304
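For reference, Drill options like these can be applied with ALTER SYSTEM SET (or ALTER SESSION SET for a single session). The following is only a sketch using the values listed above; the choice of system scope is an assumption, since the scope actually used is not stated here.

```sql
-- Sketch only: system scope is assumed; use ALTER SESSION SET for per-session scope.
-- Values are the ones listed above.
ALTER SYSTEM SET `store.parquet.reader.pagereader.async` = true;
ALTER SYSTEM SET `store.parquet.reader.pagereader.bufferedread` = false;
-- 31147483648 bytes is roughly 29 GiB
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 31147483648;
-- 4194304 bytes is 4 MiB
ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = 4194304;
```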
*Details of volume-*
The number of rows I am trying to write with CTAS PARTITION BY is 14424482.
Number of columns: 145.
There are 3 columns in the PARTITION BY clause.
A Parquet file generated from the same dataset in Python, with SNAPPY
compression, is less than 1 GB.
FYI - I am able to create Parquet in Drill using CTAS *without* PARTITION
BY.
*CTAS script-*
CREATE TABLE dfs.root.<Table_name>
PARTITION BY (<Column1>,<Column2>,<Column3>)
AS SELECT *
FROM db.<Table>;
Please suggest how we can fix this.
Thanks and regards,
*Sreeparna Bhabani*