Posted to user@drill.apache.org by Ashu Pachauri <as...@gmail.com> on 2021/09/23 15:08:08 UTC

Very slow parquet write performance due to single threaded write

Hi,

I have been trying to load a medium-sized CSV file (22 million rows and 20
columns) into a Parquet table using Drill's CTAS statement.

However, no matter what I try, the Parquet writer in the query plan has
only one associated minor fragment and thus runs in a single thread. I
have tried a simple query with and without ORDER BY, and with and without
PARTITION BY, without success.

Is it a limitation of Drill that CTAS writes are single-threaded, even
with a PARTITION BY clause and no ORDER BY, or am I missing something?


Thanks and Regards,
Ashu Pachauri

Re: Very slow parquet write performance due to single threaded write

Posted by Ted Dunning <td...@apache.org>.
Ashu,

Did you send this same message to a different list (possibly dev@drill)?

I remember answering it with some timing information, but see that you don't have an answer here.
