Posted to user@drill.apache.org by Ashu Pachauri <as...@gmail.com> on 2021/09/23 15:08:08 UTC
Very slow parquet write performance due to single threaded write
Hi,
I have been trying to load a medium-sized CSV file (22 million rows and 20
columns) into a Parquet table using Drill's CTAS statement.
However, no matter what I try, the Parquet writer in the query plan has
only one associated minor fragment and thus runs in a single thread. I
have tried a simple query with/without ORDER BY and with/without PARTITION
BY clauses, without much success.
Is it a limitation of Drill that, even with a PARTITION BY clause (and
no ORDER BY), the writes in CTAS are single-threaded, or am I missing
something?
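For reference, a CTAS of the kind described above might look roughly like
the sketch below. The workspace, table name, file path, and column names are
hypothetical and do not come from the original message; only three of the 20
columns are shown.

```sql
-- Sketch only: dfs.tmp, events_parquet, /data/events.csv and the column
-- aliases are made-up names for illustration.

-- Make CTAS write Parquet output.
ALTER SESSION SET `store.format` = 'parquet';

-- Read the headerless CSV through Drill's `columns` array and write a
-- partitioned Parquet table. The PARTITION BY column must appear in the
-- SELECT list.
CREATE TABLE dfs.tmp.`events_parquet`
PARTITION BY (event_date)
AS
SELECT
  columns[0] AS event_date,
  columns[1] AS user_id,
  columns[2] AS amount
FROM dfs.`/data/events.csv`;
```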
Thanks and Regards,
Ashu Pachauri
Re: Very slow parquet write performance due to single threaded write
Posted by Ted Dunning <td...@apache.org>.
Ashu,
Did you send this same message to a different list (possibly dev@drill)?
I remember answering it with some timing information, but I see that you don't have an answer here.