Posted to user@drill.apache.org by Hao Zhu <hz...@maprtech.com> on 2015/04/09 01:32:53 UTC

Is there any way to create a parquet file with multiple parquet blocks using Drill?

Hi Team,

"store.parquet.block-size" can control the parquet block size in Drill.
When creating a table like this:

ALTER SESSION SET `store.format` = 'parquet';
ALTER SESSION SET `store.parquet.block-size` = 10485760;    -- 10MB block size
CREATE TABLE dfs.root.`hao/parquet_tables/parq_10m` AS
(SELECT * FROM hive.`sometable`);

All resulting files are 10 MB in size (the same as the Parquet block size).

My question is:
Is there any way to create a parquet file with multiple parquet blocks?

Thanks,
Hao

Re: Is there any way to create a parquet file with multiple parquet blocks using Drill?

Posted by Hao Zhu <hz...@maprtech.com>.
Thanks Steven and Jacques.

I found that Impala cannot do this either.
I do know that setting parquet.block.size to match the chunk size/HDFS
block size is the best practice.
My goal was to use Drill to generate Parquet files with multiple
Parquet row groups/blocks.

It seems I should use Pig to do this.

Thanks,
Hao



On Wed, Apr 8, 2015 at 6:05 PM, Jacques Nadeau <ja...@apache.org> wrote:

> The reason it is not commonly used is that the goal with Parquet is
> typically for each row group to be contained within a single block
> replica set (to guarantee the possibility of total locality).
> The easiest way to guarantee this is to keep your Parquet row group size
> at or slightly smaller than your HDFS block size.

Re: Is there any way to create a parquet file with multiple parquet blocks using Drill?

Posted by Jacques Nadeau <ja...@apache.org>.
The reason it is not commonly used is that the goal with Parquet is
typically for each row group to be contained within a single block
replica set (to guarantee the possibility of total locality).
The easiest way to guarantee this is to keep your Parquet row group size
at or slightly smaller than your HDFS block size.
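
As a rough sketch of that advice in Drill, the session options from the original question could be set so that the Parquet block (row group) size matches the file system block size. The 128 MB value below is only an assumption for illustration (check dfs.blocksize, or the chunk size, on your own cluster), and the table name parq_128m is hypothetical.

```sql
-- Assumption: the HDFS block size on this cluster is 128 MB (134217728 bytes).
ALTER SESSION SET `store.format` = 'parquet';
-- Keep the Parquet block (row group) size at the HDFS block size so that
-- each row group can be contained within a single HDFS block.
ALTER SESSION SET `store.parquet.block-size` = 134217728;
-- Hypothetical target table; the source table is the one from the question.
CREATE TABLE dfs.root.`hao/parquet_tables/parq_128m` AS
(SELECT * FROM hive.`sometable`);
```

A slightly smaller value than the HDFS block size would also fit the guidance above.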

On Wed, Apr 8, 2015 at 5:52 PM, Steven Phillips <sp...@maprtech.com>
wrote:

> No, this is currently not possible with Drill.
>
> It's generally not recommended to do that anyway, so I don't know if this
> will ever be supported by Drill.

Re: Is there any way to create a parquet file with multiple parquet blocks using Drill?

Posted by Steven Phillips <sp...@maprtech.com>.
No, this is currently not possible with Drill.

It's generally not recommended to do that anyway, so I don't know if this
will ever be supported by Drill.

On Wed, Apr 8, 2015 at 4:32 PM, Hao Zhu <hz...@maprtech.com> wrote:

> Hi Team,
>
> "store.parquet.block-size" can control the parquet block size in Drill.
> When creating a table like this:
>
> ALTER SESSION SET `store.format` = 'parquet';
> ALTER SESSION SET `store.parquet.block-size` = 10485760;    -- 10MB block size
> CREATE TABLE dfs.root.`hao/parquet_tables/parq_10m` AS
> (SELECT * FROM hive.`sometable`);
>
> All resulting files are 10 MB in size (the same as the Parquet block size).
>
> My question is:
> Is there any way to create a parquet file with multiple parquet blocks?
>
> Thanks,
> Hao
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com