Posted to dev@drill.apache.org by "Carboni, Andrea" <an...@hpe.com> on 2015/10/07 17:53:55 UTC

DRILL-3376 Reading individual files created by CTAS with partition causes an exception

Hi all,

Would it be possible to include the fix for this bug (DRILL-3376) in Drill 1.2? Using Parquet files without being able to apply WHERE conditions on dates is very limiting.

Regards,
Andrea



Re: DRILL-3376 Reading individual files created by CTAS with partition causes an exception

Posted by Gianfranco Luceri <gl...@gmail.com>.
Hi all,
Regarding this issue and Andrea's request, I can confirm that it occurs, at least, when the selection is a path to a single file that is single-valued on the column in the WHERE clause.

Furthermore, it occurs when the selection is a directory containing subdirectories, each of which contains a single file that is single-valued on the column in the WHERE clause.
Our scenario is:

/benchmark/lineitem/1992/01/01/lineitem.parquet
/benchmark/lineitem/1992/01/02/lineitem.parquet
...
/benchmark/lineitem/1998/01/01/lineitem.parquet
...

and every file "lineitem.parquet" is single-valued on the date column (l_shipdate).

Executing TPC-H query 1 against this layout then causes the error.
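
For anyone trying to reproduce the layout, here is a minimal Python sketch of how a ship date maps to its file under the directory scheme listed above. The helper names (partition_path, partition_paths) are hypothetical and only illustrate the naming convention; they are not part of Drill, and the two-day range is just an example.

```python
from datetime import date, timedelta

# Hypothetical helper: maps a ship date to the file holding its rows,
# following the /benchmark/lineitem/YYYY/MM/DD/lineitem.parquet layout.
def partition_path(ship_date, root="/benchmark/lineitem"):
    return f"{root}/{ship_date:%Y/%m/%d}/lineitem.parquet"

# Hypothetical helper: yields one path per day from start to end, inclusive.
def partition_paths(start, end):
    day = start
    while day <= end:
        yield partition_path(day)
        day += timedelta(days=1)

# Example range only; the real data set spans more days than shown here.
paths = list(partition_paths(date(1992, 1, 1), date(1992, 1, 2)))
for p in paths:
    print(p)
```

Each generated path holds the rows for exactly one l_shipdate value, which is the single-valued condition described above.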

The environment consists of 8 drillbits (version 1.1.0) on an 8-node HDFS cluster (Hortonworks 2.3, Hadoop 2.7.1).


Strangely (at least to me), the error does not occur when the same query is run against the same layout on a "pseudo"-cluster consisting of a single drillbit instance (1.1.0) on a "pseudo"-HDFS cluster consisting of a single Hadoop instance (2.7.1 vanilla).


Hope it helps.

Regards,
Gianfranco

On Wednesday, October 07, 2015 10:00:07 AM Steven Phillips wrote:
> That bug only occurs when the selection is a path to a single file, and
> that file is single-valued on the column in the where clause.
> 
> The more common use case of querying a directory which contains parquet
> files that are each single-valued on a date column does not have this
> problem.
> 
> Are you seeing this or a similar issue in your queries?
> 
> On Wed, Oct 7, 2015 at 8:53 AM, Carboni, Andrea <an...@hpe.com>
> wrote:
> 
> > Hi all,
> >
> > could be possible to include in Drill 1.2 the fix for this bug (3376)? The
> > usage of Parquet files without the possibility of using WHERE conditions on
> > dates is very limiting.
> >
> > Regards,
> > Andrea
> >
> >
> >

Re: DRILL-3376 Reading individual files created by CTAS with partition causes an exception

Posted by Steven Phillips <st...@dremio.com>.
That bug only occurs when the selection is a path to a single file, and
that file is single-valued on the column in the where clause.

The more common use case of querying a directory which contains parquet
files that are each single-valued on a date column does not have this
problem.

Are you seeing this or a similar issue in your queries?

On Wed, Oct 7, 2015 at 8:53 AM, Carboni, Andrea <an...@hpe.com>
wrote:

> Hi all,
>
> could be possible to include in Drill 1.2 the fix for this bug (3376)? The
> usage of Parquet files without the possibility of using WHERE conditions on
> dates is very limiting.
>
> Regards,
> Andrea
>
>
>