Posted to user@drill.apache.org by Sungwook Yoon <sy...@maprtech.com> on 2015/08/23 16:07:16 UTC

Drill dir0 issue

Hi,

I am trying to query Parquet files written by Hive and partitioned by a column,
so the directory structure follows that partition column.

The column is year.
Say there are 5 years, so the dir0 values look like year=2010,
year=2011, year=2012, year=2013, year=2014.

We ran the following:
select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 =
'year=2012';

I get nothing back,
even though there are Parquet files in that directory.

Sometimes it picks up, e.g., year=2010, but not year=2012.

Where am I going wrong with this?

Thanks,

Sungwook
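
For readers unfamiliar with dir0: it names the first directory level below the path being queried. The following is a minimal Python sketch of that idea only, not Drill's implementation; the function dir_columns and the Parquet file name are hypothetical and introduced purely for illustration.

```python
from pathlib import PurePosixPath


def dir_columns(query_root: str, file_path: str) -> dict:
    """Model the dirN columns: one per directory level between the queried root and the file."""
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(query_root))
    # rel.parts ends with the file name; everything before it is a directory level.
    return {f"dir{i}": part for i, part in enumerate(rel.parts[:-1])}


root = "/user/hive/warehouse/table"
# Hypothetical file name, used only for this illustration.
f = "/user/hive/warehouse/table/year=2012/000000_0.parquet"

print(dir_columns(root, f))                 # {'dir0': 'year=2012'}
print(dir_columns(root + "/year=2012", f))  # {}
```

Under this model, a filter on dir0 = 'year=2012' is only meaningful when the query starts at the table directory; when the query starts inside year=2012, there is no dir0 level left to filter on.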

Re: Drill dir0 issue

Posted by Sungwook Yoon <sy...@maprtech.com>.
Here is some related odd behavior with Hive-partitioned directories queried
through dfs storage.

I first created a view:
create view tmp_view as
select cast(substr(`dir0`, 6, 4) as int) as `year`,
       cast(aaa as varchar(100)) as aaa
from dfs.root.`/user/hive/warehouse/table` o;

select aaa from tmp_view where `year` between 2010 and 2012 limit 5;
returns the following 5 rows:
+--------+
| V571   |
| V571   |
| 8363   |
| V8281  |
| 59970  |

... good.

Then,

select aaa from tmp_view where `year` between 2010 and 2012 and aaa like
'%V571%' limit 5;

returns no rows, even though V571 appears in the unfiltered result above.

Sungwook
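
As a sanity check on what the filtered query should return, here is a minimal sketch that replays the view in SQLite from Python. SQLite stands in for Drill here, and the table t and its rows are made up for the example, so this only illustrates the expected SQL semantics (a 1-based SUBSTR that turns 'year=2012' into 2012), not Drill's actual behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for the partitioned table: dir0 plus one data column.
# These rows are invented for the illustration.
conn.execute("create table t (dir0 text, aaa text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("year=2010", "V571"),
        ("year=2011", "8363"),
        ("year=2012", "V8281"),
        ("year=2013", "V571"),
    ],
)

# Same shape as tmp_view: characters 6..9 of 'year=2012' are '2012'.
conn.execute(
    "create view tmp_view as "
    "select cast(substr(dir0, 6, 4) as int) as year, aaa from t"
)

rows = conn.execute(
    "select year, aaa from tmp_view "
    "where year between 2010 and 2012 and aaa like '%V571%'"
).fetchall()
print(rows)  # [(2010, 'V571')]
```

If the same predicates are each satisfied by some row in the real data, a combined filter returning nothing would point at the engine rather than the query.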



On Sun, Aug 23, 2015 at 5:23 PM, Sungwook Yoon <sy...@maprtech.com> wrote:

>
> So, I filed the issue here,
>
> https://issues.apache.org/jira/browse/DRILL-3692
>
> If more details are needed let me know.
>
> Sungwook

Re: Drill dir0 issue

Posted by Sungwook Yoon <sy...@maprtech.com>.
So, I filed the issue here,

https://issues.apache.org/jira/browse/DRILL-3692

If more details are needed let me know.

Sungwook


On Sun, Aug 23, 2015 at 2:45 PM, Aman Sinha <as...@maprtech.com> wrote:

> Yes, I just realized that and was about to respond to my prior message.
> I just tested with a directory structure similar to Sungwook's  (where
> directories are named with 'year=2012' format) and it works for me.
> But I am on the current master branch.
> In the original message 'Sometimes it picks up e.g., year=2010, but not
> year=2012..'   that clearly sounds like wrong result...
> definitely file a JIRA with a repro.
>
> Aman

Re: Drill dir0 issue

Posted by Aman Sinha <as...@maprtech.com>.
Yes, I just realized that and was about to respond to my prior message.
I just tested with a directory structure similar to Sungwook's  (where
directories are named with 'year=2012' format) and it works for me.
But I am on the current master branch.
In the original message, 'Sometimes it picks up e.g., year=2010, but not
year=2012..' clearly sounds like a wrong result, so
definitely file a JIRA with a repro.

Aman

On Sun, Aug 23, 2015 at 12:23 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> The way that Sungwook is describing the issue, it has nothing to do with
> Hive.  The files were generated via Hive but he is querying directly
> through the DFS schema.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Drill dir0 issue

Posted by Jacques Nadeau <ja...@dremio.com>.
The way that Sungwook is describing the issue, it has nothing to do with
Hive.  The files were generated via Hive but he is querying directly
through the DFS schema.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sun, Aug 23, 2015 at 12:20 PM, Aman Sinha <as...@maprtech.com> wrote:

> Sungwook, do you have the latest master build which has the fix for Hive
> partition pruning (DRILL-3121) ?

Re: Drill dir0 issue

Posted by Aman Sinha <as...@maprtech.com>.
Sungwook, do you have the latest master build, which has the fix for Hive
partition pruning (DRILL-3121)?

On Sun, Aug 23, 2015 at 12:15 PM, Sungwook Yoon <sy...@maprtech.com> wrote:

> Will do,
>
> Thanks,
>
> Sungwook

Re: Drill dir0 issue

Posted by Sungwook Yoon <sy...@maprtech.com>.
Will do,

Thanks,

Sungwook


On Sun, Aug 23, 2015 at 2:14 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> It sounds like a bug. Can you file a jira?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Drill dir0 issue

Posted by Jacques Nadeau <ja...@dremio.com>.
It sounds like a bug. Can you file a jira?

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sun, Aug 23, 2015 at 12:13 PM, Sungwook Yoon <sy...@maprtech.com> wrote:

> Hi Jacques,
>
> This works well, no problem of accessing the partitioned dirs.
> (and actually pretty faster than accessing from one level above)
>
> Just the issues I asked about, when I access from the
> /user/hive/warehouse/table, it somehow does not recover every dir0.
>
> Sungwook

Re: Drill dir0 issue

Posted by Sungwook Yoon <sy...@maprtech.com>.
Hi Jacques,

This works well; there is no problem accessing the partitioned dirs directly
(and it is actually quite a bit faster than accessing from one level above).

The issue is only the one I asked about: when I query from
/user/hive/warehouse/table, it somehow does not pick up every dir0.

Sungwook


On Sun, Aug 23, 2015 at 2:02 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> I think Hsuan misunderstood your question.
>
> Can you let us know what you get if you query:
>
> select * from dfs.root.`/user/hive/warehouse/table/year=2012`
>
>
>
>
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio

Re: Drill dir0 issue

Posted by Jacques Nadeau <ja...@dremio.com>.
I think Hsuan misunderstood your question.

Can you let us know what you get if you query:

select * from dfs.root.`/user/hive/warehouse/table/year=2012`







--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sun, Aug 23, 2015 at 7:07 AM, Sungwook Yoon <sy...@maprtech.com> wrote:

> Hi,
>
> I am trying to use Hive parquet stored files partitioned by some column.
> So, the directory structure is partitioned with the column.
>
> The column is actually year.
> Let's say there are 5 years, so dir0 are like year=2010,
> year=2011,year=2012,year=2013,year=2014
>
> We did like following
> select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 =
> 'year=2012';
>
> I get nothing.
> Apparently, there are parquet files in the directory though.
>
> Sometimes it picks up e.g., year=2010, but not year=2012..
>
> Where am I going wrong with this?
>
> Thanks,
>
> Sungwook
>

Re: Drill dir0 issue

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Hi Sungwook,

Can you create a different directory with a few files in each subdirectory, but use 2012, 2013, 2014 instead of year=2012, etc.?
That might be a good test to see whether the year=xxxx directory naming is tripping up Drill's directory pruning.

—Andries
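
One way to set up such a test directory is sketched below in Python. It assumes the table is reachable as a local or NFS-mounted path; the helper copy_with_plain_names and the table_plain directory name are hypothetical, and the demo runs against a throwaway temp tree rather than real data.

```python
import os
import shutil
import tempfile


def copy_with_plain_names(src_root: str, dst_root: str, max_files: int = 3) -> list:
    """Copy up to max_files files from each key=value subdirectory of src_root
    into dst_root/<value>, e.g. year=2012 -> 2012."""
    created = []
    for name in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, name)
        if not os.path.isdir(src_dir) or "=" not in name:
            continue
        plain = name.split("=", 1)[1]  # 'year=2012' -> '2012'
        dst_dir = os.path.join(dst_root, plain)
        os.makedirs(dst_dir, exist_ok=True)
        files = sorted(
            f for f in os.listdir(src_dir)
            if os.path.isfile(os.path.join(src_dir, f))
        )
        for f in files[:max_files]:
            shutil.copy2(os.path.join(src_dir, f), dst_dir)
        created.append(plain)
    return created


# Demo on a throwaway tree that mimics the year=YYYY layout.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "table")
    dst = os.path.join(tmp, "table_plain")
    for year in ("2012", "2013", "2014"):
        os.makedirs(os.path.join(src, "year=" + year))
        open(os.path.join(src, "year=" + year, "000000_0"), "w").close()
    created = copy_with_plain_names(src, dst)
    copied = sorted(os.listdir(os.path.join(dst, "2012")))
    print(created, copied)
```

The same dir0 filter can then be run against the copy with dir0 = '2012' to compare results with the year=2012 layout.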


> On Aug 23, 2015, at 9:47 AM, Kristine Hahn <kh...@maprtech.com> wrote:
> 
> If you set up your data in directories like the log data in the
> Querying Directories example on
> http://drill.apache.org/docs/querying-directories, which uses WHERE
> dir0='2013' LIMIT 10 in the query, and you are having intermittent
> Table Not Found results, look for hidden files in the directory you
> are querying. The files must be compatible--they must have comparable
> data types and columns in the same order. Hidden files that do not
> have comparable data types can cause a Table Not Found error.
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn skype:krishahn
> 
> 


Re: Drill dir0 issue

Posted by Kristine Hahn <kh...@maprtech.com>.
If you set up your data in directories like the log data in the
Querying Directories example on
http://drill.apache.org/docs/querying-directories, which uses WHERE
dir0='2013' LIMIT 10 in the query, and you are having intermittent
Table Not Found results, look for hidden files in the directory you
are querying. The files must be compatible--they must have comparable
data types and columns in the same order. Hidden files that do not
have comparable data types can cause a Table Not Found error.
Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn skype:krishahn
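
A quick way to look for such hidden files is sketched below in Python. It assumes the directory is reachable as a local or NFS-mounted path, and it treats names starting with "." or "_" as hidden, which is an assumption of this sketch; the helper find_hidden and the demo file names are hypothetical.

```python
import os
import tempfile


def find_hidden(root: str, prefixes=(".", "_")) -> list:
    """Return paths (relative to root) of files and directories whose
    name starts with one of the hidden-style prefixes."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.startswith(prefixes):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)


# Demo on a throwaway tree: one partition contains a stray hidden file.
with tempfile.TemporaryDirectory() as tmp:
    for year in ("year=2010", "year=2012"):
        os.makedirs(os.path.join(tmp, year))
        open(os.path.join(tmp, year, "000000_0"), "w").close()
    open(os.path.join(tmp, "year=2012", ".000000_0.crc"), "w").close()
    demo_hits = find_hidden(tmp)
    print(demo_hits)
```

Any path this reports inside a partition directory is worth inspecting before concluding that the query engine is at fault.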



On Sun, Aug 23, 2015 at 9:01 AM, USC <hs...@usc.edu> wrote:
> Hi Sungwook,
> In your where clause, you only need to say year=2012.
>
> The directory column (e.g., dir0) is used when users query a directory.

Re: Drill dir0 issue

Posted by USC <hs...@usc.edu>.
Hi Sungwook,
In your where clause, you only need to say year=2012.

The directory column (e.g., dir0) is used when users query a directory.


> On Aug 23, 2015, at 7:07 AM, Sungwook Yoon <sy...@maprtech.com> wrote:
> 
> Hi,
> 
> I am trying to use Hive parquet stored files partitioned by some column.
> So, the directory structure is partitioned with the column.
> 
> The column is actually year.
> Let's say there are 5 years, so dir0 are like year=2010,
> year=2011,year=2012,year=2013,year=2014
> 
> We did like following
> select * from dfs.root.`/user/hive/warehouse/table` d where d.dir0 =
> 'year=2012';
> 
> I get nothing.
> Apparently, there are parquet files in the directory though.
> 
> Sometimes it picks up e.g., year=2010, but not year=2012..
> 
> Where am I going wrong with this?
> 
> Thanks,
> 
> Sungwook