Posted to user@drill.apache.org by Christopher Matta <cm...@mapr.com> on 2014/10/11 22:24:00 UTC

Query on CSV data not returning to the sqlline prompt

My data is in a directory on a 4-node MapR cluster. The CSV files are
partitioned into 4 directories, each with 100 CSV files of 10 million
records apiece, totaling 4 billion records.

When I run SELECT * FROM FACT LIMIT 10; the data returns, but the
sqlline prompt never comes back and the profile says the query is still
running. If I press Ctrl+C, sqlline prints the "10 rows selected (360.949
seconds)" report and returns to the prompt, but the profiles page still
shows the query as running.

The drillbit.log doesn’t show any errors during this.

This seems like a bug, but I would like to gather more evidence before
submitting it. Any ideas?

Chris Matta
cmatta@mapr.com
215-701-3146

Re: Query on CSV data not returning to the sqlline prompt

Posted by Jinfeng Ni <ji...@gmail.com>.
A couple of days ago, there was some discussion of the "limit"
operator in the following drill-dev thread:

http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201410.mbox/%3cCAAOiHjEf+iEp_H6isx4mREnYWcVjjBitTYvK7YKN+8WQi7+fZw@mail.gmail.com%3e


On Mon, Oct 13, 2014 at 7:27 AM, mufy <mu...@gmail.com> wrote:

> Just so everyone is aware:
>
> We're noticing this behavior irrespective of data size. Even a simple
> SHOW TABLES against the 'sys' DB is malfunctioning. We're continuing to
> look into the health of the cluster setup.

Re: Query on CSV data not returning to the sqlline prompt

Posted by mufy <mu...@gmail.com>.
Just so everyone is aware:

We're noticing this behavior irrespective of data size. Even a simple
SHOW TABLES against the 'sys' DB is malfunctioning. We're continuing to
look into the health of the cluster setup.


---
Mufeed Usman
My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
<http://mufeed.livejournal.com>




On Sun, Oct 12, 2014 at 2:06 AM, Ted Dunning <te...@gmail.com> wrote:

> The question implied by what Chris asks is whether there is a mechanism
> whereby a consumer can notify a producer that no more data is required.
>
> In the case of this query, once Chris has his 10 rows, shouldn't the
> consumer signal (transitively) all upstream parts of the query that they
> can go home now?

Re: Query on CSV data not returning to the sqlline prompt

Posted by Parth Chandra <pc...@maprtech.com>.
DRILL-1435 is the same problem I think.

On Wed, Oct 15, 2014 at 11:52 PM, mufy <mu...@gmail.com> wrote:

> Parth,
>
> I was interested to know whether this issue was reproducible. If so, is
> there a JIRA already filed?

Re: Query on CSV data not returning to the sqlline prompt

Posted by mufy <mu...@gmail.com>.
Parth,

I was interested to know whether this issue was reproducible. If so, is
there a JIRA already filed?


On Tue, Oct 14, 2014 at 3:48 AM, Christopher Matta <cm...@mapr.com> wrote:

> Thanks for the responses. I can confirm that queries without a LIMIT
> operator run fully and do return. I'll continue to look into this on my
> end.

Re: Query on CSV data not returning to the sqlline prompt

Posted by Christopher Matta <cm...@mapr.com>.
Thanks for the responses. I can confirm that queries without a LIMIT
operator run fully and do return. I'll continue to look into this on my end.

On Mon, Oct 13, 2014 at 1:50 PM, Parth Chandra <pc...@maprtech.com>
wrote:

> There is a mechanism in place where the fragments running on different
> drillbits are informed that they can stop producing data. The same
> mechanism is used to cancel queries (for example when you Ctrl-C from
> sqlline), as well as when the limit operator is used.
> Initial testing shows that this is working fine, but there is some
> condition where it is not working correctly.
> I'll be trying to reproduce this in a debug environment.

Re: Query on CSV data not returning to the sqlline prompt

Posted by Parth Chandra <pc...@maprtech.com>.
There is a mechanism in place where the fragments running on different
drillbits are informed that they can stop producing data. The same
mechanism is used to cancel queries (for example when you Ctrl-C from
sqlline), as well as when the limit operator is used.
Initial testing shows that this is working fine, but there is some
condition where it is not working correctly.
I'll be trying to reproduce this in a debug environment.
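The stop mechanism described above can be illustrated with a minimal sketch, under stated assumptions. It is written in Python, not taken from Drill's Java code, and the names Coordinator, Fragment, request_stop and drops_stop are hypothetical. In the sketch, one stop path serves both a user cancel (Ctrl-C) and a satisfied LIMIT. The coordinator reports the query as finished only after every fragment has acknowledged the stop. A single fragment that never acknowledges is enough to leave the query shown as RUNNING, and that would match the symptom in the original report. Whether this is the actual cause here has not been established.

```python
# Hypothetical sketch, not Drill's implementation: the class and method
# names are invented for illustration.
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class Fragment:
    """One unit of query work, e.g. running on one drillbit."""

    def __init__(self, fragment_id, drops_stop=False):
        self.fragment_id = fragment_id
        # Assumption: simulates the faulty condition, a fragment that
        # never receives (or never acts on) the stop request.
        self.drops_stop = drops_stop
        self.state = State.RUNNING

    def on_stop(self):
        """Handle a stop request; return True if it was acknowledged."""
        if self.drops_stop:
            return False
        self.state = State.FINISHED
        return True


class Coordinator:
    """Tracks fragments and derives the overall query state from them."""

    def __init__(self, fragments):
        self.fragments = fragments
        self.stop_reason = None

    def request_stop(self, reason):
        """Shared path for a user cancel and for a satisfied LIMIT.

        Sends the stop request to every fragment and returns the ids of
        fragments that did not acknowledge it.
        """
        self.stop_reason = reason
        return [f.fragment_id for f in self.fragments if not f.on_stop()]

    def query_state(self):
        # The query counts as finished only when every fragment is.
        if all(f.state is State.FINISHED for f in self.fragments):
            return State.FINISHED
        return State.RUNNING


if __name__ == "__main__":
    healthy = Coordinator([Fragment(i) for i in range(4)])
    print(healthy.request_stop("limit reached"), healthy.query_state())

    faulty = Coordinator([Fragment(0), Fragment(1),
                          Fragment(2, drops_stop=True), Fragment(3)])
    print(faulty.request_stop("user cancel"), faulty.query_state())
```

The four fragments are illustrative and only mirror the 4-node cluster in the report. When all fragments are healthy, request_stop returns an empty list and the query becomes FINISHED. When fragment 2 drops the request, request_stop returns [2] and the query stays RUNNING, regardless of whether the stop came from a cancel or from the limit.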

On Sat, Oct 11, 2014 at 1:36 PM, Ted Dunning <te...@gmail.com> wrote:

> The question implied by what Chris asks is whether there is a mechanism
> whereby a consumer can notify a producer that no more data is required.
>
> In the case of this query, once Chris has his 10 rows, shouldn't the
> consumer signal (transitively) all upstream parts of the query that they
> can go home now?

Re: Query on CSV data not returning to the sqlline prompt

Posted by Ted Dunning <te...@gmail.com>.
The question implied by what Chris asks is whether there is a mechanism
whereby a consumer can notify a producer that no more data is required.

In the case of this query, once Chris has his 10 rows, shouldn't the
consumer signal (transitively) all upstream parts of the query that they
can go home now?
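In a pull-based pipeline, one common way to get this transitive "no more data" signal is for each operator to close its upstream when it is done. The sketch below assumes Python generators standing in for query operators. The names scan, project, limit and run_demo are hypothetical, and this is not how Drill implements it. Once limit has emitted n rows, it calls close() on its upstream. That runs the upstream stage's cleanup, which in turn closes its own upstream. As a result, the leaf reader stops after producing only n rows instead of scanning its whole input.

```python
# Hypothetical sketch: generators stand in for query operators.
# Not Drill's code; stage names are invented for illustration.


def scan(name, n_rows, log):
    """Leaf producer (think: a CSV reader) that records when it stops."""
    emitted = 0
    try:
        for i in range(n_rows):
            emitted += 1
            yield (name, i)
    finally:
        # Runs on normal exhaustion and on an early close() alike.
        log.append(f"{name} stopped after {emitted} rows")


def project(upstream, log):
    """Pass-through stage that forwards a stop signal to its upstream."""
    try:
        for row in upstream:
            yield row
    finally:
        upstream.close()  # propagate "no more data needed" upstream
        log.append("project stopped")


def limit(upstream, n, log):
    """Emit at most n rows, then tell the upstream to stop producing."""
    try:
        if n <= 0:
            return
        for count, row in enumerate(upstream, start=1):
            yield row
            if count >= n:
                break  # enough rows: do not pull any more
    finally:
        upstream.close()
        log.append("limit stopped")


def run_demo(total_rows=10_000_000, n=10):
    """Run LIMIT n over a large input; return the rows and the stop log."""
    log = []
    pipeline = limit(project(scan("scan", total_rows, log), log), n, log)
    rows = list(pipeline)
    return rows, log


if __name__ == "__main__":
    rows, log = run_demo()
    print(len(rows), log)
```

Without the limit stage, the scan runs to exhaustion and no early stop is needed. That is one reason a fault in the stop path would be expected to surface only for LIMIT (or cancelled) queries.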


On Sat, Oct 11, 2014 at 1:24 PM, Christopher Matta <cm...@mapr.com> wrote:

> My data is in a directory on a 4-node MapR cluster. The CSV files are
> partitioned into 4 directories, each with 100 CSV files of 10 million
> records apiece, totaling 4 billion records.
>
> When I run SELECT * FROM FACT LIMIT 10; the data returns, but the
> sqlline prompt never comes back and the profile says the query is still
> running. If I press Ctrl+C, sqlline prints the "10 rows selected (360.949
> seconds)" report and returns to the prompt, but the profiles page still
> shows the query as running.
>
> The drillbit.log doesn’t show any errors during this.
>
> This seems like a bug, but I would like to gather more evidence before
> submitting it. Any ideas?