Posted to dev@drill.apache.org by Abdel Hakim Deneche <ad...@maprtech.com> on 2016/01/19 20:10:03 UTC

query hanging in CANCELLATION_REQUEST

I was running a query with a hash join that was generating lots of
results. I cancelled the query from sqlline, then closed sqlline. Now the
query is stuck in CANCELLATION_REQUEST state.

The jstack output suggests that screenRoot is blocked, waiting for data
sent to the client to be acknowledged.

Do we have a JIRA similar to this?

-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: query hanging in CANCELLATION_REQUEST

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
DRILL-4296 created. I'm in the process of attaching the jstack and heap dump.

On Thu, Jan 21, 2016 at 8:54 AM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> I may have a jstack but not the heap dump. I will try to reproduce it once
> again and will open a JIRA for it.

Re: query hanging in CANCELLATION_REQUEST

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
I may have a jstack but not the heap dump. I will try to reproduce it once
again and will open a JIRA for it.

On Thu, Jan 21, 2016 at 8:37 AM, Jacques Nadeau <ja...@dremio.com> wrote:

> Do you have a jstack and heap dump from the moment of the hang? That would
> be the best way to determine what was going on. I've found that to be the
> easiest way to see what it was hung on and the state of the system at the
> time.

Re: query hanging in CANCELLATION_REQUEST

Posted by Jacques Nadeau <ja...@dremio.com>.
Do you have a jstack and heap dump from the moment of the hang? That would
be the best way to determine what was going on. I've found that to be the
easiest way to see what it was hung on and the state of the system at the
time.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Jan 20, 2016 at 1:42 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> I found the issue, I think. Sort was spilling to disk and I ran out of
> disk space. For some reason this caused ZooKeeper to misbehave: I could
> still connect to Drill after the first query hung, but once I ran a
> second query, ZK died.
>
> Could this explain the query hang?

Re: query hanging in CANCELLATION_REQUEST

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
I found the issue, I think. Sort was spilling to disk and I ran out of
disk space. For some reason this caused ZooKeeper to misbehave: I could
still connect to Drill after the first query hung, but once I ran a
second query, ZK died.

Could this explain the query hang?

On Tue, Jan 19, 2016 at 1:57 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> It sounds like the connection break is not correctly marking an ack fail.

Re: query hanging in CANCELLATION_REQUEST

Posted by Jacques Nadeau <ja...@dremio.com>.
It sounds like the connection break is not correctly marking an ack fail.

--
Jacques Nadeau
CTO and Co-Founder, Dremio
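
To illustrate the failure mode described above, here is a minimal Java
sketch. It is not Drill's actual code: the class AckWindow and all of its
method names are hypothetical. A sender counts batches that still await
acknowledgement. If a connection break never marks the wait as failed, the
waiter blocks forever; in this sketch the close handler wakes the waiter,
and the wait is also bounded by a timeout.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical tracker for batches sent to a client but not yet acknowledged. */
public class AckWindow {
  private int outstanding;   // batches sent and still waiting for an ack
  private boolean failed;    // true once the connection is known to be gone

  /** Record that one batch was sent. */
  public synchronized void sent() {
    outstanding++;
  }

  /** Record one acknowledgement and wake any waiter. */
  public synchronized void acked() {
    if (outstanding > 0) {
      outstanding--;
    }
    notifyAll();
  }

  /** Connection-break handler: mark the pending acks as failed and wake waiters. */
  public synchronized void connectionClosed() {
    failed = true;
    notifyAll();
  }

  /**
   * Wait until every batch is acknowledged.
   * Returns true if all acks arrived, false on connection failure or timeout.
   */
  public synchronized boolean awaitAcks(long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (outstanding > 0 && !failed) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        return false;  // bounded wait: give up instead of hanging
      }
      TimeUnit.NANOSECONDS.timedWait(this, remaining);
    }
    return outstanding == 0;
  }
}
```

With this shape, a cancelled query whose client has gone away would see
awaitAcks return false rather than block indefinitely.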

On Tue, Jan 19, 2016 at 11:47 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:

> Ok, I will create a JIRA

Re: query hanging in CANCELLATION_REQUEST

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Ok, I will create a JIRA

On Tue, Jan 19, 2016 at 11:45 AM, Hanifi Gunes <hg...@maprtech.com> wrote:

> I reported this problem verbally sometime last year, but I don't remember
> creating a JIRA for it. In general, I dislike blocking calls of this sort
> anywhere in the execution path, even though one could argue that they
> simplify the code flow.
>
> A JIRA would be appreciated.

Re: query hanging in CANCELLATION_REQUEST

Posted by Hanifi Gunes <hg...@maprtech.com>.
I reported this problem verbally sometime last year, but I don't remember
creating a JIRA for it. In general, I dislike blocking calls of this sort
anywhere in the execution path, even though one could argue that they
simplify the code flow.

A JIRA would be appreciated.

On Tue, Jan 19, 2016 at 11:10 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:

> I was running a query with a hash join that was generating lots of
> results. I cancelled the query from sqlline, then closed sqlline. Now the
> query is stuck in CANCELLATION_REQUEST state.
>
> The jstack output suggests that screenRoot is blocked, waiting for data
> sent to the client to be acknowledged.
>
> Do we have a JIRA similar to this?