Posted to user@drill.apache.org by Anup Tiwari <an...@games24x7.com> on 2018/03/16 14:40:33 UTC

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

Hi All,
I was just going through this thread and found very good suggestions.
But this issue is still there in Drill 1.12.0, and I can see that
https://issues.apache.org/jira/browse/DRILL-4708 is now marked as resolved in
"1.13.0", so I am hoping it will be fixed in Drill 1.13.0.
A few things I want to ask :-
- Is there a planned date for the Drill 1.13.0 release?
- If I have to apply the hack suggested by Francois (until DRILL-4708 is
released), which is:

org.apache.drill.exec.work.foreman
  QueryManager.java
  private void drillbitUnregistered(.....)
  ....
  if (atLeastOneFailure)
  -> just log the error, do not cancel query.

then do I just have to comment out the lines below and rebuild Drill from
source?

if (atLeastOneFailure) {
  logger.warn("Drillbits [{}] no longer registered in cluster. Canceling query {}",
      failedNodeList, QueryIdHelper.getQueryId(queryId));
  foreman.addToEventQueue(QueryState.FAILED,
      new ForemanException(String.format("One more more nodes lost connectivity during query. Identified nodes were [%s].",
          failedNodeList)));
}

I haven't done something like this before, so I might not be making sense;
it might also have side effects elsewhere, so please suggest a path forward.
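In case it helps, the usual rebuild workflow looks roughly like this; the tag name and the file path follow the standard Drill source layout and are my assumptions, not something verified in this thread:

```shell
# Sketch of a source rebuild; verify the tag name against your deployed version.
git clone https://github.com/apache/drill.git
cd drill
git checkout drill-1.12.0   # assumed tag for the release being patched
# Comment out the foreman.addToEventQueue(...) call guarded by
# "if (atLeastOneFailure)" in:
#   exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
mvn clean install -DskipTests
# The rebuilt tarball appears under distribution/target/; deploy it to all nodes.
```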





On Tue, Mar 21, 2017 8:15 PM, François Méthot fmethot78@gmail.com  wrote:
Hi,

We ran into client-foreman connection and ZkConnection issues a few months
ago. They went from annoying to a show stopper when we moved from a 12-node
cluster to a 220-node cluster.

Node specs:
- 8 cores total (2 x E5620)
- 72 GB RAM total
- Other applications share the same hardware.

~100 TB of parquet data on HDFS.
Based on observations we made a few months ago, we ended up with the
following settings/guidelines/changes:

- Memory settings
  DRILL_MAX_DIRECT_MEMORY="20G"
  DRILL_HEAP="8G"

  The remaining RAM is left for the other applications.
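For reference, those two variables go in conf/drill-env.sh on every drillbit (the values below simply mirror the numbers above; a drillbit restart is required for them to take effect):

```shell
# conf/drill-env.sh on every drillbit; restart drillbits after changing.
export DRILL_MAX_DIRECT_MEMORY="20G"   # off-heap memory used by query execution
export DRILL_HEAP="8G"                 # JVM heap used for planning/metadata
```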
- Threading
  planner.width.max_per_node = 4

  We think that a higher number of threads generates more network traffic and
  more context switches on each node, leading to a higher chance of a Zk
  disconnection. But we observed that even with max_per_node of 1 we would
  still get disconnections. We had no clear indication from Cloudera Manager
  that memory, CPU, or network was overloaded on a faulty node, although on
  very rare occasions we would get no stats data at all from certain nodes.
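For anyone wanting to try the same setting, it can be applied cluster-wide through a sqlline script; the ZooKeeper quorum in the URL, the file path, and the option values are placeholders/examples (`planner.slice_target`, raised rather than lowered, is the companion option suggested later in this thread):

```shell
# Placeholders: substitute your own ZooKeeper quorum and cluster id.
cat > /tmp/set_parallelism.sql <<'EOF'
ALTER SYSTEM SET `planner.width.max_per_node` = 4;
ALTER SYSTEM SET `planner.slice_target` = 200000;
EOF
sqlline -u "jdbc:drill:zk=zk1:2181,zk2:2181,zk3:2181/drill/drillbits1" \
        --run=/tmp/set_parallelism.sql
# Use ALTER SESSION instead of ALTER SYSTEM to scope the change to one connection.
```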
- Affinity factor
  We changed the affinity factor from the default to a big value:
  planner.affinity_factor = 1000.0

  This fixed an issue where some drillbits in our cluster were scanning data
  stored on remote nodes. It maximizes the chance of a drillbit reading local
  data. When drillbits only scan local data, network traffic drops, queries
  get faster, and the chance of a ZkDisconnect is reduced.

- If using HDFS, make sure each data file is stored in a single block, so
  that a drillbit holding that block can scan the whole file locally.
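One way to check the one-block advice from the shell; the paths are examples, and note that `store.parquet.block-size` only affects files Drill itself writes (e.g. via CTAS):

```shell
# Count the blocks backing a file; ideally each parquet file maps to one block.
hdfs fsck /data/events/part-00000.parquet -files -blocks
# Keep files written by Drill within a single 128 MB HDFS block:
echo 'ALTER SYSTEM SET `store.parquet.block-size` = 134217728;' > /tmp/blk.sql
sqlline -u "$DRILL_JDBC_URL" --run=/tmp/blk.sql
```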
- Try a more recent 1.8 JVM, or switch to JVM 1.7
  We have had CLIENT-to-FOREMAN disconnection issues with certain versions of
  the JVM (Linux, Windows, Mac); we sent an email about this to the dev
  mailing list in the past.

- Query pattern
  The more fields are selected (select * vs. selecting a few specific
  fields), the more likely we are to get the error. More data selected means
  more CPU and network activity, and hence more chances of ZooKeeper skipping
  a heartbeat.
- Foreman QueryManager resilience hack
  When a query failed, our logs indicated that a drillbit was getting
  unregistered and then registered again a short time later (a few ms to a
  few seconds), but the foreman's QueryManager would catch the
  "drillbitUnregistered" event and fail the query right away. As a test, we
  changed the QueryManager to not fail queries when a drillbit gets
  unregistered. We put this change in place in 1.8, and our logs now show Zk
  disconnect-reconnect cycles while queries keep running, so we kept that
  test code in. A query will now fail only if a drillbit loses its connection
  to the other drillbits (through the RPC bus) at some point. We have since
  moved to 1.9 with that change as well; I haven't had a chance to try 1.9
  without the hack.

  org.apache.drill.exec.work.foreman
    QueryManager.java
    private void drillbitUnregistered(.....)
    ....
    if (atLeastOneFailure)
    -> just log the error, do not cancel query.

  Our query success rate went from <50% to >95% with all the changes above.
  We hope to get rid of the hack when an official fix is available.

To cover the remaining 5% of errors (any other type of error), we advise
users to simply retry. We also have a built-in retry strategy in the hourly
Python scripts that aggregate our data.
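Our retry logic lives in Python, but the idea can be sketched generically in shell; the function name, delays, and the example sqlline command are illustrative, not taken from our actual scripts:

```shell
#!/bin/sh
# retry N DELAY CMD...: run CMD up to N times, sleeping DELAY seconds between
# attempts; returns non-zero only if every attempt failed.
retry() {
  n=$1; delay=$2; shift 2
  i=1
  until "$@"; do
    [ "$i" -ge "$n" ] && { echo "giving up after $n attempts" >&2; return 1; }
    echo "attempt $i failed; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
}

# Example: retry 3 30 sqlline -u "$DRILL_JDBC_URL" --run=hourly_agg.sql
```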
Hope it helps




Francois

On Thu, Mar 9, 2017 at 2:31 PM, Anup Tiwari <an...@games24x7.com> wrote:

> Hi John,
>
> First of all sorry for delayed response and thanks for your suggestion,
> reducing value of "planner.width.max_per_node" helped me a lot, above issue
> which was coming 8 out of 10 times earlier now it is coming only 2 out of
> 10 times.
>
> As mentioned above occurrences of connection error came down considerably,
> but now sometimes i get "Heap Space Error" for few queries and due to this
> sometimes drill-bits on some/all nodes gets killed. Let me know if any
> other variable i can check for this (As of now, i have 8GB of Heap and 20GB
> of Direct memory):

> *Error Log :*
>
> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
> exiting. Information message: Unable to handle out of memory condition in
> FragmentExecutor.
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.xerces.dom.DeferredDocumentImpl.getNodeObject(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at org.apache.xerces.dom.DeferredDocumentImpl.synchronizeChildren(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at org.apache.xerces.dom.DeferredElementImpl.synchronizeChildren(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at org.apache.xerces.dom.ElementImpl.normalize(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at org.apache.xerces.dom.ElementImpl.normalize(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at org.apache.xerces.dom.ElementImpl.normalize(Unknown Source) ~[xercesImpl-2.11.0.jar:na]
>         at com.games24x7.device.NewDeviceData.setup(NewDeviceData.java:94) ~[DeviceDataClient-0.0.1-SNAPSHOT.jar:na]
>         at org.apache.drill.exec.test.generated.FiltererGen5369.doSetup(FilterTemplate2.java:97) ~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen5369.setup(FilterTemplate2.java:54) ~[na:na]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:195) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:107) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:108) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
>
>
> Regards,
> *Anup Tiwari*
>

> On Mon, Mar 6, 2017 at 7:30 PM, John Omernik <jo...@omernik.com> wrote:
>
> > Have you tried disabling hash joins or hash agg on the query or changing
> > the planning width? Here are some docs to check out:
> >
> > https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/
> >
> > https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/
> >
> > https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
> >
> > Let us know if any of these have an effect on the queries...
> >
> > Also, the three links I posted here are query based changes, so an ALTER
> > SESSION should address them. On the suggestion above with memory, that
> > WOULD have to be made on all Drill bits running, and would require a
> > restart of the Drillbit to take effect.
> >

> > On Sat, Mar 4, 2017 at 1:01 PM, Anup Tiwari <an...@games24x7.com> wrote:
> >
> > > Hi John,
> > >
> > > I have tried above config as well but still getting this issue.
> > > And please note that we were using similar configuration params for
> > > Drill 1.6 where this issue was not coming.
> > > Anything else which i can try?
> > >
> > > Regards,
> > > *Anup Tiwari*
> > >
> > > On Fri, Mar 3, 2017 at 11:01 PM, Abhishek Girish <ag...@apache.org> wrote:
> > >
> > > > +1 on John's suggestion.
> > > >
> > > > On Fri, Mar 3, 2017 at 6:24 AM, John Omernik <jo...@omernik.com> wrote:
> > > >
> > > > > So your node has 32G of ram yet you are allowing Drill to use 36G. I
> > > > > would change your settings to be 8GB of Heap, and 22GB of Direct
> > > > > Memory. See if this helps with your issues. Also, are you using a
> > > > > distributed filesystem? If so you may want to allow even more free
> > > > > ram...i.e. 8GB of Heap and 20GB of Direct.
> > > > >
> > > > > On Fri, Mar 3, 2017 at 8:20 AM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Please find our configuration details :-
> > > > > >
> > > > > > Number of Nodes : 4
> > > > > > RAM/Node : 32GB
> > > > > > Core/Node : 8
> > > > > > DRILL_MAX_DIRECT_MEMORY="20G"
> > > > > > DRILL_HEAP="16G"
> > > > > >
> > > > > > And all other variables are set to default.
> > > > > >
> > > > > > Since we have tried some of the settings suggested above but still
> > > > > > facing this issue more frequently, kindly suggest us what is best
> > > > > > configuration for our environment.
> > > > > >
> > > > > > Regards,
> > > > > > *Anup Tiwari*
> > > > > >
> > > > > > On Thu, Mar 2, 2017 at 1:26 AM, John Omernik <jo...@omernik.com> wrote:
> > > > > >
> > > > > > > Another thing to consider is ensure you have a Spill Location
> > > > > > > setup, and then disable hashagg/hashjoin for the query...
> > > > > > >
> > > > > > > On Wed, Mar 1, 2017 at 1:25 PM, Abhishek Girish <agirish@apache.org> wrote:
> > > > > > >

> > > > > > > > Hey Anup,
> > > > > > > >
> > > > > > > > This is indeed an issue, and I can understand that having an
> > > > > > > > unstable environment is not something anyone wants. DRILL-4708
> > > > > > > > is still unresolved - hopefully someone will get to it soon.
> > > > > > > > I've bumped up the priority.
> > > > > > > >
> > > > > > > > Unfortunately we do not publish any sizing guidelines, so you'd
> > > > > > > > have to experiment to settle on the right load for your cluster.
> > > > > > > > Please decrease the concurrency (number of queries running in
> > > > > > > > parallel). And try bumping up Drill DIRECT memory. Also, please
> > > > > > > > set the system options recommended by Sudheesh. While this may
> > > > > > > > not solve the issue, it may help reduce its occurrence.
> > > > > > > >
> > > > > > > > Can you also update the JIRA with your configurations, type of
> > > > > > > > queries and the relevant logs?
> > > > > > > >
> > > > > > > > -Abhishek
> > > > > > > >
> > > > > > > > On Wed, Mar 1, 2017 at 10:17 AM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Can someone look into it? As we are now getting this more
> > > > > > > > > frequently in Adhoc queries as well.
> > > > > > > > > And for automation jobs, we are moving to Hive as in drill we
> > > > > > > > > are getting this more frequently.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > *Anup Tiwari*
> > > > > > > > >
> > > > > > > > > On Sat, Dec 31, 2016 at 12:11 PM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > We are getting this issue bit more frequently. can someone
> > > > > > > > > > please look into it and tell us that why it is happening
> > > > > > > > > > since as mention in earlier mail when this query gets
> > > > > > > > > > executed no other query is running at that time.
> > > > > > > > > >
> > > > > > > > > > Thanks in advance.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > *Anup Tiwari*
> > > > > > > > > >
> > > > > > > > > > On Sat, Dec 24, 2016 at 10:20 AM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:

> > > > > > > > > >

> > > > > > > > > >> Hi Sudheesh,
> > > > > > > > > >>
> > > > > > > > > >> Please find below ans :-
> > > > > > > > > >>
> > > > > > > > > >> 1. Total 4, (3 Datanodes, 1 namenode)
> > > > > > > > > >> 2. Only one query, as this query is part of daily dump and
> > > > > > > > > >> runs in early morning.
> > > > > > > > > >>
> > > > > > > > > >> And as @chun mentioned, it seems similar to DRILL-4708, so
> > > > > > > > > >> any update on progress of this ticket?
> > > > > > > > > >>
> > > > > > > > > >> On 22-Dec-2016 12:13 AM, "Sudheesh Katkam" <skatkam@maprtech.com> wrote:
> > > > > > > > > >>
> > > > > > > > > >> Two more questions..
> > > > > > > > > >>
> > > > > > > > > >> (1) How many nodes in your cluster?
> > > > > > > > > >> (2) How many queries are running when the failure is seen?
> > > > > > > > > >>
> > > > > > > > > >> If you have multiple large queries running at the same
> > > > > > > > > >> time, the load on the system could cause those failures
> > > > > > > > > >> (which are heartbeat related).
> > > > > > > > > >>
> > > > > > > > > >> The two options I suggested decrease the parallelism of
> > > > > > > > > >> stages in a query, this implies lesser load but slower
> > > > > > > > > >> execution.
> > > > > > > > > >>
> > > > > > > > > >> System level option affect all queries, and session level
> > > > > > > > > >> affect queries on a specific connection. Not sure what is
> > > > > > > > > >> preferred in your environment.
> > > > > > > > > >>
> > > > > > > > > >> Also, you may be interested in metrics. More info here:
> > > > > > > > > >>
> > > > > > > > > >> http://drill.apache.org/docs/monitoring-metrics/
> > > > > > > > > >>
> > > > > > > > > >> Thank you,
> > > > > > > > > >> Sudheesh
> > > > > > > > > >>
> > > > > > > > > >> > On Dec 21, 2016, at 4:31 AM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > @sudheesh, yes drill bit is running on datanodeN/10.*.*.5:31010).
> > > > > > > > > >> >
> > > > > > > > > >> > Can you tell me how this will impact to query and do i
> > > > > > > > > >> > have to set this at session level OR system level?
> > > > > > > > > >> >
> > > > > > > > > >> > Regards,
> > > > > > > > > >> > *Anup Tiwari*
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang <cchang@maprtech.com> wrote:

> > > > > > > > > >> >
> > > > > > > > > >> >> I am pretty sure this is the same as DRILL-4708.
> > > > > > > > > >> >>
> > > > > > > > > >> >> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam <skatkam@maprtech.com> wrote:
> > > > > > > > > >> >>
> > > > > > > > > >> >>> Is the drillbit service (running on datanodeN/10.*.*.5:31010)
> > > > > > > > > >> >>> actually down when the error is seen?
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> If not, try lowering parallelism using these two
> > > > > > > > > >> >>> session options, before running the queries:
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> planner.width.max_per_node (decrease this)
> > > > > > > > > >> >>> planner.slice_target (increase this)
> > > > > > > > > >> >>>
> > > > > > > > > >> >>> Thank you,
> > > > > > > > > >> >>> Sudheesh
> > > > > > > > > >> >>>
> > > > > > > > > >> >>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari <anup.tiwari@games24x7.com> wrote:

> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> Hi Team,
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> We are running some drill automation script on a daily
> > > > > > > > > >> >>>> basis and we often see that some query gets failed
> > > > > > > > > >> >>>> frequently by giving below error, Also i came across
> > > > > > > > > >> >>>> DRILL-4708 <https://issues.apache.org/jira/browse/DRILL-4708>
> > > > > > > > > >> >>>> which seems similar, Can anyone give me update on that
> > > > > > > > > >> >>>> OR workaround to avoid such issue ?
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> *Stack Trace :-*
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> > > > > > > > > >> >>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> > > > > > > > > >> >>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <-->
> > > > > > > > > >> >>>> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
> > > > > > > > > >> >>>> at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
> > > > > > > > > >> >>>> at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
> > > > > > > > > >> >>>> at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> > > > > > > > > >> >>>> at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
> > > > > > > > > >> >>>> at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
> > > > > > > > > >> >>>> at sqlline.Commands.execute(Commands.java:841)
> > > > > > > > > >> >>>> at sqlline.Commands.sql(Commands.java:751)
> > > > > > > > > >> >>>> at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > > > > > > > > >> >>>> at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> > > > > > > > > >> >>>> at sqlline.Commands.run(Commands.java:1304)
> > > > > > > > > >> >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > > > > > >> >>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > > > > > >> >>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > > > > >> >>>> at java.lang.reflect.Method.invoke(Method.java:498)
> > > > > > > > > >> >>>> at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> > > > > > > > > >> >>>> at sqlline.SqlLine.dispatch(SqlLine.java:742)
> > > > > > > > > >> >>>> at sqlline.SqlLine.initArgs(SqlLine.java:553)
> > > > > > > > > >> >>>> at sqlline.SqlLine.begin(SqlLine.java:596)
> > > > > > > > > >> >>>> at sqlline.SqlLine.start(SqlLine.java:375)
> > > > > > > > > >> >>>> at sqlline.SqlLine.main(SqlLine.java:268)
> > > > > > > > > >> >>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION
> > > > > > > > > >> >>>> ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user
> > > > > > > > > >> >>>> client) closed unexpectedly. Drillbit down?
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > > > > > >> >>>> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
> > > > > > > > > >> >>>> at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
> > > > > > > > > >> >>>> at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> > > > > > > > > >> >>>> at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> > > > > > > > > >> >>>> at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> > > > > > > > > >> >>>> at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> > > > > > > > > >> >>>> at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > > > > > > > > >> >>>> at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> > > > > > > > > >> >>>> at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> > > > > > > > > >> >>>> at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> > > > > > > > > >> >>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> > > > > > > > > >> >>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
> > > > > > > > > >> >>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
> > > > > > > > > >> >>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > > > > > > > > >> >>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > > > > > > > > >> >>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > > > > > > > > >> >>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > > > > > > > > >> >>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > > > > > > > > >> >>>> at java.lang.Thread.run(Thread.java:745)
> > > > > > > > > >> >>>>
> > > > > > > > > >> >>>> Regards,
> > > > > > > > > >> >>>> *Anup Tiwari*






Regards,
Anup Tiwari

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

Posted by Anup Tiwari <an...@games24x7.com>.
Thanks.. will upgrade to 1.13.0 and let you know.

On Tue, Mar 20, 2018 11:08 AM, Parth Chandra parthc@apache.org  wrote:
Hi Anup,

I don't have full context for the proposed hack, and it might have worked,
but looks like Vlad has addressed the issue in the right place. Perhaps you
can try out 1.13.0 and let us all know.

Thanks
Parth










Regards,
Anup Tiwari

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

Posted by Parth Chandra <pa...@apache.org>.
Hi Anup,

 I don't have full context for the proposed hack, and it might have worked,
but looks like Vlad has addressed the issue in the right place. Perhaps you
can try out 1.13.0 and let us all know.

Thanks

Parth

On Sat, Mar 17, 2018 at 11:43 AM, Anup Tiwari <an...@games24x7.com>
wrote:

> Thanks Parth for Info. I am really looking forward to it.
> But can you tell me if the second part(about hack) was right or not?
> Because i
> really want to test it as we got this issue several time in last 2-3 days
> post
> upgrading to 1.12.0.
> Also i have seen sometimes after lost connection , drillbit gets killed on
> few/all nodes and i am not getting any logs in drillbit.out/drillbit.log.
>

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

Posted by Anup Tiwari <an...@games24x7.com>.
Thanks Parth for Info. I am really looking forward to it.
But can you tell me if the second part(about hack) was right or not? Because i
really want to test it as we got this issue several time in last 2-3 days post
upgrading to 1.12.0.
Also i have seen sometimes after lost connection , drillbit gets killed on
few/all nodes and i am not getting any logs in drillbit.out/drillbit.log.  





Regards,
Anup Tiwari

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

Posted by Parth Chandra <pa...@apache.org>.
On Fri, Mar 16, 2018 at 8:10 PM, Anup Tiwari <an...@games24x7.com>
wrote:

> Hi All,
> I was just going through this post and found very good suggestions.
> But this issue is still there in Drill 1.12.0 and i can see
> https://issues.apache.org/jira/browse/DRILL-4708 is now marked as
> resolved in
> "1.13.0" so i am hoping that this will be fixed in drill 1.13.0.
> Few things i want to ask :-
> - Any Planned date for Drill 1.13.0 release?
>


Real Soon Now.  :)
The release will be out in a couple of days. Watch this list for an
announcement.