Posted to user@tez.apache.org by "Chitragar, Uday (KMLWG)" <Ud...@KantarMedia.com> on 2016/08/25 15:34:05 UTC

Parallel queries to HS2/Tez

Hello,

When running parallel queries (simultaneous connections by two beeline clients to HS2), I get the following exception (full debug log attached). Interestingly, running the queries one after the other completes without any problem.

The setup is Hive (1.2.1) and Tez (0.8.4) running in local mode.
Apologies in advance if this forum is not the right place for this question. Thank you.

2016-08-25 15:45:41,333 DEBUG [TezTaskEventRouter{attempt_1472136335089_0001_1_01_000000_0}]: impl.ShuffleInputEventHandlerImpl (ShuffleInputEventHandlerImpl.java:processDataMovementEvent(127)) - DME srcIdx: 0, targetIndex: 9, attemptNum: 0, payload: [hasEmptyPartitions: true, host: , port: 0, pathComponent: , runDuration: 0]
2016-08-25 15:45:41,557 ERROR [TezChild]: tez.MapRecordSource (MapRecordSource.java:processRow(90)) - java.lang.IllegalStateException: Invalid input path file:/acorn/QC/OraExtract/20160131/Devices/Devices_extract_20160229T080613_3
        at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:415)
        at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:457)
        at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1069)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)



2016-08-25 15:45:41,817 INFO  [TezChild]: io.HiveContextAwareRecordReader (HiveContextAwareRecordReader.java:doNext(326)) - Cannot get partition description from file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = file:/acorn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: [file:/acorn/QC/OraExtract/20160131/Devices]
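The INFO line above names one directory being read and a different directory as the only entry in pathToPartitionInfo. A small sketch for pulling those two pieces out of such a line when scanning a larger log follows; the regular expression and the function name mismatch are illustrative assumptions, not part of Hive or Tez.

```python
import re

# The message text is copied from the log line quoted above.
LINE = (
    "Cannot get partition description from "
    "file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = "
    "file:/acorn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: "
    "[file:/acorn/QC/OraExtract/20160131/Devices]"
)

# Hypothetical pattern: captures the directory that was looked up and the
# bracketed list of directories that were known at that moment.
PATTERN = re.compile(
    r"cannot find dir = (?P<dir>\S+) in pathToPartitionInfo: \[(?P<known>[^\]]*)\]"
)


def mismatch(line):
    """Return (missing_dir, known_dirs) for a matching line, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    known = [p.strip() for p in m.group("known").split(",") if p.strip()]
    return m.group("dir"), known


missing, known = mismatch(LINE)
print("looked up:", missing)
print("known    :", known)
```

Here the looked-up directory is under reportlib while the only known directory is under OraExtract, which fits the report that the two parallel queries interfere with each other.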



Regards,
Uday




Kantar Disclaimer<http://www.kantar.com/disclaimer.html>

Parallel queries to HS2/Tez (+Hive Local mode)

Posted by "Chitragar, Uday (KMLWG)" <Ud...@KantarMedia.com>.
Hello,


Running parallel queries on HS2 (Hive in local mode) seems to mix up the configuration.


> The setup is Hive (1.2.1) and Tez (0.8.4) running in local mode.

> Cannot get partition description from file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = file:/acorn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: [file:/acorn/QC/OraExtract/20160131/Devices]


This is quite easy to reproduce (at least on the setup I have).
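A minimal sketch of such a reproduction, assuming beeline is on the PATH and HS2 is reachable at the JDBC URL below. The URL, the table names and the helper names are placeholders and were not given in this thread; only the beeline options -u (JDBC URL) and -e (query string) are relied on.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumptions (not from the thread): HS2 listens on localhost:10000 and the
# two tables below exist. Replace both with your own URL and queries.
JDBC_URL = "jdbc:hive2://localhost:10000/default"
QUERIES = [
    "SELECT COUNT(*) FROM devices",
    "SELECT COUNT(*) FROM vm_valedit",
]


def beeline_command(query, url=JDBC_URL):
    """Build the argument list for one non-interactive beeline run."""
    return ["beeline", "-u", url, "-e", query]


def run_in_parallel(queries, runner=subprocess.run):
    """Start one beeline client per query at the same time.

    Returns the exit code of each client, in the order the queries were given.
    """
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = [pool.submit(runner, beeline_command(q)) for q in queries]
        return [f.result().returncode for f in futures]
```

Calling run_in_parallel(QUERIES) starts both clients together. Running the same commands one after the other gives the sequential case for comparison.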

Thank you,
Uday

________________________________
From: Hitesh Shah <hi...@apache.org>
Sent: 25 August 2016 20:06
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

Hello Uday,

I don't believe anyone has tried running 2 DAGs in parallel in local mode within the same TezClient (and definitely not for HiveServer2). If this is with 2 instances of TezClient, this could be a bug either in how Hive sets up the TezClient for local mode with the same directories, or somewhere in Tez where clashing directories for intermediate data cause an issue. FWIW, the Tez AM does not support running 2 DAGs in parallel, and quite a bit of that code path is used in local mode.

It would be great if you could file a JIRA for this with more detailed logs and then work with the dev community on a patch that addresses the issue in your environment.

thanks
- Hitesh






RE: Parallel queries to HS2/Tez

Posted by "Chitragar, Uday (KMLWG)" <Ud...@KantarMedia.com>.
Hi Siddharth,

Thank you for the reply. Please note I am running apache-hive-1.2.1; I assume LLAP is available from 2.x onwards?
The settings certainly work for me, as I have a setup that reproduces the reported error pretty easily.

> In terms of Tez local mode - there's a jira open to support concurrent DAGs.
I guess the one I reported, TEZ-3420, should be marked as a duplicate. I am happy to test when a patch is available.

Regards,
Uday

Kantar Media | Audience Intelligence | T: +44 (0)20 8967 4760 | Mob: +44 (0) 7825 675509 - Uday.Chitragar@kantarmedia.com

From: Siddharth Seth [mailto:sseth@apache.org]
Sent: 07 October 2016 17:44
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

Uday,
Are you running this with LLAP? Without LLAP, as far as I know, Hive will launch additional sessions even with the settings that you mentioned. If this is working for you, great. You may want to send a note to the Hive community, or open a JIRA requesting control over concurrent queries via a simpler option (instead of having to configure 3 parameters, and potentially 2 more for thread pool sizes).
In terms of Tez local mode - there's a jira open to support concurrent DAGs.



Re: Parallel queries to HS2/Tez

Posted by Siddharth Seth <ss...@apache.org>.
Uday,
Are you running this with LLAP? Without LLAP, as far as I know, Hive will
launch additional sessions even with the settings that you mentioned. If
this is working for you, great. You may want to send a note to the
Hive community, or open a JIRA requesting control over concurrent queries
via a simpler option (instead of having to configure 3 parameters, and
potentially 2 more for thread pool sizes).
In terms of Tez local mode - there's a jira open to support concurrent DAGs.

On Thu, Oct 6, 2016 at 3:36 AM, Chitragar, Uday (KMLWG) <
Uday.Chitragar@kantarmedia.com> wrote:

> Just FYI:
>
> The following settings make the requests run sequentially, as a workaround:
>
> hive.server2.tez.initialize.default.sessions=true
> hive.server2.tez.default.queues=default
> hive.server2.tez.sessions.per.default.queue=1
>
> Regards,
> Uday

RE: Parallel queries to HS2/Tez

Posted by "Chitragar, Uday (KMLWG)" <Ud...@KantarMedia.com>.
Just FYI:

The following settings make the requests run sequentially, as a workaround:

hive.server2.tez.initialize.default.sessions=true
hive.server2.tez.default.queues=default
hive.server2.tez.sessions.per.default.queue=1
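For anyone applying these through hive-site.xml rather than on the command line, a small sketch that renders the three settings as property elements follows. The property names and values are exactly the ones posted above; the helper name to_hive_site_properties is hypothetical, and the output is meant to be pasted inside the existing <configuration> element.

```python
import xml.etree.ElementTree as ET

# The three settings quoted above, exactly as posted.
WORKAROUND = {
    "hive.server2.tez.initialize.default.sessions": "true",
    "hive.server2.tez.default.queues": "default",
    "hive.server2.tez.sessions.per.default.queue": "1",
}


def to_hive_site_properties(settings):
    """Render each setting as a <property> element, one element per line."""
    blocks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


rendered = to_hive_site_properties(WORKAROUND)
print(rendered)
```

HiveServer2 would need a restart to pick up the change. Whether these settings serialize queries in every deployment is not established in this thread; they are reported here as working on one Hive 1.2.1 local-mode setup.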

Regards,
Uday


-----Original Message-----
From: Hitesh Shah [mailto:hitesh@apache.org] 
Sent: 29 August 2016 21:42
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

I think there are some thread-pool-related settings in HiveServer2 which could be used to throttle the number of concurrent queries down to 1. One quick search led me to https://issues.apache.org/jira/browse/HIVE-5229, but you may wish to ask the same question on the Hive mailing lists for a definitive answer.

thanks
- Hitesh 




Re: Parallel queries to HS2/Tez

Posted by Hitesh Shah <hi...@apache.org>.
I think there are some thread-pool-related settings in HiveServer2 which could be used to throttle the number of concurrent queries down to 1. One quick search led me to https://issues.apache.org/jira/browse/HIVE-5229, but you may wish to ask the same question on the Hive mailing lists for a definitive answer.

thanks
- Hitesh


> On Aug 27, 2016, at 1:02 AM, Chitragar, Uday (KMLWG) <Ud...@KantarMedia.com> wrote:
> 
> Hi Hitesh,
> 
> Thank you for the advice. While I get dev help on TEZ-3420, are there any recommendations for configuring Hive/HS2 to run the DAGs sequentially? Interestingly, this is not a problem with an HDP deployment, which obviously has a 'fuller' setup. Local mode really helps with testing.
> 
> Thank you,
> Uday


Re: Parallel queries to HS2/Tez

Posted by "Chitragar, Uday (KMLWG)" <Ud...@KantarMedia.com>.
Hi Hitesh,


Thank you for the advice. While I get dev help on TEZ-3420 (https://issues.apache.org/jira/browse/TEZ-3420), are there any recommendations for configuring Hive/HS2 to run the DAGs sequentially? Interestingly, this is not a problem with an HDP deployment, which obviously has a 'fuller' setup. Local mode really helps with testing.


Thank you,

Uday

________________________________
From: Hitesh Shah <hi...@apache.org>
Sent: 25 August 2016 20:06:30
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

Hello Uday,

I don't believe anyone has tried running 2 DAGs in parallel in local mode within the same TezClient (and definitely not for HiveServer2). If this is with 2 instances of TezClient, this could be a bug either in how Hive sets up the TezClient for local mode with the same directories, or somewhere in Tez where clashing directories for intermediate data cause an issue. FWIW, the Tez AM does not support running 2 DAGs in parallel, and quite a bit of that code path is used in local mode.

It would be great if you could file a JIRA for this with more detailed logs and then work with the dev community on a patch that addresses the issue in your environment.

thanks
- Hitesh







Re: Parallel queries to HS2/Tez

Posted by Hitesh Shah <hi...@apache.org>.
Hello Uday,

I don't believe anyone has tried running 2 DAGs in parallel in local mode within the same TezClient (and definitely not for HiveServer2). If this is with 2 instances of TezClient, this could be a bug either in how Hive sets up the TezClient for local mode with the same directories, or somewhere in Tez where clashing directories for intermediate data cause an issue. FWIW, the Tez AM does not support running 2 DAGs in parallel, and quite a bit of that code path is used in local mode.

It would be great if you could file a JIRA for this with more detailed logs and then work with the dev community on a patch that addresses the issue in your environment.

thanks
- Hitesh
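To make the "clashing state" idea concrete, here is a toy model of how two queries sharing one piece of runtime state could produce a lookup failure shaped like the one in the original report. This is purely illustrative: SharedRuntime and its methods are invented for this sketch and are not Hive or Tez classes, and the thread does not establish that this is the actual cause.

```python
# Toy model only: none of these names are Hive or Tez classes.
class SharedRuntime:
    """Stands in for state that two concurrent queries accidentally share."""

    def __init__(self):
        self.path_to_partition_info = {}

    def load_plan(self, plan):
        # Each query replaces the shared map with its own input directories.
        self.path_to_partition_info = dict(plan)

    def partition_for(self, input_dir):
        if input_dir not in self.path_to_partition_info:
            known = sorted(self.path_to_partition_info)
            raise LookupError(
                f"cannot find dir = {input_dir} in pathToPartitionInfo: {known}"
            )
        return self.path_to_partition_info[input_dir]


# Directory names taken from the log lines in the original report.
DEVICES = "file:/acorn/QC/OraExtract/20160131/Devices"
REPORTLIB = "file:/acorn/QC/reportlib/VM_ValEdit.24656"

runtime = SharedRuntime()
runtime.load_plan({REPORTLIB: "query A partition"})  # query A starts
runtime.load_plan({DEVICES: "query B partition"})    # query B starts, replacing A's map
try:
    runtime.partition_for(REPORTLIB)                 # query A continues reading
    outcome = "ok"
except LookupError as err:
    outcome = str(err)
print(outcome)
```

Run sequentially (load A, look up A, then load B, look up B), the same model never fails, which matches the observation that the queries complete when run one after the other.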



 
