Posted to user@tez.apache.org by Chris K Wensel <ch...@wensel.net> on 2014/09/11 21:09:24 UTC

noob local resource question

I'm setting what used to be called a hadoop job jar as a local resource, with APPLICATION visibility and type PATTERN, using the pattern "(?:classes/|lib/).*" (taken right from the JobConf)

the good news is that when a remote tez client starts, the job jar is downloaded and unpacked using the pattern

proof:

find /tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/ | grep logparser.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/.tmp_logparser.jar.crc
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/jgraphx-2.0.0.1.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/jgrapht-ext-0.9.0.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/jgraph-5.13.0.0.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/cascading-xml-3.0.0-wip-dev.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/tagsoup-1.2.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/riffle-0.1-dev.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/jgrapht-core-0.9.0.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/cascading-hadoop2-tez-3.0.0-wip-dev.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/janino-2.6.1.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/cascading-core-3.0.0-wip-dev.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/lib/commons-compiler-2.6.1.jar
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar/logparser.jar
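
A minimal sketch of why the listing above contains lib/ entries but nothing else from inside the jar. It assumes the PATTERN is applied as a full match against each jar entry name; that is an assumption about the unpack step, not something confirmed in this thread. The class name and the sample entry names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo class; not part of Hadoop, YARN or Tez. */
final class UnpackPatternDemo {
    // The pattern quoted in the message above.
    static final Pattern UNPACK = Pattern.compile("(?:classes/|lib/).*");

    /** Returns the entry names that would be unpacked, assuming full-match semantics. */
    static List<String> selected(List<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (UNPACK.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample entry names; only the lib/ jar is taken from the listing above.
        List<String> entries = Arrays.asList(
            "META-INF/MANIFEST.MF",
            "lib/tagsoup-1.2.jar",
            "classes/example/Example.class",
            "example/Main.class");
        for (String name : selected(entries)) {
            System.out.println(name);
        }
    }
}
```

Under that assumption, entries outside classes/ and lib/ are never extracted, which is consistent with the directory holding only lib/ plus a copy of the jar itself.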

the bad news is that the 'launch_container.sh' is only adding
ln -sf "/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/filecache/14/logparser.jar" "logparser.jar"

so the containers only see
/tmp/hadoop-root/nm-local-dir/usercache/cwensel/appcache/application_1410456214816_0001/container_1410456214816_0001_01_000004/logparser.jar

which isn't terribly helpful, as it results in

2014-09-11 10:57:41,001 INFO [TezChild] org.apache.tez.runtime.task.TezTaskRunner: Encounted an error while executing task: attempt_1410456214816_0001_1_00_000000_2
org.apache.tez.dag.api.TezUncheckedException: Unable to load class: cascading.flow.tez.FlowProcessor
	at org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:45)
	at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:96)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:563)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.initialize(LogicalIOProcessorRuntimeTask.java:187)

i'm obviously missing something. or are classic job jars (with a lib folder) not really supported transparently anymore, even as an option? that would cause some grief.

this is hadoop 2.4.1

ckw

--
Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com


RE: noob local resource question

Posted by Bikas Saha <bi...@hortonworks.com>.
We should probably update the javadocs to clarify that depending on the
local resource configuration yarn will unpack archives etc but yarn will
not do anything to the classpath because yarn does not know the
semantics/structure of that archive. To do something like that, the user
needs to add files (based on the structure of the archive) to the
classpath using the setTaskEnvironment() API.

If you have a generic helper that does the classpath addition based on
archive structure then we could consider adding that as a helper method in
TezUtils.
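
A rough sketch of what such a helper might look like, assuming the classic job jar layout shown earlier in the thread (the jar itself, classes/ and lib/ under a directory linked into the container working directory). The class and method names are hypothetical, not an existing Tez or YARN API. The sketch also assumes Unix nodes (":" separator, "$PWD" expanded by the launch script). How the resulting value combines with the CLASSPATH that Tez sets itself is not covered here and would need to be checked.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of Tez, YARN or TezUtils. */
final class JobJarClasspath {
    private JobJarClasspath() {}

    /**
     * Classpath entries for a job jar unpacked with a PATTERN resource.
     * linkName is the symlink name in the container working directory,
     * e.g. "logparser.jar" in the thread above.
     */
    static List<String> entriesFor(String linkName) {
        String base = "$PWD/" + linkName;      // assumes the launch script expands $PWD
        return Arrays.asList(
            base + "/" + linkName,             // copy of the jar kept inside the unpacked directory
            base + "/classes/",                // unpacked classes, if the jar has any
            base + "/lib/*");                  // JVM wildcard: every jar under lib/
    }

    /** Builds an environment map that could be handed to the setTaskEnvironment() API. */
    static Map<String, String> taskEnvironment(String linkName) {
        Map<String, String> env = new LinkedHashMap<>();
        // ":" assumes Unix-style classpath separators on the cluster nodes.
        env.put("CLASSPATH", String.join(":", entriesFor(linkName)));
        return env;
    }
}
```

The caller would pass the map from taskEnvironment("logparser.jar") when configuring each vertex. The three entries follow the job.jar layout MapReduce users are used to, which is the behavior being asked about.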

Bikas


Re: noob local resource question

Posted by Chris K Wensel <ch...@wensel.net>.
Thanks. 

some of the confusion comes from DAG offering up commonTaskLocalFiles (which supports extraction patterns and magically makes the resources classpath aware, though now obviously not the extracted bits) but not a 'commonTaskEnvironment', so some naive leaps were made.

ckw




Re: noob local resource question

Posted by Hitesh Shah <hi...@apache.org>.
Hi Chris, 

Unlike MR and its support of the distributed cache, Tez does not make any inferences about the structure of the LocalResources specified (i.e., the structure of a tarball, jar, etc.) and therefore expects the user to modify the classpath as needed.

It might be something worth considering as a new feature (please file a jira), but the current implementation expects the user to set up the classpath as needed to handle tarballs, fat jars, etc. correctly.

— Hitesh 

