Posted to user@hadoop.apache.org by Kim Chew <kc...@gmail.com> on 2014/04/09 00:27:43 UTC

using "-libjars" in Hadoop 2.2.1

It seems to me that in Hadoop 2.2.1, the "-libjars" option does not search
for the jars on the local file system but on HDFS. For example,

hadoop jar target/myJar.jar Foo -libjars /home/kchew/test-libs/testJar.jar
/user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020
remoteJT:8021

14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging area
file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
14/04/08 15:11:02 ERROR security.UserGroupInformation:
PriviledgedActionException as:kchew (auth:SIMPLE)
cause:java.io.FileNotFoundException: File does not exist:
hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
java.io.FileNotFoundException: File does not exist:
hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
    at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
    at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)

So under Hadoop 2.2.1, do I have to set some configuration explicitly so
that the "-libjars" option copies the file from the local fs to HDFS?

TIA

Kim
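
For context on what "-libjars" is supposed to do: it is handled on the
client by Hadoop's GenericOptionsParser, which takes local jar paths and
uploads them to the job's staging area itself, so no extra configuration
should normally be needed. That only works, however, when the driver runs
through ToolRunner and builds its job from the configuration ToolRunner has
populated. A minimal sketch of such a driver, reusing the class name Foo
from the command above; the job setup is illustrative, not Kim's actual
code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Foo extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already carries whatever -libjars/-D generic options
        // ToolRunner parsed; a fresh Configuration would discard them.
        Job job = Job.getInstance(getConf(), "foo");
        job.setJarByClass(Foo.class);
        // ... mapper/reducer and input/output path setup elided ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new Foo(), args));
    }
}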

Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

Correction, the command is:

ps aux | grep -i resource
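
The process to look for is the one whose main class is
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager. On a typical
install where a JDK is on the path, jps -l lists Java processes by their
main class, which can make it easier to spot:

jps -l | grep -i resourcemanager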


Also, I notice that you are using some JobTracker configurations, which are
not used in Hadoop 2.x. Here is a sample of all of the RM configurations
from my single-node sandbox machine:


mapred-site.xml-    <property>
mapred-site.xml-    <name>mapreduce.jobhistory.webapp.address</name>
mapred-site.xml:    <value>sandbox.com:19888</value>
mapred-site.xml-  </property>
mapred-site.xml-    <property>
--
mapred-site.xml-    <property>
mapred-site.xml-    <name>mapreduce.jobhistory.address</name>
mapred-site.xml:    <value>sandbox.com:10020</value>
mapred-site.xml-  </property>
mapred-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.resource-tracker.address</name>
yarn-site.xml:    <value>sandbox.com:8025</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.admin.address</name>
yarn-site.xml:    <value>sandbox.com:8141</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.hostname</name>
yarn-site.xml:    <value>sandbox.com</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.scheduler.address</name>
yarn-site.xml:    <value>sandbox.com:8030</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.log.server.url</name>
yarn-site.xml:    <value>http://sandbox.com:19888/jobhistory/logs</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.webapp.address</name>
yarn-site.xml:    <value>sandbox.com:8088</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
--
yarn-site.xml-    <property>
yarn-site.xml-    <name>yarn.resourcemanager.address</name>
yarn-site.xml:    <value>sandbox.com:8050</value>
yarn-site.xml-  </property>
yarn-site.xml-    <property>
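
One related note: the client resolves the RM from yarn.resourcemanager.address
in the yarn-site.xml on its own classpath, and 0.0.0.0:8032, as seen in
Rahul's log further down, is just the built-in default when nothing is set.
For a quick test, the same settings can also be forced programmatically
before submission; the host and port below are the ones from Kim's
yarn-site.xml, used purely for illustration, and Foo is the driver sketched
earlier:

// Sketch only: force client-side RM settings in the driver.
Configuration conf = new Configuration();
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "172.31.3.150");
conf.set("yarn.resourcemanager.address", "172.31.3.150:8032");
int res = ToolRunner.run(conf, new Foo(), args);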


Thanks

-Rahman





On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> You can try to grep on the RM java process by running the following
> command:
>
> ps aux | grep
>
>
>
>
> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml,
>> so it tried to run the job locally. Now I am running into the problem that
>> Rahul has: I am unable to connect to the ResourceManager.
>>
>> My target cluster runs MR1 instead of YARN, hence
>> "mapreduce.framework.name" is set to "classic".
>>
>> Here are my settings in my mapred-site.xml on the client side.
>>
>> <property>
>>     <!-- Pointed to the remote JobTracker -->
>>         <name>mapreduce.job.tracker.address</name>
>>         <value>172.31.3.150:8021</value>
>>     </property>
>>     <property>
>>         <name>mapreduce.framework.name</name>
>>         <value>yarn</value>
>>     </property>
>>
>> and my yarn-site.xml
>>
>>        <property>
>>             <description>The hostname of the RM.</description>
>>             <name>yarn.resourcemanager.hostname</name>
>>             <value>172.31.3.150</value>
>>         </property>
>>
>>         <property>
>>             <description>The address of the applications manager
>> interface in the RM.</description>
>>             <name>yarn.resourcemanager.address</name>
>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>         </property>
>>
>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>> 172.31.3.150:8032
>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>> sleepTime=1 SECONDS)
>>
>> Therefore, the question is how do I figure out where the ResourceManager
>> is running?
>>
>> TIA
>>
>> Kim
>>
>>
>>
>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>> ashettia@hortonworks.com> wrote:
>>
>>>  Hi Kim,
>>>
>>> It looks like it is pointing to an HDFS location. Can you create the
>>> HDFS dir and put the jar there? Hope this helps
>>> Thanks,
>>> Rahman
>>>
>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>> wrote:
>>>
>>> Any help is welcome?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <smart.rahul.iiit@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>  I am running with the following command, but the jar is still not
>>>> available to the mappers and reducers.
>>>>
>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>> -Dmapreduce.user.classpath.first=true
>>>>
>>>>
>>>> Error Log
>>>>
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this.
>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>> job: job_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>> application_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>> job_1397534064728_0028
>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>> running in uber mode : false
>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>     ... 9 more
>>>> Caused by: java.lang.NoClassDefFoundError:
>>>> org/json/simple/parser/ParseException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:270)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>     at
>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>     ... 14 more
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.json.simple.parser.ParseException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>     ... 22 more
>>>>
>>>> When I analyzed the logs, it says:
>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this."
>>>>
>>>> But I have implemented the Tool class as described below:
>>>>
>>>> package my.search;
>>>>
>>>> import org.apache.hadoop.conf.Configured;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>> import org.apache.hadoop.mapred.JobClient;
>>>> import org.apache.hadoop.mapred.JobConf;
>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>> import org.apache.hadoop.util.Tool;
>>>> import org.apache.hadoop.util.ToolRunner;
>>>>
>>>> public class Minerva extends Configured implements Tool
>>>> {
>>>>     public int run(String[] args) throws Exception {
>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>         conf.setJobName("minerva sample job");
>>>>
>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>
>>>>         conf.setOutputKeyClass(Text.class);
>>>>         conf.setOutputValueClass(Text.class);
>>>>
>>>>         conf.setMapperClass(Map.class);
>>>>         // conf.setCombinerClass(Reduce.class);
>>>>         conf.setReducerClass(Reduce.class);
>>>>
>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>
>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>
>>>>         JobClient.runJob(conf);
>>>>
>>>>         return 0;
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>         System.exit(res);
>>>>     }
>>>> }
>>>>
>>>>
>>>> Please let me know if you see any issues.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>
>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>> again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>
>>>>>> It seems to me that in Hadoop 2.2.1, the "-libjars" option does not
>>>>>> search for the jars on the local file system but on HDFS. For
>>>>>> example,
>>>>>>
>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>
>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>> processName=JobTracker, sessionId=
>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>> staging area
>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>
>>>>>> So under Hadoop 2.2.1, do I have to set some configuration explicitly
>>>>>> so that the "-libjars" option copies the file from the local fs to
>>>>>> HDFS?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>> Kim
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Shengjun
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Rahul Singh <sm...@gmail.com>.
Could anyone please respond to my query above?

Why am I getting this warning?

14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.

Because of this, my libjar is not getting picked up and I am getting a
class-def-not-found error.

Thanks and Regards,
Rahul Singh
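
A note on the warning itself, since it is the clue here: JobSubmitter prints
it whenever the job's configuration was not produced by GenericOptionsParser,
even if the driver nominally implements Tool. In the Minerva driver quoted
below, run() calls new JobConf(Minerva.class), which starts from a fresh
configuration and silently discards everything ToolRunner parsed, including
-libjars. A likely fix, as a sketch:

    // Build the JobConf from getConf() so the parsed generic options survive:
    JobConf conf = new JobConf(getConf(), Minerva.class);

Note also that GenericOptionsParser only recognizes generic options placed
before the positional arguments, so -libjars and -D would have to come
first, e.g.:

hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
-Dmapreduce.user.classpath.first=true
-libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
/user/hduser/input_minerva_actual /user/hduser/output_merva_actual3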


On Thu, Apr 17, 2014 at 2:08 AM, Kim Chew <kc...@gmail.com> wrote:

> Thanks Rahman. This problem can be boiled down to how to submit a job
> compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
> turned on YARN. I will open another thread for it.
>
> Kim
>
>
> On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
> ashettia@hortonworks.com> wrote:
>
>> Hi Kim,
>>
>> You can try to grep on the RM java process by running the following
>> command:
>>
>> ps aux | grep
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>>
>>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml,
>>> so it tried to run the job locally. Now I am running into the problem that
>>> Rahul has: I am unable to connect to the ResourceManager.
>>>
>>> My target cluster runs MR1 instead of YARN, hence
>>> "mapreduce.framework.name" is set to "classic".
>>>
>>> Here are my settings in my mapred-site.xml on the client side.
>>>
>>> <property>
>>>     <!-- Pointed to the remote JobTracker -->
>>>         <name>mapreduce.job.tracker.address</name>
>>>         <value>172.31.3.150:8021</value>
>>>     </property>
>>>     <property>
>>>         <name>mapreduce.framework.name</name>
>>>         <value>yarn</value>
>>>     </property>
>>>
>>> and my yarn-site.xml
>>>
>>>        <property>
>>>             <description>The hostname of the RM.</description>
>>>             <name>yarn.resourcemanager.hostname</name>
>>>             <value>172.31.3.150</value>
>>>         </property>
>>>
>>>         <property>
>>>             <description>The address of the applications manager
>>> interface in the RM.</description>
>>>             <name>yarn.resourcemanager.address</name>
>>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>>         </property>
>>>
>>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 172.31.3.150:8032
>>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1 SECONDS)
>>>
>>> Therefore, the question is how do I figure out where the ResourceManager
>>> is running?
>>>
>>> TIA
>>>
>>> Kim
>>>
>>>
>>>
>>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>>> ashettia@hortonworks.com> wrote:
>>>
>>>>  Hi Kim,
>>>>
>>>> It looks like it is pointing to an HDFS location. Can you create the
>>>> HDFS dir and put the jar there? Hope this helps
>>>> Thanks,
>>>> Rahman
>>>>
>>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>>> wrote:
>>>>
>>>> Any help is welcome?
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <
>>>> smart.rahul.iiit@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>  I am running with the following command, but the jar is still not
>>>>> available to the mappers and reducers.
>>>>>
>>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>>> -Dmapreduce.user.classpath.first=true
>>>>>
>>>>>
>>>>> Error Log
>>>>>
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this.
>>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>>> process : 1
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>>> job: job_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>>> application_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>>> job_1397534064728_0028
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>>> running in uber mode : false
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>>     ... 9 more
>>>>> Caused by: java.lang.NoClassDefFoundError:
>>>>> org/json/simple/parser/ParseException
>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>     at java.lang.Class.forName(Class.java:270)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>>     at
>>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>>     ... 14 more
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.json.simple.parser.ParseException
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>>     ... 22 more
>>>>>
>>>>> When I analyzed the logs, it says:
>>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this."
>>>>>
>>>>> But I have implemented the Tool class as described below:
>>>>>
>>>>> package my.search;
>>>>>
>>>>> import org.apache.hadoop.conf.Configured;
>>>>> import org.apache.hadoop.fs.Path;
>>>>> import org.apache.hadoop.io.Text;
>>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>>> import org.apache.hadoop.mapred.JobClient;
>>>>> import org.apache.hadoop.mapred.JobConf;
>>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>>> import org.apache.hadoop.util.Tool;
>>>>> import org.apache.hadoop.util.ToolRunner;
>>>>>
>>>>> public class Minerva extends Configured implements Tool
>>>>> {
>>>>>     public int run(String[] args) throws Exception {
>>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>>         conf.setJobName("minerva sample job");
>>>>>
>>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>>
>>>>>         conf.setOutputKeyClass(Text.class);
>>>>>         conf.setOutputValueClass(Text.class);
>>>>>
>>>>>         conf.setMapperClass(Map.class);
>>>>>         // conf.setCombinerClass(Reduce.class);
>>>>>         conf.setReducerClass(Reduce.class);
>>>>>
>>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>>
>>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>>
>>>>>         JobClient.runJob(conf);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>>         System.exit(res);
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> Please let me know if you see any issues.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>>
>>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>>> again
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems to me that in Hadoop 2.2.1, the "-libjars" option does not
>>>>>>> search for the jars on the local file system but on HDFS. For
>>>>>>> example,
>>>>>>>
>>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>>
>>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>>> processName=JobTracker, sessionId=
>>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>>> staging area
>>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>>
>>>>>>> So under Hadoop 2.2.1, do I have to set some configuration explicitly
>>>>>>> so that the "-libjars" option copies the file from the local fs to
>>>>>>> HDFS?
>>>>>>>
>>>>>>> TIA
>>>>>>>
>>>>>>> Kim
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Shengjun
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>>
>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Rahul Singh <sm...@gmail.com>.
Please could anyone respond to my query above:


Why i am getting this warning?

14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.

Because of this my libjar is not getting picked up and i am getting class
def not found error.

Thanks and Regards,
Rahul Singh


On Thu, Apr 17, 2014 at 2:08 AM, Kim Chew <kc...@gmail.com> wrote:

> Thanks Rahman. This problem can be boiled down to how to submit a job
> compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
> turned on YARN. I will open another thread for it.
>
> Kim
>
>
> On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
> ashettia@hortonworks.com> wrote:
>
>> Hi Kim,
>>
>> You can try to grep on the RM java process by running the following
>> command:
>>
>> ps aux | grep
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>>
>>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>>> so it tried to run the job locally. Now I am running into the problem that
>>> Rahul has, I am unable to to connect to the ResourceManager.
>>>
>>> The setup of my targeted cluster runs MR1 instead of YARN, hence the "
>>> mapreduce.framework.name" is set to "classic".
>>>
>>> Here are my settings in my mapred-site.xml on the client side.
>>>
>>> <property>
>>>     <!-- Pointed to the remote JobTracker -->
>>>         <name>mapreduce.job.tracker.address</name>
>>>         <value>172.31.3.150:8021</value>
>>>     </property>
>>>     <property>
>>>         <name>mapreduce.framework.name</name>
>>>         <value>yarn</value>
>>>     </property>
>>>
>>> and my yarn-site.xml
>>>
>>>        <property>
>>>             <description>The hostname of the RM.</description>
>>>             <name>yarn.resourcemanager.hostname</name>
>>>             <value>172.31.3.150</value>
>>>         </property>
>>>
>>>         <property>
>>>             <description>The address of the applications manager
>>> interface in the RM.</description>
>>>             <name>yarn.resourcemanager.address</name>
>>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>>         </property>
>>>
>>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 172.31.3.150:8032
>>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1 SECONDS)
>>>
>>> Therefore, the question is how do I figure out where the ResourceManager
>>> is running?
>>>
>>> TIA
>>>
>>> Kim
>>>
>>>
>>>
>>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>>> ashettia@hortonworks.com> wrote:
>>>
>>>>  Hi Kim,
>>>>
>>>> It looks like it is pointing to hdfs location. Can you create the hdfs
>>>> dir and put the jar there? Hope this helps
>>>> Thanks,
>>>> Rahman
>>>>
>>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>>> wrote:
>>>>
>>>> any help...all are welcome?
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <
>>>> smart.rahul.iiit@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>  I am running with the following command but still, jar is not
>>>>> available to mapper and reducers.
>>>>>
>>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>>> -Dmapreduce.user.classpath.first=true
>>>>>
>>>>>
>>>>> Error Log
>>>>>
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this.
>>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>>> process : 1
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>>> job: job_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>>> application_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/<http://l-rahul-tech:8088/proxy/application_1397534064728_0028/>
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>>> job_1397534064728_0028
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>>> running in uber mode : false
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>>     ... 9 more
>>>>> Caused by: java.lang.NoClassDefFoundError:
>>>>> org/json/simple/parser/ParseException
>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>     at java.lang.Class.forName(Class.java:270)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>>     at
>>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>>     ... 14 more
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.json.simple.parser.ParseException
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>>     ... 22 more
>>>>>
>>>>> When i analyzed the logs it says
>>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this."
>>>>>
>>>>> But i have implemented the tool class as described below:
>>>>>
>>>>> package my.search;
>>>>>
>>>>> import org.apache.hadoop.conf.Configured;
>>>>> import org.apache.hadoop.fs.Path;
>>>>> import org.apache.hadoop.io.Text;
>>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>>> import org.apache.hadoop.mapred.JobClient;
>>>>> import org.apache.hadoop.mapred.JobConf;
>>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>>> import org.apache.hadoop.util.Tool;
>>>>> import org.apache.hadoop.util.ToolRunner;
>>>>>
>>>>> public class Minerva extends Configured implements Tool
>>>>> {
>>>>>     public int run(String[] args) throws Exception {
>>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>>         conf.setJobName("minerva sample job");
>>>>>
>>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>>
>>>>>         conf.setOutputKeyClass(Text.class);
>>>>>         conf.setOutputValueClass(Text.class);
>>>>>
>>>>>         conf.setMapperClass(Map.class);
>>>>>         // conf.setCombinerClass(Reduce.class);
>>>>>         conf.setReducerClass(Reduce.class);
>>>>>
>>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>>
>>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>>
>>>>>         JobClient.runJob(conf);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>>         System.exit(res);
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> Please let me know if you see any issues?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com>wrote:
>>>>>
>>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>>> again
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>>> not search the jars located in the the local file system but HDFS. For
>>>>>>> example,
>>>>>>>
>>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>>
>>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>>> processName=JobTracker, sessionId=
>>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>>> staging area
>>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>>
>>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>>> configuration so that when using the "libjars" option it will copy
>>>>>>> the file from the local fs to HDFS?
>>>>>>>
>>>>>>> TIA
>>>>>>>
>>>>>>> Kim
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Shengjun
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
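
On the original question quoted above: a -libjars path with no scheme is
qualified against fs.defaultFS, so with the default filesystem pointing at
hdfs://remoteNN:8020 the local path is looked up on HDFS, which is exactly
the FileNotFoundException in the quoted trace. No extra copy-to-HDFS
configuration should be needed; one thing worth trying (a sketch, reusing
only the paths from the quoted command) is an explicit file URI, so the
client reads the jar locally and ships it to the staging area itself:

hadoop jar target/myJar.jar Foo \
    -libjars file:///home/kchew/test-libs/testJar.jar \
    /user/kchew/inputs/raw.vector /user/kchew/outputs \
    hdfs://remoteNN:8020 remoteJT:8021

This assumes Foo parses the generic options through ToolRunner; otherwise
-libjars never reaches the job submitter at all.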

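Two details stand out in the Minerva code quoted above. The run() method
builds its JobConf from scratch with "new JobConf(Minerva.class)", which
throws away the Configuration that ToolRunner's GenericOptionsParser has
just populated with the -libjars and -D values; that alone would explain
why the "Hadoop command-line option parsing not performed" warning keeps
appearing even though the Tool interface is implemented. A minimal sketch
of the fix, assuming the rest of the class stays exactly as quoted:

public int run(String[] args) throws Exception {
    // Start from the Configuration that ToolRunner already parsed, so
    // -libjars, -D and the other generic options carry into the job.
    JobConf conf = new JobConf(getConf(), Minerva.class);
    conf.setJobName("minerva sample job");
    // ... mapper, reducer, input/output setup unchanged ...
    JobClient.runJob(conf);
    return 0;
}

With getConf() in place, ToolRunner.run(new Minerva(), args) hands the
parsed options through to job submission instead of silently dropping
them. (The second detail, the ordering of the options on the command
line, is covered after the next message.)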
Re: using "-libjars" in Hadoop 2.2.1

Posted by Rahul Singh <sm...@gmail.com>.
Could anyone please respond to my query above:


Why am I getting this warning?

14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.

Because of this, my lib jar is not getting picked up and I am getting a
NoClassDefFoundError.

Thanks and Regards,
Rahul Singh


On Thu, Apr 17, 2014 at 2:08 AM, Kim Chew <kc...@gmail.com> wrote:

> Thanks Rahman. This problem can be boiled down to how to submit a job
> compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
> turned on YARN. I will open another thread for it.
>
> Kim
>
>
> On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
> ashettia@hortonworks.com> wrote:
>
>> Hi Kim,
>>
>> You can try to grep on the RM java process by running the following
>> command:
>>
>> ps aux | grep
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>>
>>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>>> so it tried to run the job locally. Now I am running into the problem that
>>> Rahul has: I am unable to connect to the ResourceManager.
>>>
>>> My target cluster runs MR1 instead of YARN, hence
>>> "mapreduce.framework.name" is set to "classic".
>>>
>>> Here are my settings in my mapred-site.xml on the client side.
>>>
>>> <property>
>>>     <!-- Pointed to the remote JobTracker -->
>>>         <name>mapreduce.job.tracker.address</name>
>>>         <value>172.31.3.150:8021</value>
>>>     </property>
>>>     <property>
>>>         <name>mapreduce.framework.name</name>
>>>         <value>yarn</value>
>>>     </property>
>>>
>>> and my yarn-site.xml
>>>
>>>        <property>
>>>             <description>The hostname of the RM.</description>
>>>             <name>yarn.resourcemanager.hostname</name>
>>>             <value>172.31.3.150</value>
>>>         </property>
>>>
>>>         <property>
>>>             <description>The address of the applications manager
>>> interface in the RM.</description>
>>>             <name>yarn.resourcemanager.address</name>
>>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>>         </property>
>>>
>>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 172.31.3.150:8032
>>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1 SECONDS)
>>>
>>> Therefore, the question is how do I figure out where the ResourceManager
>>> is running?
>>>
>>> TIA
>>>
>>> Kim
>>>
>>>
>>>
>>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>>> ashettia@hortonworks.com> wrote:
>>>
>>>>  Hi Kim,
>>>>
>>>> It looks like it is pointing to an HDFS location. Can you create the HDFS
>>>> dir and put the jar there? Hope this helps.
>>>> Thanks,
>>>> Rahman
>>>>
>>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>>> wrote:
>>>>
>>>> Any help? All suggestions are welcome.
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <
>>>> smart.rahul.iiit@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>  I am running with the following command, but the jar is still not
>>>>> available to the mappers and reducers.
>>>>>
>>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>>> -Dmapreduce.user.classpath.first=true
>>>>>
>>>>>
>>>>> Error Log
>>>>>
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager
>>>>> at /0.0.0.0:8032
>>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this.
>>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>>> process : 1
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>>> job: job_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>>> application_1397534064728_0028
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>>> job_1397534064728_0028
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>>> running in uber mode : false
>>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>>     ... 9 more
>>>>> Caused by: java.lang.NoClassDefFoundError:
>>>>> org/json/simple/parser/ParseException
>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>     at java.lang.Class.forName(Class.java:270)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>>     at
>>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>>     at
>>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>>     ... 14 more
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.json.simple.parser.ParseException
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>>     ... 22 more
>>>>>
>>>>> When I analyzed the logs, it says:
>>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>>> option parsing not performed. Implement the Tool interface and execute your
>>>>> application with ToolRunner to remedy this."
>>>>>
>>>>> But I have implemented the Tool interface as described below:
>>>>>
>>>>> package my.search;
>>>>>
>>>>> import org.apache.hadoop.conf.Configured;
>>>>> import org.apache.hadoop.fs.Path;
>>>>> import org.apache.hadoop.io.Text;
>>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>>> import org.apache.hadoop.mapred.JobClient;
>>>>> import org.apache.hadoop.mapred.JobConf;
>>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>>> import org.apache.hadoop.util.Tool;
>>>>> import org.apache.hadoop.util.ToolRunner;
>>>>>
>>>>> public class Minerva extends Configured implements Tool
>>>>> {
>>>>>     public int run(String[] args) throws Exception {
>>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>>         conf.setJobName("minerva sample job");
>>>>>
>>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>>
>>>>>         conf.setOutputKeyClass(Text.class);
>>>>>         conf.setOutputValueClass(Text.class);
>>>>>
>>>>>         conf.setMapperClass(Map.class);
>>>>>         // conf.setCombinerClass(Reduce.class);
>>>>>         conf.setReducerClass(Reduce.class);
>>>>>
>>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>>
>>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>>
>>>>>         JobClient.runJob(conf);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>>         System.exit(res);
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> Please let me know if you see any issues.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>>
>>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>>> again
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>>> not search the jars located in the local file system but HDFS. For
>>>>>>> example,
>>>>>>>
>>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>>
>>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>>> processName=JobTracker, sessionId=
>>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>>> staging area
>>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>>     at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>>     at
>>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>>
>>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>>> configuration so that when using the "libjars" option it will copy
>>>>>>> the file from the local fs to HDFS?
>>>>>>>
>>>>>>> TIA
>>>>>>>
>>>>>>> Kim
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Shengjun
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
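
The ordering of the arguments in the command quoted above is the second
thing worth checking: GenericOptionsParser only honors the generic options
(-D, -libjars, -files, ...) when they appear before the application's own
arguments, and in the quoted invocation they come after the input and
output paths. A reordered sketch, reusing only the paths from the quoted
command:

hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva \
    -Dmapreduce.user.classpath.first=true \
    -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar \
    /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3

This only takes effect together with the getConf() change sketched
earlier; either one alone still leaves the jar out of the task classpath.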

Re: using "-libjars" in Hadoop 2.2.1

Posted by Kim Chew <kc...@gmail.com>.
Thanks Rahman. This problem can be boiled down to how to submit a job
compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
turned on YARN. I will open another thread for it.

Kim


On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> You can try to grep on the RM java process by running the following
> command:
>
> ps aux | grep
>
>
>
>
> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>> so it tried to run the job locally. Now I am running into the problem that
>> Rahul has: I am unable to connect to the ResourceManager.
>>
>> My target cluster runs MR1 instead of YARN, hence
>> "mapreduce.framework.name" is set to "classic".
>>
>> Here are my settings in my mapred-site.xml on the client side.
>>
>> <property>
>>     <!-- Pointed to the remote JobTracker -->
>>         <name>mapreduce.job.tracker.address</name>
>>         <value>172.31.3.150:8021</value>
>>     </property>
>>     <property>
>>         <name>mapreduce.framework.name</name>
>>         <value>yarn</value>
>>     </property>
>>
>> and my yarn-site.xml
>>
>>        <property>
>>             <description>The hostname of the RM.</description>
>>             <name>yarn.resourcemanager.hostname</name>
>>             <value>172.31.3.150</value>
>>         </property>
>>
>>         <property>
>>             <description>The address of the applications manager
>> interface in the RM.</description>
>>             <name>yarn.resourcemanager.address</name>
>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>         </property>
>>
>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>> 172.31.3.150:8032
>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>> sleepTime=1 SECONDS)
>>
>> Therefore, the question is how do I figure out where the ResourceManager
>> is running?
>>
>> TIA
>>
>> Kim
>>
>>
>>
>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>> ashettia@hortonworks.com> wrote:
>>
>>>  Hi Kim,
>>>
>>> It looks like it is pointing to an HDFS location. Can you create the HDFS
>>> dir and put the jar there? Hope this helps.
>>> Thanks,
>>> Rahman
>>>
>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>> wrote:
>>>
>>> Any help? All suggestions are welcome.
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <smart.rahul.iiit@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>  I am running with the following command, but the jar is still not
>>>> available to the mappers and reducers.
>>>>
>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>> -Dmapreduce.user.classpath.first=true
>>>>
>>>>
>>>> Error Log
>>>>
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this.
>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>> job: job_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>> application_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>> job_1397534064728_0028
>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>> running in uber mode : false
>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>     ... 9 more
>>>> Caused by: java.lang.NoClassDefFoundError:
>>>> org/json/simple/parser/ParseException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:270)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>     at
>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>     ... 14 more
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.json.simple.parser.ParseException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>     ... 22 more
>>>>
>>>> When I analyzed the logs, it says:
>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this."
>>>>
>>>> But I have implemented the Tool interface as described below:
>>>>
>>>> package my.search;
>>>>
>>>> import org.apache.hadoop.conf.Configured;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>> import org.apache.hadoop.mapred.JobClient;
>>>> import org.apache.hadoop.mapred.JobConf;
>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>> import org.apache.hadoop.util.Tool;
>>>> import org.apache.hadoop.util.ToolRunner;
>>>>
>>>> public class Minerva extends Configured implements Tool
>>>> {
>>>>     public int run(String[] args) throws Exception {
>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>         conf.setJobName("minerva sample job");
>>>>
>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>
>>>>         conf.setOutputKeyClass(Text.class);
>>>>         conf.setOutputValueClass(Text.class);
>>>>
>>>>         conf.setMapperClass(Map.class);
>>>>         // conf.setCombinerClass(Reduce.class);
>>>>         conf.setReducerClass(Reduce.class);
>>>>
>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>
>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>
>>>>         JobClient.runJob(conf);
>>>>
>>>>         return 0;
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>         System.exit(res);
>>>>     }
>>>> }
>>>>
>>>>
>>>> Please let me know if you see any issues.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>
>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>> again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>
>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>> not search the jars located in the local file system but HDFS. For
>>>>>> example,
>>>>>>
>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>
>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>> processName=JobTracker, sessionId=
>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>> staging area
>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>
>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>> configuration so that when using the "libjars" option it will copy
>>>>>> the file from the local fs to HDFS?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>> Kim
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Shengjun
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

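The two quoted snippets pull in opposite directions: the prose says the
target cluster runs MR1, yet the client's mapred-site.xml sets
mapreduce.framework.name to "yarn", which is why the client keeps trying
to reach a ResourceManager. A sketch of a client-side mapred-site.xml for
an MR1 cluster, assuming the JobTracker really is at 172.31.3.150:8021 as
quoted (mapreduce.jobtracker.address is the Hadoop 2 name for the old
mapred.job.tracker key):

<property>
    <name>mapreduce.framework.name</name>
    <value>classic</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>172.31.3.150:8021</value>
</property>

With "classic" in place the client submits through the JobTracker and the
yarn-site.xml settings are never consulted.
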
Re: using "-libjars" in Hadoop 2.2.1

Posted by Kim Chew <kc...@gmail.com>.
Thanks Rahman. This problem can be boiled down to how to submit a job
compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
turned on YARN. I will open another thread for it.

Kim


On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> You can try to grep on the RM java process by running the following
> command:
>
> ps aux | grep
>
>
>
>
> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>> so it tried to run the job locally. Now I am running into the problem that
>> Rahul has, I am unable to to connect to the ResourceManager.
>>
>> The setup of my targeted cluster runs MR1 instead of YARN, hence the "
>> mapreduce.framework.name" is set to "classic".
>>
>> Here are my settings in my mapred-site.xml on the client side.
>>
>> <property>
>>     <!-- Pointed to the remote JobTracker -->
>>         <name>mapreduce.job.tracker.address</name>
>>         <value>172.31.3.150:8021</value>
>>     </property>
>>     <property>
>>         <name>mapreduce.framework.name</name>
>>         <value>yarn</value>
>>     </property>
>>
>> and my yarn-site.xml
>>
>>        <property>
>>             <description>The hostname of the RM.</description>
>>             <name>yarn.resourcemanager.hostname</name>
>>             <value>172.31.3.150</value>
>>         </property>
>>
>>         <property>
>>             <description>The address of the applications manager
>> interface in the RM.</description>
>>             <name>yarn.resourcemanager.address</name>
>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>         </property>
>>
>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>> 172.31.3.150:8032
>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>> sleepTime=1 SECONDS)
>>
>> Therefore, the question is how do I figure out where the ResourceManager
>> is running?
>>
>> TIA
>>
>> Kim
>>
>>
>>
>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>> ashettia@hortonworks.com> wrote:
>>
>>>  Hi Kim,
>>>
>>> It looks like it is pointing to hdfs location. Can you create the hdfs
>>> dir and put the jar there? Hope this helps
>>> Thanks,
>>> Rahman
>>>
>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>> wrote:
>>>
>>> any help...all are welcome?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <smart.rahul.iiit@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>  I am running with the following command but still, jar is not
>>>> available to mapper and reducers.
>>>>
>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>> -Dmapreduce.user.classpath.first=true
>>>>
>>>>
>>>> Error Log
>>>>
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this.
>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>> job: job_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>> application_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/<http://l-rahul-tech:8088/proxy/application_1397534064728_0028/>
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>> job_1397534064728_0028
>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>> running in uber mode : false
>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>     ... 9 more
>>>> Caused by: java.lang.NoClassDefFoundError:
>>>> org/json/simple/parser/ParseException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:270)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>     at
>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>     ... 14 more
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.json.simple.parser.ParseException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>     ... 22 more
>>>>
>>>> When i analyzed the logs it says
>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this."
>>>>
>>>> But i have implemented the tool class as described below:
>>>>
>>>> package my.search;
>>>>
>>>> import org.apache.hadoop.conf.Configured;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>> import org.apache.hadoop.mapred.JobClient;
>>>> import org.apache.hadoop.mapred.JobConf;
>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>> import org.apache.hadoop.util.Tool;
>>>> import org.apache.hadoop.util.ToolRunner;
>>>>
>>>> public class Minerva extends Configured implements Tool
>>>> {
>>>>     public int run(String[] args) throws Exception {
>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>         conf.setJobName("minerva sample job");
>>>>
>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>
>>>>         conf.setOutputKeyClass(Text.class);
>>>>         conf.setOutputValueClass(Text.class);
>>>>
>>>>         conf.setMapperClass(Map.class);
>>>>         // conf.setCombinerClass(Reduce.class);
>>>>         conf.setReducerClass(Reduce.class);
>>>>
>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>
>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>
>>>>         JobClient.runJob(conf);
>>>>
>>>>         return 0;
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>         System.exit(res);
>>>>     }
>>>> }
>>>>
>>>>
>>>> Please let me know if you see any issues?
>>>>
>>>>
>>>>
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com>wrote:
>>>>
>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>> again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>
>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>> not search the jars located in the the local file system but HDFS. For
>>>>>> example,
>>>>>>
>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>
>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>> processName=JobTracker, sessionId=
>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>> staging area
>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>
>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>> configurations so when using the "libjars" option it will copy the file to
>>>>>> hdfs from local fs?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>> Kim
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Shengjun
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
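
The exchange above keeps circling two details that decide whether -libjars
takes effect at all. First, GenericOptionsParser only picks up generic
options such as -libjars and -D when they appear before the application's
own arguments, immediately after the main class name. Second, a Tool
implementation has to build its JobConf from getConf(); constructing the
JobConf from scratch throws away whatever ToolRunner just parsed, which is
what the "Hadoop command-line option parsing not performed" warning quoted
elsewhere in this thread is pointing at. A corrected run() for the Minerva
job would look something like the sketch below (Minerva, Map, Reduce and
TextArrayWritable are the poster's own classes, not Hadoop classes, and
the rest of the class is unchanged from the quoted code):

    public int run(String[] args) throws Exception {
        // Start from getConf() so the configuration that ToolRunner and
        // GenericOptionsParser populated (-libjars, -D...) is preserved
        // instead of being replaced by a fresh, empty JobConf.
        JobConf conf = new JobConf(getConf(), Minerva.class);
        conf.setJobName("minerva sample job");

        // ... the same set*Class(), input format and output format
        // calls as in the quoted code ...

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
        return 0;
    }

With that in place, the generic options go before the positional
arguments on the command line:

hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva -Dmapreduce.user.classpath.first=true -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3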

Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

Correction, the command is:

ps aux | grep -i resource


Also, I notice that you are using some JobTracker configuration
properties, which are not used in Hadoop 2.x. Here is a sample of all of
the RM configurations from my single-node sandbox machine:


mapred-site.xml-    <property>

mapred-site.xml-    <name>mapreduce.jobhistory.webapp.address</name>

mapred-site.xml:    <value>sandbox.com:19888</value>

mapred-site.xml-  </property>

mapred-site.xml-    <property>

--

mapred-site.xml-    <property>

mapred-site.xml-    <name>mapreduce.jobhistory.address</name>

mapred-site.xml:    <value>sandbox.com:10020</value>

mapred-site.xml-  </property>

mapred-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.resource-tracker.address</name>

yarn-site.xml:    <value>sandbox.com:8025</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.admin.address</name>

yarn-site.xml:    <value>sandbox.com:8141</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.hostname</name>

yarn-site.xml:    <value>sandbox.com</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.scheduler.address</name>

yarn-site.xml:    <value>sandbox.com:8030</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.log.server.url</name>

yarn-site.xml:    <value>http://sandbox.com:19888/jobhistory/logs</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.webapp.address</name>

yarn-site.xml:    <value>sandbox.com:8088</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>

--

yarn-site.xml-    <property>

yarn-site.xml-    <name>yarn.resourcemanager.address</name>

yarn-site.xml:    <value>sandbox.com:8050</value>

yarn-site.xml-  </property>

yarn-site.xml-    <property>


Thanks

-Rahman
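
A quick way to confirm where the ResourceManager is actually listening is
to find its JVM and then look up the address property in the yarn-site.xml
it was started with, along these lines (a sketch: /etc/hadoop/conf is
where my sandbox keeps its configuration, so substitute your own config
directory):

ps aux | grep -i resourcemanager

grep -A 1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml

The first command usually shows the user and configuration directory the
RM was launched with; the second prints the <value> line holding the
host:port that clients need for yarn.resourcemanager.address.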





On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> You can try to grep on the RM java process by running the following
> command:
>
> ps aux | grep
>
>
>
>
> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>> so it tried to run the job locally. Now I am running into the problem that
>> Rahul has: I am unable to connect to the ResourceManager.
>>
>> The setup of my targeted cluster runs MR1 instead of YARN, hence the "
>> mapreduce.framework.name" is set to "classic".
>>
>> Here are my settings in my mapred-site.xml on the client side.
>>
>> <property>
>>     <!-- Pointed to the remote JobTracker -->
>>         <name>mapreduce.job.tracker.address</name>
>>         <value>172.31.3.150:8021</value>
>>     </property>
>>     <property>
>>         <name>mapreduce.framework.name</name>
>>         <value>yarn</value>
>>     </property>
>>
>> and my yarn-site.xml
>>
>>        <property>
>>             <description>The hostname of the RM.</description>
>>             <name>yarn.resourcemanager.hostname</name>
>>             <value>172.31.3.150</value>
>>         </property>
>>
>>         <property>
>>             <description>The address of the applications manager
>> interface in the RM.</description>
>>             <name>yarn.resourcemanager.address</name>
>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>         </property>
>>
>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>> 172.31.3.150:8032
>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>> sleepTime=1 SECONDS)
>>
>> Therefore, the question is how do I figure out where the ResourceManager
>> is running?
>>
>> TIA
>>
>> Kim
>>
>>
>>
>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>> ashettia@hortonworks.com> wrote:
>>
>>>  Hi Kim,
>>>
>>> It looks like it is pointing to hdfs location. Can you create the hdfs
>>> dir and put the jar there? Hope this helps
>>> Thanks,
>>> Rahman
>>>
>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>> wrote:
>>>
>>> any help...all are welcome?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <smart.rahul.iiit@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>  I am running with the following command, but the jar is still not
>>>> available to the mappers and reducers.
>>>>
>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>> -Dmapreduce.user.classpath.first=true
>>>>
>>>>
>>>> Error Log
>>>>
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this.
>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>> job: job_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>> application_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>> job_1397534064728_0028
>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>> running in uber mode : false
>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>     ... 9 more
>>>> Caused by: java.lang.NoClassDefFoundError:
>>>> org/json/simple/parser/ParseException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:270)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>     at
>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>     ... 14 more
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.json.simple.parser.ParseException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>     ... 22 more
>>>>
>>>> When I analyzed the logs, they said
>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this."
>>>>
>>>> But I have implemented the Tool interface as described below:
>>>>
>>>> package my.search;
>>>>
>>>> import org.apache.hadoop.conf.Configured;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>> import org.apache.hadoop.mapred.JobClient;
>>>> import org.apache.hadoop.mapred.JobConf;
>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>> import org.apache.hadoop.util.Tool;
>>>> import org.apache.hadoop.util.ToolRunner;
>>>>
>>>> public class Minerva extends Configured implements Tool
>>>> {
>>>>     public int run(String[] args) throws Exception {
>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>         conf.setJobName("minerva sample job");
>>>>
>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>
>>>>         conf.setOutputKeyClass(Text.class);
>>>>         conf.setOutputValueClass(Text.class);
>>>>
>>>>         conf.setMapperClass(Map.class);
>>>>         // conf.setCombinerClass(Reduce.class);
>>>>         conf.setReducerClass(Reduce.class);
>>>>
>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>
>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>
>>>>         JobClient.runJob(conf);
>>>>
>>>>         return 0;
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>         System.exit(res);
>>>>     }
>>>> }
>>>>
>>>>
>>>> Please let me know if you see any issues.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>
>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>> again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>
>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>> not search the jars located in the local file system but in HDFS. For
>>>>>> example,
>>>>>>
>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>
>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>> processName=JobTracker, sessionId=
>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>> staging area
>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>
>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>> configuration so that the "libjars" option copies the files from the
>>>>>> local fs to HDFS?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>> Kim
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Shengjun
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>


Re: using "-libjars" in Hadoop 2.2.1

Posted by Kim Chew <kc...@gmail.com>.
Thanks Rahman. This problem can be boiled down to how to submit a job
compiled with Hadoop-1.1.1 remotely to a Hadoop 2 cluster that has not
turned on YARN. I will open another thread for it.

Kim
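
(For anyone hitting the same wall: the client-side change that matches an
MR1 cluster is to stop asking for YARN at all. A minimal mapred-site.xml
for that setup might look like the sketch below, assuming the cluster
really runs MR1 with the JobTracker on 172.31.3.150:8021. Note that the
Hadoop 2 property name is mapreduce.jobtracker.address; the
mapreduce.job.tracker.address spelling used earlier does not appear in
the stock Hadoop 2 property list.

<property>
    <name>mapreduce.framework.name</name>
    <value>classic</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>172.31.3.150:8021</value>
</property>

With mapreduce.framework.name set to classic, the client submits through
the JobTracker and stops retrying a ResourceManager connection.)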


On Wed, Apr 16, 2014 at 1:30 PM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> You can try to grep on the RM java process by running the following
> command:
>
> ps aux | grep
>
>
>
>
> On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml
>> so it tried to run the job locally. Now I am running into the problem that
>> Rahul has: I am unable to connect to the ResourceManager.
>>
>> The setup of my targeted cluster runs MR1 instead of YARN, hence the "
>> mapreduce.framework.name" is set to "classic".
>>
>> Here are my settings in my mapred-site.xml on the client side.
>>
>> <property>
>>     <!-- Pointed to the remote JobTracker -->
>>         <name>mapreduce.job.tracker.address</name>
>>         <value>172.31.3.150:8021</value>
>>     </property>
>>     <property>
>>         <name>mapreduce.framework.name</name>
>>         <value>yarn</value>
>>     </property>
>>
>> and my yarn-site.xml
>>
>>        <property>
>>             <description>The hostname of the RM.</description>
>>             <name>yarn.resourcemanager.hostname</name>
>>             <value>172.31.3.150</value>
>>         </property>
>>
>>         <property>
>>             <description>The address of the applications manager
>> interface in the RM.</description>
>>             <name>yarn.resourcemanager.address</name>
>>             <value>${yarn.resourcemanager.hostname}:8032</value>
>>         </property>
>>
>> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
>> 172.31.3.150:8032
>> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
>> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
>> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>> sleepTime=1 SECONDS)
>>
>> Therefore, the question is how do I figure out where the ResourceManager
>> is running?
>>
>> TIA
>>
>> Kim
>>
>>
>>
>>  On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
>> ashettia@hortonworks.com> wrote:
>>
>>>  Hi Kim,
>>>
>>> It looks like it is pointing to hdfs location. Can you create the hdfs
>>> dir and put the jar there? Hope this helps
>>> Thanks,
>>> Rahman
>>>
>>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>>> wrote:
>>>
>>> any help...all are welcome?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <smart.rahul.iiit@gmail.com
>>> > wrote:
>>>
>>>> Hi,
>>>>  I am running with the following command, but the jar is still not
>>>> available to the mappers and reducers.
>>>>
>>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>>> -Dmapreduce.user.classpath.first=true
>>>>
>>>>
>>>> Error Log
>>>>
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at
>>>> /0.0.0.0:8032
>>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this.
>>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>>> process : 1
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>> job: job_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>>> application_1397534064728_0028
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job:
>>>> job_1397534064728_0028
>>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028
>>>> running in uber mode : false
>>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>>> Error: java.lang.RuntimeException: Error in configuring object
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>     at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>>     at
>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>     ... 9 more
>>>> Caused by: java.lang.NoClassDefFoundError:
>>>> org/json/simple/parser/ParseException
>>>>     at java.lang.Class.forName0(Native Method)
>>>>     at java.lang.Class.forName(Class.java:270)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>>     at
>>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>>     at
>>>> org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>>     ... 14 more
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.json.simple.parser.ParseException
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>>     ... 22 more
>>>>
>>>> When I analyzed the logs, they said
>>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>>> option parsing not performed. Implement the Tool interface and execute your
>>>> application with ToolRunner to remedy this."
>>>>
>>>> But I have implemented the Tool interface as described below:
>>>>
>>>> package my.search;
>>>>
>>>> import org.apache.hadoop.conf.Configured;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapred.FileInputFormat;
>>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>>> import org.apache.hadoop.mapred.JobClient;
>>>> import org.apache.hadoop.mapred.JobConf;
>>>> import org.apache.hadoop.mapred.TextInputFormat;
>>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>>> import org.apache.hadoop.util.Tool;
>>>> import org.apache.hadoop.util.ToolRunner;
>>>>
>>>> public class Minerva extends Configured implements Tool
>>>> {
>>>>     public int run(String[] args) throws Exception {
>>>>         JobConf conf = new JobConf(Minerva.class);
>>>>         conf.setJobName("minerva sample job");
>>>>
>>>>         conf.setMapOutputKeyClass(Text.class);
>>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>>
>>>>         conf.setOutputKeyClass(Text.class);
>>>>         conf.setOutputValueClass(Text.class);
>>>>
>>>>         conf.setMapperClass(Map.class);
>>>>         // conf.setCombinerClass(Reduce.class);
>>>>         conf.setReducerClass(Reduce.class);
>>>>
>>>>         conf.setInputFormat(TextInputFormat.class);
>>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>>
>>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>
>>>>         JobClient.runJob(conf);
>>>>
>>>>         return 0;
>>>>     }
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>         int res = ToolRunner.run(new Minerva(), args);
>>>>         System.exit(res);
>>>>     }
>>>> }
>>>>
>>>>
>>>> Please let me know if you see any issues.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>>
>>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>>> again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>>
>>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>>> not search the jars located in the local file system but in HDFS. For
>>>>>> example,
>>>>>>
>>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>>
>>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>>> processName=JobTracker, sessionId=
>>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the
>>>>>> staging area
>>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>> java.io.FileNotFoundException: File does not exist:
>>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>     at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>>     at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>>
>>>>>> So under Hadoop 2.2.1, do I have to explicitly set some
>>>>>> configuration so that the "libjars" option copies the files from the
>>>>>> local fs to HDFS?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>> Kim
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Shengjun
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

You can try to grep on the RM java process by running the following
command:

ps aux | grep
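
For example, something along these lines should narrow it down (a sketch
only; the grep pattern, the /etc/hadoop/conf path, and the 8032 default RPC
port are assumptions that can differ between distributions):

# find the ResourceManager JVM and note its pid
ps aux | grep -i resourcemanager | grep -v grep

# see which address and port that pid is listening on
# (8032 is the default for yarn.resourcemanager.address)
sudo netstat -tlnp | grep <pid>

# compare with what the client-side yarn-site.xml points at
grep -A 1 resourcemanager.address /etc/hadoop/conf/yarn-site.xml

Whatever host:port the RM is actually bound to is the value that belongs in
yarn.resourcemanager.address on the client.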




On Wed, Apr 16, 2014 at 10:31 AM, Kim Chew <kc...@gmail.com> wrote:

> Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml,
> so it tried to run the job locally. Now I am running into the problem that
> Rahul has: I am unable to connect to the ResourceManager.
>
> The setup of my targeted cluster runs MR1 instead of YARN, hence the "
> mapreduce.framework.name" is set to "classic".
>
> Here are my settings in my mapred-site.xml on the client side.
>
> <property>
>     <!-- Pointed to the remote JobTracker -->
>         <name>mapreduce.job.tracker.address</name>
>         <value>172.31.3.150:8021</value>
>     </property>
>     <property>
>         <name>mapreduce.framework.name</name>
>         <value>yarn</value>
>     </property>
>
> and my yarn-site.xml
>
>        <property>
>             <description>The hostname of the RM.</description>
>             <name>yarn.resourcemanager.hostname</name>
>             <value>172.31.3.150</value>
>         </property>
>
>         <property>
>             <description>The address of the applications manager interface
> in the RM.</description>
>             <name>yarn.resourcemanager.address</name>
>             <value>${yarn.resourcemanager.hostname}:8032</value>
>         </property>
>
> 14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
> 172.31.3.150:8032
> 14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
> hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1 SECONDS)
>
> Therefore, the question is how do I figure out where the ResourceManager
> is running?
>
> TIA
>
> Kim
>
>
>
> On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
> ashettia@hortonworks.com> wrote:
>
>> Hi Kim,
>>
>> It looks like it is pointing to an hdfs location. Can you create the hdfs
>> dir and put the jar there? Hope this helps
>> Thanks,
>> Rahman
>>
>> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
>> wrote:
>>
>> any help...all are welcome?
>>
>>
>> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
>>
>>> Hi,
>>>  I am running with the following command, but the jar is still not available
>>> to the mappers and reducers.
>>>
>>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>>> -Dmapreduce.user.classpath.first=true
>>>
>>>
>>> Error Log
>>>
>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 0.0.0.0:8032
>>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>>> 0.0.0.0:8032
>>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>> option parsing not performed. Implement the Tool interface and execute your
>>> application with ToolRunner to remedy this.
>>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>>> process : 1
>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1397534064728_0028
>>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>>> application_1397534064728_0028
>>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
>>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running
>>> in uber mode : false
>>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>>> Error: java.lang.RuntimeException: Error in configuring object
>>>     at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>     at
>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>     at
>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>>     at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>>     at
>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>     ... 9 more
>>> Caused by: java.lang.NoClassDefFoundError:
>>> org/json/simple/parser/ParseException
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:270)
>>>     at
>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>>     at
>>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>>     at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>>     at
>>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>     ... 14 more
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.json.simple.parser.ParseException
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>>     ... 22 more
>>>
>>> When I analyzed the logs, I saw:
>>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>>> option parsing not performed. Implement the Tool interface and execute your
>>> application with ToolRunner to remedy this."
>>>
>>> But I have implemented the Tool interface as described below:
>>>
>>> package my.search;
>>>
>>> import org.apache.hadoop.conf.Configured;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapred.FileInputFormat;
>>> import org.apache.hadoop.mapred.FileOutputFormat;
>>> import org.apache.hadoop.mapred.JobClient;
>>> import org.apache.hadoop.mapred.JobConf;
>>> import org.apache.hadoop.mapred.TextInputFormat;
>>> import org.apache.hadoop.mapred.TextOutputFormat;
>>> import org.apache.hadoop.util.Tool;
>>> import org.apache.hadoop.util.ToolRunner;
>>>
>>> public class Minerva extends Configured implements Tool
>>> {
>>>     public int run(String[] args) throws Exception {
>>>         JobConf conf = new JobConf(Minerva.class);
>>>         conf.setJobName("minerva sample job");
>>>
>>>         conf.setMapOutputKeyClass(Text.class);
>>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>>
>>>         conf.setOutputKeyClass(Text.class);
>>>         conf.setOutputValueClass(Text.class);
>>>
>>>         conf.setMapperClass(Map.class);
>>>         // conf.setCombinerClass(Reduce.class);
>>>         conf.setReducerClass(Reduce.class);
>>>
>>>         conf.setInputFormat(TextInputFormat.class);
>>>         conf.setOutputFormat(TextOutputFormat.class);
>>>
>>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>
>>>         JobClient.runJob(conf);
>>>
>>>         return 0;
>>>     }
>>>
>>>     public static void main(String[] args) throws Exception {
>>>         int res = ToolRunner.run(new Minerva(), args);
>>>         System.exit(res);
>>>     }
>>> }
>>>
>>>
>>> Please let me know if you see any issues?
>>>
>>>
>>>
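Two things stand out in the listing above, offered as a sketch rather than a
verified fix. First, GenericOptionsParser only honors generic options that
come before the application arguments, so -libjars and -D need to move ahead
of the input and output paths:

hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar -Dmapreduce.user.classpath.first=true /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3

Second, run() builds its JobConf from defaults, which discards the
Configuration that ToolRunner populated (that is where -libjars is recorded,
as the tmpjars property) and is also why JobSubmitter still warns that option
parsing was not performed. Seeding the JobConf from getConf() should keep it:

public int run(String[] args) throws Exception {
    // getConf() returns the Configuration that ToolRunner filled in from the
    // generic options; new JobConf(Minerva.class) starts from scratch and
    // silently drops the -libjars entries.
    JobConf conf = new JobConf(getConf(), Minerva.class);
    conf.setJobName("minerva sample job");
    // ... rest of the job setup as in the original run() ...
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
}
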
>>>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>>
>>>> add '-Dmapreduce.user.classpath.first=true' to your command and try
>>>> again
>>>>
>>>>
>>>>
>>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>>
>>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>>> not search the jars located in the local file system but HDFS. For
>>>>> example,
>>>>>
>>>>> hadoop jar target/myJar.jar Foo -libjars
>>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>>
>>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>>> processName=JobTracker, sessionId=
>>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
>>>>> area
>>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>> java.io.FileNotFoundException: File does not exist:
>>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>>     at
>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>     at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>>     at
>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>>     at
>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>>     at
>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>>     at
>>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>>     at
>>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>>
>>>>> So under Hadoop 2.2.1, do I have to explicitly set some configuration
>>>>> so that the "libjars" option will copy the file to hdfs from the local
>>>>> fs?
>>>>>
>>>>> TIA
>>>>>
>>>>> Kim
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Shengjun
>>>>
>>>
>>>
>>
>>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Kim Chew <kc...@gmail.com>.
Thanks Rahman, I have mixed things up a little bit in my mapred-site.xml, so
it tried to run the job locally. Now I am running into the problem that
Rahul has: I am unable to connect to the ResourceManager.

The setup of my targeted cluster runs MR1 instead of YARN, hence the "
mapreduce.framework.name" is set to "classic".

Here are my settings in my mapred-site.xml on the client side.

<property>
    <!-- Pointed to the remote JobTracker -->
        <name>mapreduce.job.tracker.address</name>
        <value>172.31.3.150:8021</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

and my yarn-site.xml

       <property>
            <description>The hostname of the RM.</description>
            <name>yarn.resourcemanager.hostname</name>
            <value>172.31.3.150</value>
        </property>

        <property>
            <description>The address of the applications manager interface
in the RM.</description>
            <name>yarn.resourcemanager.address</name>
            <value>${yarn.resourcemanager.hostname}:8032</value>
        </property>

14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
172.31.3.150:8032
14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)

Therefore, the question is how do I figure out where the ResourceManager is
running?

TIA

Kim
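
One inconsistency worth flagging here, as a sketch under the assumption that
the cluster really does run MR1: the mapred-site.xml above sets
mapreduce.framework.name to "yarn", not "classic", and with "yarn" in effect
the jobtracker address is ignored and the client resolves
yarn.resourcemanager.address instead, which matches the retries against port
8032 in the log. A client-side mapred-site.xml consistent with an MR1 cluster
would look more like this (the mapreduce.jobtracker.address key is the Hadoop
2.x name for the deprecated mapred.job.tracker; verify it against the
deprecated-properties table for your release):

<property>
    <!-- "classic" submits through the MR1 JobTracker;
         "yarn" submits to a ResourceManager -->
    <name>mapreduce.framework.name</name>
    <value>classic</value>
</property>
<property>
    <!-- assumed JobTracker host:port, taken from the settings above -->
    <name>mapreduce.jobtracker.address</name>
    <value>172.31.3.150:8021</value>
</property>
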



On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> It looks like it is pointing to an hdfs location. Can you create the hdfs dir
> and put the jar there? Hope this helps
> Thanks,
> Rahman
>
> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
> wrote:
>
> any help...all are welcome?
>
>
> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
>
>> Hi,
>>  I am running with the following command, but the jar is still not available
>> to the mappers and reducers.
>>
>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>> -Dmapreduce.user.classpath.first=true
>>
>>
>> Error Log
>>
>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
>> parsing not performed. Implement the Tool interface and execute your
>> application with ToolRunner to remedy this.
>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397534064728_0028
>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>> application_1397534064728_0028
>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running
>> in uber mode : false
>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>> Error: java.lang.RuntimeException: Error in configuring object
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>     at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>     ... 9 more
>> Caused by: java.lang.NoClassDefFoundError:
>> org/json/simple/parser/ParseException
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:270)
>>     at
>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>     at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>     at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>     at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>     ... 14 more
>> Caused by: java.lang.ClassNotFoundException:
>> org.json.simple.parser.ParseException
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>     ... 22 more
>>
>> When I analyzed the logs, they say:
>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>> option parsing not performed. Implement the Tool interface and execute your
>> application with ToolRunner to remedy this."
>>
>> But I have implemented the Tool interface, as described below:
>>
>> package my.search;
>>
>> import org.apache.hadoop.conf.Configured;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapred.FileInputFormat;
>> import org.apache.hadoop.mapred.FileOutputFormat;
>> import org.apache.hadoop.mapred.JobClient;
>> import org.apache.hadoop.mapred.JobConf;
>> import org.apache.hadoop.mapred.TextInputFormat;
>> import org.apache.hadoop.mapred.TextOutputFormat;
>> import org.apache.hadoop.util.Tool;
>> import org.apache.hadoop.util.ToolRunner;
>>
>> public class Minerva extends Configured implements Tool
>> {
>>     public int run(String[] args) throws Exception {
>>         JobConf conf = new JobConf(Minerva.class);
>>         conf.setJobName("minerva sample job");
>>
>>         conf.setMapOutputKeyClass(Text.class);
>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>
>>         conf.setOutputKeyClass(Text.class);
>>         conf.setOutputValueClass(Text.class);
>>
>>         conf.setMapperClass(Map.class);
>>         // conf.setCombinerClass(Reduce.class);
>>         conf.setReducerClass(Reduce.class);
>>
>>         conf.setInputFormat(TextInputFormat.class);
>>         conf.setOutputFormat(TextOutputFormat.class);
>>
>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>
>>         JobClient.runJob(conf);
>>
>>         return 0;
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         int res = ToolRunner.run(new Minerva(), args);
>>         System.exit(res);
>>     }
>> }
>>
>>
>> Please let me know if you see any issues.
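A note on that warning, offered as a sketch of the likely cause rather than
a confirmed fix: implementing Tool is only half of the contract. ToolRunner
feeds the generic options (-libjars, -files, -D) through
GenericOptionsParser and stores the result in the Configuration that run()
can read back with getConf(). The run() above builds a brand-new JobConf
from scratch, which silently discards that parsed configuration, including
the -libjars entries; that is also what makes JobSubmitter print the
"command-line option parsing not performed" warning. A minimal sketch of
run() built on the parsed configuration, assuming the same Map, Reduce and
TextArrayWritable classes as above:

    public int run(String[] args) throws Exception {
        // Build the JobConf on top of the Configuration that ToolRunner
        // populated; this is what carries the -libjars and -D settings.
        JobConf conf = new JobConf(getConf(), Minerva.class);
        conf.setJobName("minerva sample job");

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(TextArrayWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
        return 0;
    }

GenericOptionsParser also stops at the first non-option argument, so the
generic options have to come before the input and output paths on the
command line:

    hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva \
        -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar \
        /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3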
>>
>>
>>
>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>
>>> add '-Dmapreduce.user.classpath.first=true' to your command and try again
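One caveat on that flag, stated as an assumption to verify against your
Hadoop version rather than settled fact: in the MR2/YARN code line the
job-level property is usually spelled mapreduce.job.user.classpath.first,
and since -D rides through the same GenericOptionsParser, it only takes
effect when it appears before the application arguments:

    hadoop jar Minerva.jar my.search.Minerva \
        -Dmapreduce.job.user.classpath.first=true \
        -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar \
        <input-dir> <output-dir>

(The input and output placeholders here are illustrative.)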
>>>
>>>
>>>
>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>
>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>> not search the jars located in the local file system but HDFS. For
>>>> example,
>>>>
>>>> hadoop jar target/myJar.jar Foo -libjars
>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>
>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>> processName=JobTracker, sessionId=
>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
>>>> area
>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>> java.io.FileNotFoundException: File does not exist:
>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>     at
>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>     at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>
>>>> So under Hadoop 2.2.1, do I have to explicitly set some configuration
>>>> so that the "libjars" option copies the file from the local fs to
>>>> HDFS?
>>>>
>>>> TIA
>>>>
>>>> Kim
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Shengjun
>>>
>>
>>
>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Kim Chew <kc...@gmail.com>.
Thanks Rahman, I had mixed things up a little bit in my mapred-site.xml, so
it tried to run the job locally. Now I am running into the same problem as
Rahul: I am unable to connect to the ResourceManager.

My targeted cluster runs MR1 instead of YARN, hence
"mapreduce.framework.name" is set to "classic".

Here are my settings in my mapred-site.xml on the client side.

    <property>
        <!-- Pointed to the remote JobTracker -->
        <name>mapreduce.job.tracker.address</name>
        <value>172.31.3.150:8021</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

and my yarn-site.xml

    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>172.31.3.150</value>
    </property>

    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>

14/04/16 10:23:02 INFO client.RMProxy: Connecting to ResourceManager at /
172.31.3.150:8032
14/04/16 10:23:10 INFO ipc.Client: Retrying connect to server:
hadoop-host1.eng.narus.com/172.31.3.150:8032. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)

Therefore, the question is how do I figure out where the ResourceManager is
running?

TIA

Kim
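One way to pin that down, sketched on the assumption that you have shell
access to the cluster machines and that jps and netstat are available
there: read the address the client resolves from its own configuration,
then check on the suspected host whether a ResourceManager is actually
listening.

    # What RM address does the client-side config resolve to?
    hdfs getconf -confKey yarn.resourcemanager.address

    # On the host you believe runs the RM: is anything listening on 8032,
    # and is a ResourceManager JVM up at all?
    netstat -tlnp | grep 8032
    jps | grep -i resourcemanager

Keep in mind that if the target cluster really runs MR1 there is no
ResourceManager to find; a client configured for YARN will retry port 8032
forever, and the client's mapreduce.framework.name has to match what the
cluster actually runs.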



On Wed, Apr 16, 2014 at 8:43 AM, Abdelrahman Shettia <
ashettia@hortonworks.com> wrote:

> Hi Kim,
>
> It looks like it is pointing to an HDFS location. Can you create the HDFS
> dir and put the jar there? Hope this helps.
> Thanks,
> Rahman
>
> On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com>
> wrote:
>
> Any help? All suggestions are welcome.
>
>
> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
>
>> Hi,
>>  I am running with the following command, but the jar is still not
>> available to the mappers and reducers.
>>
>> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
>> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
>> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
>> -Dmapreduce.user.classpath.first=true
>>
>>
>> Error Log
>>
>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
>> parsing not performed. Implement the Tool interface and execute your
>> application with ToolRunner to remedy this.
>> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
>> process : 1
>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
>> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1397534064728_0028
>> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
>> application_1397534064728_0028
>> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
>> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
>> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
>> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running
>> in uber mode : false
>> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
>> attempt_1397534064728_0028_m_000005_0, Status : FAILED
>> Error: java.lang.RuntimeException: Error in configuring object
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>     at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:622)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>     ... 9 more
>> Caused by: java.lang.NoClassDefFoundError:
>> org/json/simple/parser/ParseException
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:270)
>>     at
>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>>     at
>> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>>     at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>>     at
>> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>     ... 14 more
>> Caused by: java.lang.ClassNotFoundException:
>> org.json.simple.parser.ParseException
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>>     ... 22 more
>>
>> When I analyzed the logs, they say:
>> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line
>> option parsing not performed. Implement the Tool interface and execute your
>> application with ToolRunner to remedy this."
>>
>> But I have implemented the Tool interface, as described below:
>>
>> package my.search;
>>
>> import org.apache.hadoop.conf.Configured;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapred.FileInputFormat;
>> import org.apache.hadoop.mapred.FileOutputFormat;
>> import org.apache.hadoop.mapred.JobClient;
>> import org.apache.hadoop.mapred.JobConf;
>> import org.apache.hadoop.mapred.TextInputFormat;
>> import org.apache.hadoop.mapred.TextOutputFormat;
>> import org.apache.hadoop.util.Tool;
>> import org.apache.hadoop.util.ToolRunner;
>>
>> public class Minerva extends Configured implements Tool
>> {
>>     public int run(String[] args) throws Exception {
>>         JobConf conf = new JobConf(Minerva.class);
>>         conf.setJobName("minerva sample job");
>>
>>         conf.setMapOutputKeyClass(Text.class);
>>         conf.setMapOutputValueClass(TextArrayWritable.class);
>>
>>         conf.setOutputKeyClass(Text.class);
>>         conf.setOutputValueClass(Text.class);
>>
>>         conf.setMapperClass(Map.class);
>>         // conf.setCombinerClass(Reduce.class);
>>         conf.setReducerClass(Reduce.class);
>>
>>         conf.setInputFormat(TextInputFormat.class);
>>         conf.setOutputFormat(TextOutputFormat.class);
>>
>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>
>>         JobClient.runJob(conf);
>>
>>         return 0;
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         int res = ToolRunner.run(new Minerva(), args);
>>         System.exit(res);
>>     }
>> }
>>
>>
>> Please let me know if you see any issues.
>>
>>
>>
>> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>>
>>> add '-Dmapreduce.user.classpath.first=true' to your command and try again
>>>
>>>
>>>
>>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>>
>>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does
>>>> not search the jars located in the local file system but HDFS. For
>>>> example,
>>>>
>>>> hadoop jar target/myJar.jar Foo -libjars
>>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>>
>>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>>> processName=JobTracker, sessionId=
>>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
>>>> area
>>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>>> cause:java.io.FileNotFoundException: File does not exist:
>>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>> java.io.FileNotFoundException: File does not exist:
>>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>>     at
>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>     at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>>     at
>>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>>     at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>>
>>>> So under Hadoop 2.2.1, do I have to explicitly set some configuration
>>>> so that the "libjars" option copies the file from the local fs to
>>>> HDFS?
>>>>
>>>> TIA
>>>>
>>>> Kim
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Shengjun
>>>
>>
>>
>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

It looks like it is pointing to an HDFS location. Can you create the HDFS dir and put the jar there? Hope this helps.
Thanks,
Rahman
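For concreteness, a sketch of that suggestion using the paths from the
original error message; the target HDFS directory is arbitrary, and whether
your version accepts a fully qualified hdfs:// URI in -libjars is worth
verifying:

    hadoop fs -mkdir -p /home/kchew/test-libs
    hadoop fs -put /home/kchew/test-libs/testJar.jar /home/kchew/test-libs/
    hadoop jar target/myJar.jar Foo \
        -libjars hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar \
        /user/kchew/inputs/raw.vector /user/kchew/outputs \
        hdfs://remoteNN:8020 remoteJT:8021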

On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com> wrote:

> Any help? All suggestions are welcome.
> 
> 
> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
> Hi,
>  I am running with the following command, but the jar is still not available to the mappers and reducers.
> 
> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3 -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar -Dmapreduce.user.classpath.first=true
> 
> 
> Error Log
> 
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to process : 1
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397534064728_0028
> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application application_1397534064728_0028
> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job: http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running in uber mode : false
> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id : attempt_1397534064728_0028_m_000005_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>     ... 22 more
> 
> When I analyzed the logs, they say:
> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this." 
> 
> But I have implemented the Tool interface, as described below:
> 
> package my.search;
> 
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.TextOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
> 
> public class Minerva extends Configured implements Tool
> {
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(Minerva.class);
>         conf.setJobName("minerva sample job");
> 
>         conf.setMapOutputKeyClass(Text.class);
>         conf.setMapOutputValueClass(TextArrayWritable.class);
> 
>         conf.setOutputKeyClass(Text.class);
>         conf.setOutputValueClass(Text.class);
> 
>         conf.setMapperClass(Map.class);
>         // conf.setCombinerClass(Reduce.class);
>         conf.setReducerClass(Reduce.class);
> 
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
> 
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> 
>         JobClient.runJob(conf);
>         
>         return 0;
>     }
>     
>     public static void main(String[] args) throws Exception {
>         int res = ToolRunner.run(new Minerva(), args);
>         System.exit(res);
>     }
> }
> 
> 
> Please let me know if you see any issues.
> 
> 
> 
> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
> add '-Dmapreduce.user.classpath.first=true' to your command and try again
> 
> 
> 
> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not search the jars located in the local file system but HDFS. For example,
> 
> hadoop jar target/myJar.jar Foo -libjars /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
> 
> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
> 14/04/08 15:11:02 ERROR security.UserGroupInformation: PriviledgedActionException as:kchew (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
> java.io.FileNotFoundException: File does not exist: hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
> 
> So under Hadoop 2.2.1, do I have to explicitly set some configuration so that the "libjars" option copies the file from the local fs to HDFS?
> 
> TIA
> 
> Kim
> 
> 
> 
> -- 
> Regards 
> Shengjun
> 
> 


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

It looks like it is pointing to hdfs location. Can you create the hdfs dir and put the jar there? Hope this helps 
Thanks,
Rahman

On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com> wrote:

> any help...all are welcome?
> 
> 
> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
> Hi,
>  I am running with the following command but still, jar is not available to mapper and reducers.
> 
> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3 -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar -Dmapreduce.user.classpath.first=true
> 
> 
> Error Log
> 
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to process : 1
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397534064728_0028
> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application application_1397534064728_0028
> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job: http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running in uber mode : false
> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id : attempt_1397534064728_0028_m_000005_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>     ... 22 more
> 
> When i analyzed the logs it says
> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this." 
> 
> But i have implemented the tool class as described below: 
> 
> package my.search;
> 
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.TextOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
> 
> public class Minerva extends Configured implements Tool
> {
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(Minerva.class);
>         conf.setJobName("minerva sample job");
> 
>         conf.setMapOutputKeyClass(Text.class);
>         conf.setMapOutputValueClass(TextArrayWritable.class);
> 
>         conf.setOutputKeyClass(Text.class);
>         conf.setOutputValueClass(Text.class);
> 
>         conf.setMapperClass(Map.class);
>         // conf.setCombinerClass(Reduce.class);
>         conf.setReducerClass(Reduce.class);
> 
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
> 
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> 
>         JobClient.runJob(conf);
>         
>         return 0;
>     }
>     
>     public static void main(String[] args) throws Exception {
>         int res = ToolRunner.run(new Minerva(), args);
>         System.exit(res);
>     }
> }
> 
> 
> Please let me know if you see any issues?
> 
> 
> 
> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
> add '-Dmapreduce.user.classpath.first=true' to your command and try again
> 
> 
> 
> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not search the jars located in the the local file system but HDFS. For example,
> 
> hadoop jar target/myJar.jar Foo -libjars /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
> 
> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
> 14/04/08 15:11:02 ERROR security.UserGroupInformation: PriviledgedActionException as:kchew (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
> java.io.FileNotFoundException: File does not exist: hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
> 
> So under Hadoop 2.2.1, do I have to explicitly set some configurations so when using the "libjars" option it will copy the file to hdfs from local fs?
> 
> TIA
> 
> Kim
> 
> 
> 
> -- 
> Regards 
> Shengjun
> 
> 



Re: using "-libjars" in Hadoop 2.2.1

Posted by Abdelrahman Shettia <as...@hortonworks.com>.
Hi Kim,

It looks like it is pointing to an HDFS location. Can you create the HDFS dir and put the jar there? Hope this helps.
Thanks,
Rahman
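
(As a minimal sketch of that suggestion, with /user/kchew/lib as a hypothetical target directory:

    hdfs dfs -mkdir -p /user/kchew/lib
    hdfs dfs -put /home/kchew/test-libs/testJar.jar /user/kchew/lib/
    hadoop jar target/myJar.jar Foo -libjars hdfs:///user/kchew/lib/testJar.jar \
        /user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021

-libjars accepts fully qualified URIs in Hadoop 2.x, so pointing it at the uploaded copy should sidestep the local-versus-HDFS ambiguity in the original command.)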

On Apr 16, 2014, at 8:39 AM, Rahul Singh <sm...@gmail.com> wrote:

> Any help? All suggestions are welcome.
> 
> 
> On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:
> Hi,
>  I am running with the following command, but the jar is still not available to the mappers and reducers.
> 
> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3 -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar -Dmapreduce.user.classpath.first=true
> 
> 
> Error Log
> 
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to process : 1
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397534064728_0028
> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application application_1397534064728_0028
> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job: http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running in uber mode : false
> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id : attempt_1397534064728_0028_m_000005_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>     ... 22 more
> 
> When I analyzed the logs, it says
> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this." 
> 
> But I have implemented the Tool interface as described below: 
> 
> package my.search;
> 
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.TextOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
> 
> public class Minerva extends Configured implements Tool
> {
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(Minerva.class);
>         conf.setJobName("minerva sample job");
> 
>         conf.setMapOutputKeyClass(Text.class);
>         conf.setMapOutputValueClass(TextArrayWritable.class);
> 
>         conf.setOutputKeyClass(Text.class);
>         conf.setOutputValueClass(Text.class);
> 
>         conf.setMapperClass(Map.class);
>         // conf.setCombinerClass(Reduce.class);
>         conf.setReducerClass(Reduce.class);
> 
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
> 
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
> 
>         JobClient.runJob(conf);
>         
>         return 0;
>     }
>     
>     public static void main(String[] args) throws Exception {
>         int res = ToolRunner.run(new Minerva(), args);
>         System.exit(res);
>     }
> }
> 
> 
> Please let me know if you see any issues.
> 
> 
> 
> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
> add '-Dmapreduce.user.classpath.first=true' to your command and try again
> 
> 
> 
> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not search the jars located in the local file system but HDFS. For example,
> 
> hadoop jar target/myJar.jar Foo -libjars /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
> 
> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
> 14/04/08 15:11:02 ERROR security.UserGroupInformation: PriviledgedActionException as:kchew (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
> java.io.FileNotFoundException: File does not exist: hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
> 
> So under Hadoop 2.2.1, do I have to explicitly set some configurations so when using the "libjars" option it will copy the file to hdfs from local fs?
> 
> TIA
> 
> Kim
> 
> 
> 
> -- 
> Regards 
> Shengjun
> 
> 



Re: using "-libjars" in Hadoop 2.2.1

Posted by Rahul Singh <sm...@gmail.com>.
Any help? All suggestions are welcome.


On Wed, Apr 16, 2014 at 1:13 PM, Rahul Singh <sm...@gmail.com> wrote:

> Hi,
>  I am running with the following command, but the jar is still not available
> to the mappers and reducers.
>
> hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
> /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
> -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
> -Dmapreduce.user.classpath.first=true
>
>
> Error Log
>
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
> 0.0.0.0:8032
> 14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
> 0.0.0.0:8032
> 14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
> parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this.
> 14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to
> process : 1
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
> 14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1397534064728_0028
> 14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
> application_1397534064728_0028
> 14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
> http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
> 14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
> 14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running
> in uber mode : false
> 14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
> 14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
> attempt_1397534064728_0028_m_000005_0, Status : FAILED
> Error: java.lang.RuntimeException: Error in configuring object
>     at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.NoClassDefFoundError:
> org/json/simple/parser/ParseException
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
>     at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
>     at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
>     at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
>     at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.ClassNotFoundException:
> org.json.simple.parser.ParseException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>     ... 22 more
>
> When I analyzed the logs, it says
> "14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
> parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this."
>
> But I have implemented the Tool interface as described below:
>
> package my.search;
>
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.TextOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
>
> public class Minerva extends Configured implements Tool
> {
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(Minerva.class);
>         conf.setJobName("minerva sample job");
>
>         conf.setMapOutputKeyClass(Text.class);
>         conf.setMapOutputValueClass(TextArrayWritable.class);
>
>         conf.setOutputKeyClass(Text.class);
>         conf.setOutputValueClass(Text.class);
>
>         conf.setMapperClass(Map.class);
>         // conf.setCombinerClass(Reduce.class);
>         conf.setReducerClass(Reduce.class);
>
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
>
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>
>         JobClient.runJob(conf);
>
>         return 0;
>     }
>
>     public static void main(String[] args) throws Exception {
>         int res = ToolRunner.run(new Minerva(), args);
>         System.exit(res);
>     }
> }
>
>
> Please let me know if you see any issues.
>
>
>
> On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:
>
>> add '-Dmapreduce.user.classpath.first=true' to your command and try again
>>
>>
>>
>> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>>
>>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not
>>> search the jars located in the local file system but HDFS. For example,
>>>
>>> hadoop jar target/myJar.jar Foo -libjars
>>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>>
>>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>> processName=JobTracker, sessionId=
>>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
>>> area
>>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>>> PriviledgedActionException as:kchew (auth:SIMPLE)
>>> cause:java.io.FileNotFoundException: File does not exist:
>>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>>> java.io.FileNotFoundException: File does not exist:
>>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>>     at
>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>>     at
>>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>>     at
>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>     at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>>     at
>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>     at
>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>     at
>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>>     at
>>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>     at
>>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>>
>>> So under Hadoop 2.2.1, do I have to explicitly set some configurations
>>> so when using the "libjars" option it will copy the file to hdfs from local
>>> fs?
>>>
>>> TIA
>>>
>>> Kim
>>>
>>
>>
>>
>> --
>> Regards
>> Shengjun
>>
>
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Rahul Singh <sm...@gmail.com>.
Hi,
 I am running with the following command, but the jar is still not available to
the mappers and reducers.

hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva
/user/hduser/input_minerva_actual /user/hduser/output_merva_actual3
-libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar
-Dmapreduce.user.classpath.first=true
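
(One detail worth checking here, assuming stock GenericOptionsParser behaviour: generic options such as -D and -libjars are only recognised when they come before the program's own arguments, not after them. A reordered sketch of the same command:

    hadoop jar /home/hduser/workspace/Minerva.jar my.search.Minerva \
        -Dmapreduce.user.classpath.first=true \
        -libjars /home/hduser/Documents/Lib/json-simple-1.1.1.jar \
        /user/hduser/input_minerva_actual /user/hduser/output_merva_actual3

With the options last, ToolRunner treats them as plain arguments, so the jar is never shipped to the tasks, which would match the ClassNotFoundException below. This also assumes run() uses the parsed Configuration, e.g. via getConf(), as sketched earlier in the thread.)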


Error Log

14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
14/04/16 13:08:37 INFO client.RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/04/16 13:08:37 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/04/16 13:08:37 INFO mapreduce.JobSubmitter: number of splits:10
14/04/16 13:08:37 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1397534064728_0028
14/04/16 13:08:38 INFO impl.YarnClientImpl: Submitted application
application_1397534064728_0028
14/04/16 13:08:38 INFO mapreduce.Job: The url to track the job:
http://L-Rahul-Tech:8088/proxy/application_1397534064728_0028/
14/04/16 13:08:38 INFO mapreduce.Job: Running job: job_1397534064728_0028
14/04/16 13:08:47 INFO mapreduce.Job: Job job_1397534064728_0028 running in
uber mode : false
14/04/16 13:08:47 INFO mapreduce.Job:  map 0% reduce 0%
14/04/16 13:08:58 INFO mapreduce.Job: Task Id :
attempt_1397534064728_0028_m_000005_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
    at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 9 more
Caused by: java.lang.NoClassDefFoundError:
org/json/simple/parser/ParseException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1821)
    at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1786)
    at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1880)
    at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1906)
    at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:1107)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
    ... 14 more
Caused by: java.lang.ClassNotFoundException:
org.json.simple.parser.ParseException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    ... 22 more

When I analyzed the logs, I see:
"14/04/16 13:08:37 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this."

But I have implemented the Tool interface, as described below:

package my.search;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Minerva extends Configured implements Tool
{
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(Minerva.class);
        conf.setJobName("minerva sample job");

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(TextArrayWritable.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        // conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Minerva(), args);
        System.exit(res);
    }
}


Please let me know if you see any issues.
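
One likely culprit, sketched here as a guess with the rest of the class
unchanged: run() builds its JobConf from scratch instead of from getConf(),
so the Configuration that ToolRunner populated from the generic options
(-D, -libjars) is thrown away; that would also explain the "command-line
option parsing not performed" warning above. Constructing the JobConf from
getConf() keeps those settings:

    public int run(String[] args) throws Exception {
        // Start from the Configuration that ToolRunner/GenericOptionsParser
        // has already populated; new JobConf(Minerva.class) silently drops
        // the -D and -libjars settings parsed from the command line.
        JobConf conf = new JobConf(getConf(), Minerva.class);
        conf.setJobName("minerva sample job");

        // ... the remaining job setup stays exactly as above ...

        JobClient.runJob(conf);
        return 0;
    }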



On Thu, Apr 10, 2014 at 9:29 AM, Shengjun Xin <sx...@gopivotal.com> wrote:

> add '-Dmapreduce.user.classpath.first=true' to your command and try again
>
>
>
> On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:
>
>> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not
>> search the jars located in the local file system but in HDFS. For example,
>>
>> hadoop jar target/myJar.jar Foo -libjars
>> /home/kchew/test-libs/testJar.jar /user/kchew/inputs/raw.vector
>> /user/kchew/outputs hdfs://remoteNN:8020 remoteJT:8021
>>
>> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=
>> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
>> area
>> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
>> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
>> PriviledgedActionException as:kchew (auth:SIMPLE)
>> cause:java.io.FileNotFoundException: File does not exist:
>> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
>> java.io.FileNotFoundException: File does not exist:
>> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>>     at
>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>>     at
>> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>>     at
>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>     at
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>>     at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>     at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>     at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>>     at
>> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>     at
>> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>>
>> So under Hadoop 2.2.1, do I have to explicitly set some configurations so
>> when using the "libjars" option it will copy the file to hdfs from local fs?
>>
>> TIA
>>
>> Kim
>>
>
>
>
> --
> Regards
> Shengjun
>

Re: using "-libjars" in Hadoop 2.2.1

Posted by Shengjun Xin <sx...@gopivotal.com>.
Add '-Dmapreduce.user.classpath.first=true' to your command and try again.
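
With Kim's original invocation, and assuming Foo is run through ToolRunner
so the generic options are actually parsed, that would look roughly like
this (generic options placed before the application arguments):

hadoop jar target/myJar.jar Foo \
    -Dmapreduce.user.classpath.first=true \
    -libjars file:///home/kchew/test-libs/testJar.jar \
    /user/kchew/inputs/raw.vector /user/kchew/outputs \
    hdfs://remoteNN:8020 remoteJT:8021

The explicit file:// scheme is a guess at keeping the local jar path from
being qualified against the default filesystem, as in the
FileNotFoundException quoted below.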



On Wed, Apr 9, 2014 at 6:27 AM, Kim Chew <kc...@gmail.com> wrote:

> It seems to me that in Hadoop 2.2.1, using the "libjars" option does not
> search the jars located in the local file system but in HDFS. For example,
>
> hadoop jar target/myJar.jar Foo -libjars /home/kchew/test-libs/testJar.jar
> /user/kchew/inputs/raw.vector /user/kchew/outputs hdfs://remoteNN:8020
> remoteJT:8021
>
> 14/04/08 15:11:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 14/04/08 15:11:02 INFO mapreduce.JobSubmitter: Cleaning up the staging
> area
> file:/tmp/hadoop-kchew/mapred/staging/kchew202924688/.staging/job_local202924688_0001
> 14/04/08 15:11:02 ERROR security.UserGroupInformation:
> PriviledgedActionException as:kchew (auth:SIMPLE)
> cause:java.io.FileNotFoundException: File does not exist:
> hdfs://remoteNN:8020/home/kchew/test-libs/testJar.jar
> java.io.FileNotFoundException: File does not exist:
> hdfs:/remoteNN:8020/home/kchew/test-libs/testJar.jar
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>     at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>     at
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>     at
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>     at
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
>     at
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>     at
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
>
> So under Hadoop 2.2.1, do I have to explicitly set some configurations so
> when using the "libjars" option it will copy the file to hdfs from local fs?
>
> TIA
>
> Kim
>



-- 
Regards
Shengjun
