Posted to user@hive.apache.org by Jonathan Hodges <ho...@gmail.com> on 2013/12/30 20:13:11 UTC

WebHCat MapReduce Job Syntax

Hi,

I am trying to kick off a mapreduce job via WebHCat.  The following is the
hadoop jar command.

hadoop jar \
  /home/hadoop/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
  com.linkedin.camus.etl.kafka.CamusJob \
  -P /home/hadoop/camus_non_avro.properties

As you can see, there is an application-specific parameter '-P' that
designates the properties file location.  How do I pass this to WebHCat?

Referring to the docs
(https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar),
I came up with the following.

curl -s -d user.name=hadoop \
       -d jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
       -d class=com.linkedin.camus.etl.kafka.CamusJob \
       -d arg=/tmp/camus_non_avro.properties \
       'http://internal-daalt-hcatalog-1507773817.us-east-1.elb.amazonaws.com/templeton/v1/mapreduce/jar'

This command gets the following response from WebHCat:
{"id":"job_201312212124_0161"}

However, I only see TempletonControllerJob in the jobtracker UI.  I don't
see the Camus jobs that show up when the command is run at the command line.

The following are the only things showing in webhcat.log


The jar and properties files are in the /tmp directory on HDFS.

hadoop fs -ls /tmp
-rw-r--r--   2 hadoop supergroup   41456481 2013-12-27 17:45 /tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar
-rw-r--r--   2 hadoop supergroup       2605 2013-12-27 17:45 /tmp/camus_non_avro.properties

Re: WebHCat MapReduce Job Syntax

Posted by Jonathan Hodges <ho...@gmail.com>.
You're the man!  When I included the 'statusdir' param, I got the following
output in stderr.

Exception in thread "main" java.io.FileNotFoundException: /tmp/camus_non_avro.properties (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:592)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:562)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

It seems Camus is looking at the local filesystem instead of HDFS for the
properties file.  Thanks so much for the help!
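
The trace goes through java.io.FileInputStream, which only reads the local
filesystem of the node where the launcher task runs, so a file that exists
only on HDFS would not be found there.  A rough way to compare the two
locations could look like the sketch below.  The helper name `where_is` is
made up for illustration, and it assumes the `hadoop` client is on the PATH
of the node being checked.

```shell
# Sketch: report whether a path exists on the local filesystem and on HDFS.
# `where_is` is a hypothetical helper name; `hadoop fs -test -e` is the
# existence check from the Hadoop filesystem shell (exit 0 = path exists).
where_is() {
  path="$1"
  if [ -e "$path" ]; then
    echo "local: present"
  else
    echo "local: missing"
  fi
  if hadoop fs -test -e "$path"; then
    echo "hdfs: present"
  else
    echo "hdfs: missing"
  fi
}
```

Running `where_is /tmp/camus_non_avro.properties` on a task node would be
expected to show the file on HDFS but not locally, which would match the
exception above.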





On Mon, Dec 30, 2013 at 3:53 PM, Eugene Koifman <ek...@hortonworks.com> wrote:

> It looks like in 0.11 it writes to stderr (limited logging anyway).
>
> Perhaps you can try adding the 'statusdir' param to your REST call and see
> if anything useful is written to that directory.

Re: WebHCat MapReduce Job Syntax

Posted by Eugene Koifman <ek...@hortonworks.com>.
It looks like in 0.11 it writes to stderr (limited logging anyway).

Perhaps you can try adding the 'statusdir' param to your REST call and see if
anything useful is written to that directory.
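
For anyone following along, a request with 'statusdir' added might look
roughly like the sketch below.  The two function names are hypothetical, and
the WebHCat URL and the status directory are supplied by the caller; the
sketch assumes the launcher's stderr ends up in a file named 'stderr' under
that directory on HDFS, which matches the output reported in this thread.

```shell
# Sketch: resubmit the job with a status directory, then read its stderr.
# `submit_with_statusdir` and `show_job_stderr` are hypothetical helper names.
submit_with_statusdir() {
  webhcat_url="$1"   # full .../templeton/v1/mapreduce/jar endpoint
  status_dir="$2"    # HDFS directory that WebHCat writes status files to
  curl -s -d user.name=hadoop \
       -d jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
       -d class=com.linkedin.camus.etl.kafka.CamusJob \
       -d arg=-P \
       -d arg=/tmp/camus_non_avro.properties \
       -d statusdir="$status_dir" \
       "$webhcat_url"
}

show_job_stderr() {
  # Print the captured stderr of the launched command from the status dir.
  hadoop fs -cat "$1/stderr"
}
```

The status directory is not cleaned up automatically, so it is up to the
caller to remove it once the output has been read.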


On Mon, Dec 30, 2013 at 2:22 PM, Jonathan Hodges <ho...@gmail.com> wrote:

> I don't see 'TrivialExecService' output in the jobtracker or tasktracker
> logs.  We are using hive 0.11 though so maybe not set to DEBUG?

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: WebHCat MapReduce Job Syntax

Posted by Jonathan Hodges <ho...@gmail.com>.
I don't see 'TrivialExecService' output in the jobtracker or tasktracker
logs.  We are using Hive 0.11, though, so maybe it is not set to DEBUG?


On Mon, Dec 30, 2013 at 2:11 PM, Eugene Koifman <ek...@hortonworks.com> wrote:

> Is there any output from TrivialExecService class in any hadoop logs?
> (it's DEBUG level log4j output in hive 0.12).
> It should print the command that TempletonControllerJob's launcher task
> (LaunchMapper) is trying to launch

Re: WebHCat MapReduce Job Syntax

Posted by Eugene Koifman <ek...@hortonworks.com>.
Is there any output from the TrivialExecService class in any Hadoop logs?
(It's DEBUG-level log4j output in Hive 0.12.)
It should print the command that TempletonControllerJob's launcher task
(LaunchMapper) is trying to launch.


On Mon, Dec 30, 2013 at 12:55 PM, Jonathan Hodges <ho...@gmail.com> wrote:

> I didn't try that before, but I just did.
>
> curl -s -d user.name=hadoop \
>        -d jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
>        -d class=com.linkedin.camus.etl.kafka.CamusJob \
>        -d arg=-P \
>        -d arg=/tmp/camus_non_avro.properties \
>        'http://internal-daalt-hcatalog-1507773817.us-east-1.elb.amazonaws.com/templeton/v1/mapreduce/jar'
>
> {"id":"job_201312212124_0166"}
>
> DEBUG | 30 Dec 2013 20:33:43,157 | org.apache.hcatalog.templeton.Server | queued job job_201312212124_0166 in 300 ms
>
> I still see the same behavior, with just the TempletonControllerJob getting
> kicked off and ending successfully without the Camus job starting.  I
> didn't see any errors in the jobtracker or tasktracker logs.  It just seems
> to silently fail and I can't figure out why.


Re: WebHCat MapReduce Job Syntax

Posted by Jonathan Hodges <ho...@gmail.com>.
I didn't try that before, but I just did.

curl -s -d user.name=hadoop \
       -d jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
       -d class=com.linkedin.camus.etl.kafka.CamusJob \
       -d arg=-P \
       -d arg=/tmp/camus_non_avro.properties \
       'http://internal-daalt-hcatalog-1507773817.us-east-1.elb.amazonaws.com/templeton/v1/mapreduce/jar'

{"id":"job_201312212124_0166"}

DEBUG | 30 Dec 2013 20:33:43,157 | org.apache.hcatalog.templeton.Server | queued job job_201312212124_0166 in 300 ms

I still see the same behavior, with just the TempletonControllerJob getting
kicked off and ending successfully without the Camus job starting.  I
didn't see any errors in the jobtracker or tasktracker logs.  It just seems
to silently fail and I can't figure out why.






On Mon, Dec 30, 2013 at 12:35 PM, Eugene Koifman <ek...@hortonworks.com> wrote:

> Have you tried adding
> -d arg=-P
> before
> -d arg=/tmp/....properties

Re: WebHCat MapReduce Job Syntax

Posted by Eugene Koifman <ek...@hortonworks.com>.
Have you tried adding
-d arg=-P
before
-d arg=/tmp/....properties
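
In other words, each token that follows the class name in the 'hadoop jar'
command becomes its own 'arg' form field, in the same order.  A small sketch
of that mapping is below; `webhcat_arg_flags` is a made-up helper name used
only for illustration, not something WebHCat or curl provides.

```shell
# Sketch: turn the trailing arguments of a `hadoop jar` command into
# repeated curl `-d arg=...` flags, one token per line, order preserved.
# `webhcat_arg_flags` is a hypothetical helper name.
webhcat_arg_flags() {
  for a in "$@"; do
    printf '%s\n' '-d' "arg=$a"
  done
}

# Print the flags for the Camus invocation discussed in this thread.
webhcat_arg_flags -P /tmp/camus_non_avro.properties
```

The order matters here because '-P' has to come immediately before the
properties path, just as it does on the command line.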



On Mon, Dec 30, 2013 at 11:14 AM, Jonathan Hodges <ho...@gmail.com> wrote:

> Sorry, I accidentally hit send before adding the lines from webhcat.log.
>
> DEBUG | 30 Dec 2013 19:08:01,042 | org.apache.hcatalog.templeton.Server | queued job job_201312212124_0161 in 267 ms
>
> DEBUG | 30 Dec 2013 19:08:38,880 | org.apache.hcatalog.templeton.tool.HDFSStorage | Couldn't find /templeton-hadoop/jobs/job_201312212124_0161/notified: File does not exist: /templeton-hadoop/jobs/job_201312212124_0161/notified
>
> DEBUG | 30 Dec 2013 19:08:38,881 | org.apache.hcatalog.templeton.tool.HDFSStorage | Couldn't find /templeton-hadoop/jobs/job_201312212124_0161/callback: File does not exist: /templeton-hadoop/jobs/job_201312212124_0161/callback
>
>
> Any ideas?
>
>
> On Mon, Dec 30, 2013 at 12:13 PM, Jonathan Hodges <ho...@gmail.com>wrote:
>
>> Hi,
>>
>> I am trying to kick off a mapreduce job via WebHCat.  The following is
>> the hadoop jar command.
>>
>> hadoop jar
>> /home/hadoop/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar
>> com.linkedin.camus.etl.kafka.CamusJob -P
>> /home/hadoop/camus_non_avro.properties
>>
>> As you can see there is an application specific parameter '-P' which
>> designates the properties file location.  How do I pass this to WebHCat?
>>
>> Referring to the docs (
>> https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar)
>> I came up with the following.
>>
>> curl -s -d user.name=hadoop \
>>        -d
>> jar=/tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar \
>>        -d class=com.linkedin.camus.etl.kafka.CamusJob \
>>        -d arg=/tmp/camus_non_avro.properties \
>>        '
>> http://internal-daalt-hcatalog-1507773817.us-east-1.elb.amazonaws.com/templeton/v1/mapreduce/jar
>> '
>>
>> This command gets the following response from WebHCat
>> {"id":"job_201312212124_0161"}
>>
>> However I only see TempletonControllerJob in the jobtracker UI.  I don't
>> see the Camus jobs that will show up if executed at the command-line.
>>
>> The following are the only things showing in webhcat.log
>>
>>
>> The jar and properties files are in the /tmp directory on HDFS.
>>
>> hadoop fs -ls /tmp
>> -rw-r--r--   2 hadoop supergroup   41456481 2013-12-27 17:45
>> /tmp/camus-non-avro-consumer-1.0-SNAPSHOT-jar-with-dependencies.jar
>> -rw-r--r--   2 hadoop supergroup       2605 2013-12-27 17:45
>> /tmp/camus_non_avro.properties
>>
>
>


Re: WebHCat MapReduce Job Syntax

Posted by Jonathan Hodges <ho...@gmail.com>.
Sorry, I accidentally hit send before adding the lines from webhcat.log:

DEBUG | 30 Dec 2013 19:08:01,042 | org.apache.hcatalog.templeton.Server | queued job job_201312212124_0161 in 267 ms

DEBUG | 30 Dec 2013 19:08:38,880 | org.apache.hcatalog.templeton.tool.HDFSStorage | Couldn't find /templeton-hadoop/jobs/job_201312212124_0161/notified: File does not exist: /templeton-hadoop/jobs/job_201312212124_0161/notified

DEBUG | 30 Dec 2013 19:08:38,881 | org.apache.hcatalog.templeton.tool.HDFSStorage | Couldn't find /templeton-hadoop/jobs/job_201312212124_0161/callback: File does not exist: /templeton-hadoop/jobs/job_201312212124_0161/callback


Any ideas?

