Posted to user@hive.apache.org by Matt Pestritto <ma...@pestritto.com> on 2009/09/30 15:51:04 UTC

Fwd: Hive-74

Including hive-user in case someone has any experience with this..
Thanks
-Matt

---------- Forwarded message ----------
From: Matt Pestritto <ma...@pestritto.com>
Date: Tue, Sep 29, 2009 at 5:26 PM
Subject: Hive-74
To: hive-dev@hadoop.apache.org


Hi-

I'm having a problem using CombineHiveInputSplit.  I believe this was
patched in http://issues.apache.org/jira/browse/HIVE-74

I'm currently running Hadoop 0.20.1 with Hive trunk.

hive-default.xml has the following property:
<property>
  <name>hive.input.format</name>
  <value></value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

I added the following to hive-site.xml. (Note: the description in
hive-default.xml says CombinedHiveInputFormat, which does not work for me;
the actual class name appears to be CombineHiveInputFormat, without the "d".)
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>
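
For a quick test, the same property can also be overridden per-session from
the Hive CLI instead of editing hive-site.xml (my assumption here is the
usual Hadoop config precedence, where a session-level `set` wins over the
site file for that session only):

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
hive> select count(1) from my_table;

This uses the same class name as in the <value> element above; if the
combine format misbehaves, setting the property back to
org.apache.hadoop.hive.ql.io.HiveInputFormat in the same way restores the
default behavior.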

When I launch a job the cli exits immediately:
hive> select count(1) from my_table;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver
hive> exit ;

If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat,
the job runs fine.

Suggestions ? Is there something that I am missing ?

Thanks
-Matt

Re: Hive-74

Posted by Matt Pestritto <ma...@pestritto.com>.
Namit -

I have tried hive-trunk as of this afternoon and hive release 814942 (
revision with CombineHiveInputFormat commit ) .

Also - there are no logs that get generated on the tasktrackers for the
hadoop job that fails.  The only log that is generated on the jobtracker is
the jobconf.

Thanks
-Matt

On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain <nj...@facebook.com> wrote:

>  Hi Matt,
>
> Sorry for the late reply.
>
> hive> set
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I tried it running on hadoop 20 and it ran fine for me.
>
> Which hive release are you using ?
>
> Also, you got a runtime error – can you see the stderr logs on the tracker
> ?
>
> Thanks,
> -namit
>
>
>
> On 10/1/09 5:01 PM, "Matt Pestritto" <ma...@pestritto.com> wrote:
>
> Namit -
> Any idea on how to resolve ?
> Thanks
>
> On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com>
> wrote:
>
> > There were errors in the hive.log
> >
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> > (JobClient.java:configureCommandLineOptions(539)) - Use
> GenericOptionsParser
> > for parsing the arguments. Applications should implement Tool for the
> same.
> > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068
> with
> > errors
> > 2009-10-01 10:40:58,622 ERROR ql.Driver
> (SessionState.java:printError(248))
> > - FAILED: Execution Error, return code 2 from
> > org.apache.hadoop.hive.ql.exec.ExecDriver
> >
> >
> >
> > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
> >
> >> What you are doing seems OK ?
> >> Can you get the stack trace from /tmp/<username>/hive.log ?
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Matt Pestritto [mailto:matt@pestritto.com]
> >> Sent: Wednesday, September 30, 2009 6:51 AM
> >> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
> >> Subject: Fwd: Hive-74
> >>
> >> Including hive-user in case someone has any experience with this..
> >> Thanks
> >> -Matt
> >>
> >> ---------- Forwarded message ----------
> >> From: Matt Pestritto <ma...@pestritto.com>
> >> Date: Tue, Sep 29, 2009 at 5:26 PM
> >> Subject: Hive-74
> >> To: hive-dev@hadoop.apache.org
> >>
> >>
> >> Hi-
> >>
> >> I'm having a problem using CombineHiveInputSplit.  I believe this was
> >> patched in http://issues.apache.org/jira/browse/HIVE-74
> >>
> >> I'm currently running hadoop 20.1 using hive trunk.
> >>
> >> hive-default.xml has the following property:
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value></value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> I added the following to hive-site.xml:  ( Notice, the description in
> >> hive-default.xml has CombinedHiveInputFormat which does not work for me
> -
> >> the property value seems to be Combine(-d) )
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> When I launch a job the cli exits immediately:
> >> hive> select count(1) from my_table;
> >> Total MapReduce jobs = 1
> >> Number of reduce tasks determined at compile time: 1
> >> In order to change the average load for a reducer (in bytes):
> >>  set hive.exec.reducers.bytes.per.reducer=<number>
> >> In order to limit the maximum number of reducers:
> >>  set hive.exec.reducers.max=<number>
> >> In order to set a constant number of reducers:
> >>  set mapred.reduce.tasks=<number>
> >> FAILED: Execution Error, return code 2 from
> >> org.apache.hadoop.hive.ql.exec.ExecDriver
> >> hive> exit ;
> >>
> >> If I set the property value to
> >> org.apache.hadoop.hive.ql.io.HiveInputFormat,
> >> the job runs fine.
> >>
> >> Suggestions ? Is there something that I am missing ?
> >>
> >> Thanks
> >> -Matt
> >>
> >
> >
>
>

Re: Hive-74

Posted by Matt Pestritto <ma...@pestritto.com>.
Thanks Namit -
The patch worked for me.

On Mon, Oct 19, 2009 at 1:23 PM, Namit Jain <nj...@facebook.com> wrote:

>  Yes, we also ran into this problem, and Zheng has a patch for this.
>
> Either you can apply the patch on hadoop and get it to work, or copy the
> code into Hive and have a hive patch. I am not sure that is the best
> approach, since it will lead to code duplication.
>
>
>
>
>
> Thanks,
>
> -namit
>
>
>
>
>
> *From:* Matt Pestritto [mailto:matt@pestritto.com]
> *Sent:* Monday, October 19, 2009 10:20 AM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Re: Hive-74
>
>
>
> Namit -
>
> I finally had a chance to look at this again.  I am running hadoop 20.1 and
> hive trunk.
>
> I'm still having a problem with combine input.  I found an error in my
> jobtracker logs:
>
> 12:48:08,792 INFO  [JobInProgress] Input size for job job_200910160957_0003 = 13. Number of splits = 1
> 12:48:08,794 ERROR [JobTracker] Job initialization failed:
> java.lang.IllegalArgumentException: Network location name contains /: /default-rack
>         at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
>         at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
>         at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390)
>         at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384)
>         at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349)
>         at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450)
>         at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147)
>         at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
>
> 12:48:08,794 INFO  [JobTracker] Failing job job_200910160957_0003
> 12:48:09,560 INFO  [JobTracker] Killing job job_200910160957_0003
>
> This seems to be related to HADOOP-5759, but it looks like it was patched
> coincidentally today by Zheng:
> https://issues.apache.org/jira/browse/HADOOP-5759
>
> I'm guessing I need to build Hadoop from the 0.20 branch for this to work?
>
> Thanks
> -Matt
>
> On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain <nj...@facebook.com> wrote:
>
> Hi Matt,
>
> Sorry for the late reply.
>
> hive> set
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I tried it running on hadoop 20 and it ran fine for me.
>
> Which hive release are you using ?
>
> Also, you got a runtime error – can you see the stderr logs on the tracker
> ?
>
> Thanks,
> -namit
>
>
>
>
> On 10/1/09 5:01 PM, "Matt Pestritto" <ma...@pestritto.com> wrote:
>
> Namit -
> Any idea on how to resolve ?
> Thanks
>
> On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com>
> wrote:
>
> > There were errors in the hive.log
> >
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> > (JobClient.java:configureCommandLineOptions(539)) - Use
> GenericOptionsParser
> > for parsing the arguments. Applications should implement Tool for the
> same.
> > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068
> with
> > errors
> > 2009-10-01 10:40:58,622 ERROR ql.Driver
> (SessionState.java:printError(248))
> > - FAILED: Execution Error, return code 2 from
> > org.apache.hadoop.hive.ql.exec.ExecDriver
> >
> >
> >
> > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
> >
> >> What you are doing seems OK ?
> >> Can you get the stack trace from /tmp/<username>/hive.log ?
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Matt Pestritto [mailto:matt@pestritto.com]
> >> Sent: Wednesday, September 30, 2009 6:51 AM
> >> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
> >> Subject: Fwd: Hive-74
> >>
> >> Including hive-user in case someone has any experience with this..
> >> Thanks
> >> -Matt
> >>
> >> ---------- Forwarded message ----------
> >> From: Matt Pestritto <ma...@pestritto.com>
> >> Date: Tue, Sep 29, 2009 at 5:26 PM
> >> Subject: Hive-74
> >> To: hive-dev@hadoop.apache.org
> >>
> >>
> >> Hi-
> >>
> >> I'm having a problem using CombineHiveInputSplit.  I believe this was
> >> patched in http://issues.apache.org/jira/browse/HIVE-74
> >>
> >> I'm currently running hadoop 20.1 using hive trunk.
> >>
> >> hive-default.xml has the following property:
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value></value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> I added the following to hive-site.xml:  ( Notice, the description in
> >> hive-default.xml has CombinedHiveInputFormat which does not work for me
> -
> >> the property value seems to be Combine(-d) )
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> When I launch a job the cli exits immediately:
> >> hive> select count(1) from my_table;
> >> Total MapReduce jobs = 1
> >> Number of reduce tasks determined at compile time: 1
> >> In order to change the average load for a reducer (in bytes):
> >>  set hive.exec.reducers.bytes.per.reducer=<number>
> >> In order to limit the maximum number of reducers:
> >>  set hive.exec.reducers.max=<number>
> >> In order to set a constant number of reducers:
> >>  set mapred.reduce.tasks=<number>
> >> FAILED: Execution Error, return code 2 from
> >> org.apache.hadoop.hive.ql.exec.ExecDriver
> >> hive> exit ;
> >>
> >> If I set the property value to
> >> org.apache.hadoop.hive.ql.io.HiveInputFormat,
> >> the job runs fine.
> >>
> >> Suggestions ? Is there something that I am missing ?
> >>
> >> Thanks
> >> -Matt
> >>
> >
> >
>
>
>

RE: Hive-74

Posted by Namit Jain <nj...@facebook.com>.
Yes, we also ran into this problem, and Zheng has a patch for this.
Either you can apply the patch on hadoop and get it to work, or copy the code
into Hive and have a hive patch. I am not sure that is the best approach,
since it will lead to code duplication.


Thanks,
-namit


From: Matt Pestritto [mailto:matt@pestritto.com]
Sent: Monday, October 19, 2009 10:20 AM
To: hive-user@hadoop.apache.org
Subject: Re: Hive-74

Namit -

I finally had a chance to look at this again.  I am running hadoop 20.1 and hive trunk.

I'm still having a problem with combine input.  I found an error in my jobtracker logs:

12:48:08,792 INFO  [JobInProgress] Input size for job job_200910160957_0003 = 13. Number of splits = 1
12:48:08,794 ERROR [JobTracker] Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: /default-rack
        at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
        at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
        at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390)
        at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384)
        at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450)
        at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
12:48:08,794 INFO  [JobTracker] Failing job job_200910160957_0003
12:48:09,560 INFO  [JobTracker] Killing job job_200910160957_0003
This seems to be related to HADOOP-5759, but it looks like it was patched coincidentally today by Zheng: https://issues.apache.org/jira/browse/HADOOP-5759

I'm guessing I need to build Hadoop from the 0.20 branch for this to work?

Thanks
-Matt
On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain <nj...@facebook.com> wrote:
Hi Matt,

Sorry for the late reply.

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

I tried it running on hadoop 20 and it ran fine for me.

Which hive release are you using ?

Also, you got a runtime error - can you see the stderr logs on the tracker ?

Thanks,
-namit



On 10/1/09 5:01 PM, "Matt Pestritto" <ma...@pestritto.com> wrote:
Namit -
Any idea on how to resolve ?
Thanks

On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com> wrote:

> There were errors in the hive.log
>
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
> errors
> 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
> - FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
>
>
> On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
>
>> What you are doing seems OK ?
>> Can you get the stack trace from /tmp/<username>/hive.log ?
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Matt Pestritto [mailto:matt@pestritto.com]
>> Sent: Wednesday, September 30, 2009 6:51 AM
>> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
>> Subject: Fwd: Hive-74
>>
>> Including hive-user in case someone has any experience with this..
>> Thanks
>> -Matt
>>
>> ---------- Forwarded message ----------
>> From: Matt Pestritto <ma...@pestritto.com>
>> Date: Tue, Sep 29, 2009 at 5:26 PM
>> Subject: Hive-74
>> To: hive-dev@hadoop.apache.org
>>
>>
>> Hi-
>>
>> I'm having a problem using CombineHiveInputSplit.  I believe this was
>> patched in http://issues.apache.org/jira/browse/HIVE-74
>>
>> I'm currently running hadoop 20.1 using hive trunk.
>>
>> hive-default.xml has the following property:
>> <property>
>>  <name>hive.input.format</name>
>>  <value></value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> I added the following to hive-site.xml:  ( Notice, the description in
>> hive-default.xml has CombinedHiveInputFormat which does not work for me -
>> the property value seems to be Combine(-d) )
>> <property>
>>  <name>hive.input.format</name>
>>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> When I launch a job the cli exits immediately:
>> hive> select count(1) from my_table;
>> Total MapReduce jobs = 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> hive> exit ;
>>
>> If I set the property value to
>> org.apache.hadoop.hive.ql.io.HiveInputFormat,
>> the job runs fine.
>>
>> Suggestions ? Is there something that I am missing ?
>>
>> Thanks
>> -Matt
>>
>
>


Re: Hive-74

Posted by Matt Pestritto <ma...@pestritto.com>.
Namit -

I finally had a chance to look at this again. I am running Hadoop 0.20.1 and
Hive trunk.

I'm still having a problem with combine input.  I found an error in my
jobtracker logs:

12:48:08,792 INFO  [JobInProgress] Input size for job
job_200910160957_0003 = 13. Number of splits = 1
12:48:08,794 ERROR [JobTracker] Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /:
/default-rack
	at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
	at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
	at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390)
	at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384)
	at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349)
	at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450)
	at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147)
	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

12:48:08,794 INFO  [JobTracker] Failing job job_200910160957_0003
12:48:09,560 INFO  [JobTracker] Killing job job_200910160957_0003
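
The failing check in the trace above can be illustrated in isolation. As a
rough sketch (my reconstruction from the exception message, not Hadoop's
actual NodeBase source): the JobTracker expects a plain host name, but the
combine-split machinery hands it a rack path like /default-rack, and the
name validation rejects anything containing a slash.

```java
// Simplified stand-in for the check that throws in NodeBase.set
// (assumption: reconstructed from the exception message in the trace,
// not copied from Hadoop's source).
public class RackNameCheck {
    static void setName(String name) {
        // A network *location* like "/default-rack" is a path; a node
        // *name* must not contain '/'. Passing a rack path where a host
        // name is expected produces the IllegalArgumentException above.
        if (name.contains("/")) {
            throw new IllegalArgumentException(
                    "Network location name contains /: " + name);
        }
    }

    public static void main(String[] args) {
        setName("tracker-host-01"); // a plain host name passes
        try {
            setName("/default-rack"); // a rack path is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This matches the HADOOP-5759 symptom, which is presumably why patching
Hadoop (or duplicating the patched code in Hive) makes the job initialize.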

This seems to be related to HADOOP-5759, but it looks like it was patched
coincidentally today by Zheng:
https://issues.apache.org/jira/browse/HADOOP-5759

I'm guessing I need to build Hadoop from the 0.20 branch for this to work?

Thanks
-Matt

On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain <nj...@facebook.com> wrote:

>  Hi Matt,
>
> Sorry for the late reply.
>
> hive> set
> hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
>
> I tried it running on hadoop 20 and it ran fine for me.
>
> Which hive release are you using ?
>
> Also, you got a runtime error – can you see the stderr logs on the tracker
> ?
>
> Thanks,
> -namit
>
>
>
> On 10/1/09 5:01 PM, "Matt Pestritto" <ma...@pestritto.com> wrote:
>
> Namit -
> Any idea on how to resolve ?
> Thanks
>
> On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com>
> wrote:
>
> > There were errors in the hive.log
> >
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.resources" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.core.runtime" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> > (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> > "org.eclipse.text" but it cannot be resolved.
> > 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> > (JobClient.java:configureCommandLineOptions(539)) - Use
> GenericOptionsParser
> > for parsing the arguments. Applications should implement Tool for the
> same.
> > 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> > (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068
> with
> > errors
> > 2009-10-01 10:40:58,622 ERROR ql.Driver
> (SessionState.java:printError(248))
> > - FAILED: Execution Error, return code 2 from
> > org.apache.hadoop.hive.ql.exec.ExecDriver
> >
> >
> >
> > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
> >
> >> What you are doing seems OK ?
> >> Can you get the stack trace from /tmp/<username>/hive.log ?
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Matt Pestritto [mailto:matt@pestritto.com]
> >> Sent: Wednesday, September 30, 2009 6:51 AM
> >> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
> >> Subject: Fwd: Hive-74
> >>
> >> Including hive-user in case someone has any experience with this..
> >> Thanks
> >> -Matt
> >>
> >> ---------- Forwarded message ----------
> >> From: Matt Pestritto <ma...@pestritto.com>
> >> Date: Tue, Sep 29, 2009 at 5:26 PM
> >> Subject: Hive-74
> >> To: hive-dev@hadoop.apache.org
> >>
> >>
> >> Hi-
> >>
> >> I'm having a problem using CombineHiveInputSplit.  I believe this was
> >> patched in http://issues.apache.org/jira/browse/HIVE-74
> >>
> >> I'm currently running hadoop 20.1 using hive trunk.
> >>
> >> hive-default.xml has the following property:
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value></value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> I added the following to hive-site.xml:  ( Notice, the description in
> >> hive-default.xml has CombinedHiveInputFormat which does not work for me
> -
> >> the property value seems to be Combine(-d) )
> >> <property>
> >>  <name>hive.input.format</name>
> >>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
> >>  <description>The default input format, if it is not specified, the
> system
> >> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
> >> 19,
> >> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> >> always overwrite it - if there is a bug in CombinedHiveInputFormat, it
> can
> >> always be manually set to HiveInputFormat. </description>
> >> </property>
> >>
> >> When I launch a job the cli exits immediately:
> >> hive> select count(1) from my_table;
> >> Total MapReduce jobs = 1
> >> Number of reduce tasks determined at compile time: 1
> >> In order to change the average load for a reducer (in bytes):
> >>  set hive.exec.reducers.bytes.per.reducer=<number>
> >> In order to limit the maximum number of reducers:
> >>  set hive.exec.reducers.max=<number>
> >> In order to set a constant number of reducers:
> >>  set mapred.reduce.tasks=<number>
> >> FAILED: Execution Error, return code 2 from
> >> org.apache.hadoop.hive.ql.exec.ExecDriver
> >> hive> exit ;
> >>
> >> If I set the property value to
> >> org.apache.hadoop.hive.ql.io.HiveInputFormat,
> >> the job runs fine.
> >>
> >> Suggestions ? Is there something that I am missing ?
> >>
> >> Thanks
> >> -Matt
> >>
> >
> >
>
>
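The version-dependent default described in the hive-default.xml snippet quoted above can be sketched roughly as follows. This is a minimal illustration, not Hive's actual source; the function name and the version-string parsing are assumptions:

```python
def default_hive_input_format(hadoop_version: str) -> str:
    """Illustrative sketch (not Hive's real code) of the default chosen
    for hive.input.format when the property is left empty:
    HiveInputFormat for Hadoop 0.17-0.19, CombineHiveInputFormat for 0.20+."""
    parts = hadoop_version.split(".")
    # Accept both "0.20.1" and the shorthand "20.1" used in the thread.
    minor = int(parts[1]) if parts[0] == "0" else int(parts[0])
    if minor >= 20:
        return "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat"
    return "org.apache.hadoop.hive.ql.io.HiveInputFormat"
```

Note that the working class name ends in CombineHiveInputFormat, not the CombinedHiveInputFormat spelling used in the property description, which is the mismatch Matt points out.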

Re: Hive-74

Posted by Namit Jain <nj...@facebook.com>.
Hi Matt,

Sorry for the late reply.

hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

I tried it running on hadoop 20 and it ran fine for me.

Which hive release are you using ?

Also, you got a runtime error - can you see the stderr logs on the tracker ?

Thanks,
-namit


On 10/1/09 5:01 PM, "Matt Pestritto" <ma...@pestritto.com> wrote:

Namit -
Any idea on how to resolve ?
Thanks

On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com> wrote:

> There were errors in the hive.log
>
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
> errors
> 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
> - FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
>
>
> On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
>
>> What you are doing seems OK ?
>> Can you get the stack trace from /tmp/<username>/hive.log ?
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Matt Pestritto [mailto:matt@pestritto.com]
>> Sent: Wednesday, September 30, 2009 6:51 AM
>> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
>> Subject: Fwd: Hive-74
>>
>> Including hive-user in case someone has any experience with this..
>> Thanks
>> -Matt
>>
>> ---------- Forwarded message ----------
>> From: Matt Pestritto <ma...@pestritto.com>
>> Date: Tue, Sep 29, 2009 at 5:26 PM
>> Subject: Hive-74
>> To: hive-dev@hadoop.apache.org
>>
>>
>> Hi-
>>
>> I'm having a problem using CombineHiveInputSplit.  I believe this was
>> patched in http://issues.apache.org/jira/browse/HIVE-74
>>
>> I'm currently running hadoop 20.1 using hive trunk.
>>
>> hive-default.xml has the following property:
>> <property>
>>  <name>hive.input.format</name>
>>  <value></value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> I added the following to hive-site.xml:  ( Notice, the description in
>> hive-default.xml has CombinedHiveInputFormat which does not work for me -
>> the property value seems to be Combine(-d) )
>> <property>
>>  <name>hive.input.format</name>
>>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> When I launch a job the cli exits immediately:
>> hive> select count(1) from my_table;
>> Total MapReduce jobs = 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> hive> exit ;
>>
>> If I set the property value to
>> org.apache.hadoop.hive.ql.io.HiveInputFormat,
>> the job runs fine.
>>
>> Suggestions ? Is there something that I am missing ?
>>
>> Thanks
>> -Matt
>>
>
>


Re: Hive-74

Posted by Matt Pestritto <ma...@pestritto.com>.
Namit -
Any idea on how to resolve ?
Thanks

On Thu, Oct 1, 2009 at 10:52 AM, Matt Pestritto <ma...@pestritto.com> wrote:

> There were errors in the hive.log
>
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2009-10-01 10:40:57,143 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-10-01 10:40:58,609 ERROR exec.ExecDriver
> (SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
> errors
> 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
> - FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
>
>
>
> On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:
>
>> What you are doing seems OK ?
>> Can you get the stack trace from /tmp/<username>/hive.log ?
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Matt Pestritto [mailto:matt@pestritto.com]
>> Sent: Wednesday, September 30, 2009 6:51 AM
>> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
>> Subject: Fwd: Hive-74
>>
>> Including hive-user in case someone has any experience with this..
>> Thanks
>> -Matt
>>
>> ---------- Forwarded message ----------
>> From: Matt Pestritto <ma...@pestritto.com>
>> Date: Tue, Sep 29, 2009 at 5:26 PM
>> Subject: Hive-74
>> To: hive-dev@hadoop.apache.org
>>
>>
>> Hi-
>>
>> I'm having a problem using CombineHiveInputSplit.  I believe this was
>> patched in http://issues.apache.org/jira/browse/HIVE-74
>>
>> I'm currently running hadoop 20.1 using hive trunk.
>>
>> hive-default.xml has the following property:
>> <property>
>>  <name>hive.input.format</name>
>>  <value></value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> I added the following to hive-site.xml:  ( Notice, the description in
>> hive-default.xml has CombinedHiveInputFormat which does not work for me -
>> the property value seems to be Combine(-d) )
>> <property>
>>  <name>hive.input.format</name>
>>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>>  <description>The default input format, if it is not specified, the system
>> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and
>> 19,
>> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
>> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
>> always be manually set to HiveInputFormat. </description>
>> </property>
>>
>> When I launch a job the cli exits immediately:
>> hive> select count(1) from my_table;
>> Total MapReduce jobs = 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>> FAILED: Execution Error, return code 2 from
>> org.apache.hadoop.hive.ql.exec.ExecDriver
>> hive> exit ;
>>
>> If I set the property value to
>> org.apache.hadoop.hive.ql.io.HiveInputFormat,
>> the job runs fine.
>>
>> Suggestions ? Is there something that I am missing ?
>>
>> Thanks
>> -Matt
>>
>
>

Re: Hive-74

Posted by Matt Pestritto <ma...@pestritto.com>.
There were errors in the hive.log

2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2009-10-01 10:40:53,631 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2009-10-01 10:40:53,633 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2009-10-01 10:40:53,634 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2009-10-01 10:40:57,143 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
2009-10-01 10:40:58,609 ERROR exec.ExecDriver
(SessionState.java:printError(248)) - Ended Job = job_200909301537_0068 with
errors
2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248))
- FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver


On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain <nj...@facebook.com> wrote:

> What you are doing seems OK ?
> Can you get the stack trace from /tmp/<username>/hive.log ?
>
>
>
>
>
> -----Original Message-----
> From: Matt Pestritto [mailto:matt@pestritto.com]
> Sent: Wednesday, September 30, 2009 6:51 AM
> To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
> Subject: Fwd: Hive-74
>
> Including hive-user in case someone has any experience with this..
> Thanks
> -Matt
>
> ---------- Forwarded message ----------
> From: Matt Pestritto <ma...@pestritto.com>
> Date: Tue, Sep 29, 2009 at 5:26 PM
> Subject: Hive-74
> To: hive-dev@hadoop.apache.org
>
>
> Hi-
>
> I'm having a problem using CombineHiveInputSplit.  I believe this was
> patched in http://issues.apache.org/jira/browse/HIVE-74
>
> I'm currently running hadoop 20.1 using hive trunk.
>
> hive-default.xml has the following property:
> <property>
>  <name>hive.input.format</name>
>  <value></value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
> always be manually set to HiveInputFormat. </description>
> </property>
>
> I added the following to hive-site.xml:  ( Notice, the description in
> hive-default.xml has CombinedHiveInputFormat which does not work for me -
> the property value seems to be Combine(-d) )
> <property>
>  <name>hive.input.format</name>
>  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>  <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
> always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
> always be manually set to HiveInputFormat. </description>
> </property>
>
> When I launch a job the cli exits immediately:
> hive> select count(1) from my_table;
> Total MapReduce jobs = 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>  set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>  set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>  set mapred.reduce.tasks=<number>
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.ExecDriver
> hive> exit ;
>
> If I set the property value to
> org.apache.hadoop.hive.ql.io.HiveInputFormat,
> the job runs fine.
>
> Suggestions ? Is there something that I am missing ?
>
> Thanks
> -Matt
>

RE: Hive-74

Posted by Namit Jain <nj...@facebook.com>.
What you are doing seems OK.
Can you get the stack trace from /tmp/<username>/hive.log?
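[Editor's note: the log path above follows Hive's default log4j configuration (hive.log.dir defaults to /tmp/${user.name}); if that setting has been overridden, the file will be elsewhere. A quick way to pull the most recent entries, which should include the stack trace:]

```shell
# Default Hive session log location; adjust if hive.log.dir is overridden.
LOG_FILE="/tmp/$(whoami)/hive.log"
if [ -f "$LOG_FILE" ]; then
    # The stack trace from the failed job should be near the end of the log.
    tail -n 100 "$LOG_FILE"
else
    echo "No hive.log found at $LOG_FILE"
fi
```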





-----Original Message-----
From: Matt Pestritto [mailto:matt@pestritto.com] 
Sent: Wednesday, September 30, 2009 6:51 AM
To: hive-dev@hadoop.apache.org; hive-user@hadoop.apache.org
Subject: Fwd: Hive-74

Including hive-user in case someone has any experience with this..
Thanks
-Matt

---------- Forwarded message ----------
From: Matt Pestritto <ma...@pestritto.com>
Date: Tue, Sep 29, 2009 at 5:26 PM
Subject: Hive-74
To: hive-dev@hadoop.apache.org


Hi-

I'm having a problem using CombineHiveInputSplit.  I believe this was
patched in http://issues.apache.org/jira/browse/HIVE-74

I'm currently running hadoop 20.1 using hive trunk.

hive-default.xml has the following property:
<property>
  <name>hive.input.format</name>
  <value></value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

I added the following to hive-site.xml:  ( Notice, the description in
hive-default.xml has CombinedHiveInputFormat which does not work for me -
the property value seems to be Combine(-d) )
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>
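[Editor's note: as an alternative to editing hive-site.xml, the property can also be toggled per session from the Hive CLI, which makes it easy to compare the two input formats against the same query; a sketch, using the table name from the query below:]

```sql
-- try the combined input format for this session only
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
select count(1) from my_table;

-- fall back to the plain format if the combined one fails
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
select count(1) from my_table;
```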

When I launch a job the cli exits immediately:
hive> select count(1) from my_table;
Total MapReduce jobs = 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver
hive> exit ;

If I set the property value to org.apache.hadoop.hive.ql.io.HiveInputFormat,
the job runs fine.

Suggestions ? Is there something that I am missing ?

Thanks
-Matt
