Posted to user@hive.apache.org by "Christopher, Pat" <pa...@hp.com> on 2011/01/27 02:47:03 UTC

Hive Error on medium sized dataset

Hi,
I'm attempting to load a small-to-medium-sized log file, ~250MB, and produce some basic reports from it (counts, etc.).  Nothing fancy.  However, whenever I try to read the entire dataset, ~330k rows, I get the following error:

  FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

This result gets produced with basic queries like:

  SELECT count(1) FROM medium_table;

However, if I do the following:

  SELECT count(1) FROM ( SELECT col1 FROM medium_table LIMIT 70000 ) tbl;

It works okay until I get to around 70,800 rows, then I get the first error message again.  I'm running HDFS in single-node, pseudo-distributed mode on a virtual machine with 1.5GB of memory and 20GB of disk, and I am using a custom SerDe.  I don't think it's the SerDe, but I'm open to suggestions for how I can check whether it is causing the problem.  I can't see anything in the data that would be causing it, though.

Anyone have any ideas of what might be causing this or something I can check?

Thanks,
Pat

Re: Hive Error on medium sized dataset

Posted by hadoop n00b <ne...@gmail.com>.
Return code 2 essentially means a Hadoop error. Congrats on locating and fixing your issue.

However, can somebody still throw some light on this particular error code?

RE: Hive Error on medium sized dataset

Posted by "Christopher, Pat" <pa...@hp.com>.
It was the SerDe.  There was a null pointer error.  It was getting reported to a Hadoop logfile and not anywhere in Hive.  I found the Hadoop log and fixed the problem.
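
For anyone hitting the same class of bug: a key/value token without a separator typically yields a null that gets dereferenced later. The sketch below (plain Java with made-up names, not the actual SerDe from this thread) shows the kind of parse-time guard that avoids the NPE.

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical sketch (not the SerDe discussed above): parse
  // "k1=v1 k2=v2 ..." tokens defensively so a malformed token is
  // skipped instead of causing a NullPointerException downstream.
  public class KvParser {
      public static Map<String, String> parse(String record) {
          Map<String, String> pairs = new HashMap<String, String>();
          if (record == null) {
              return pairs; // null row: return an empty map, not an NPE
          }
          for (String token : record.trim().split("\\s+")) {
              int sep = token.indexOf('=');
              if (sep <= 0) {
                  continue; // missing separator or empty key: skip it
              }
              pairs.put(token.substring(0, sep), token.substring(sep + 1));
          }
          return pairs;
      }

      public static void main(String[] args) {
          // The malformed middle token is skipped, not fatal.
          System.out.println(parse("user=pat broken_token bytes=250"));
      }
  }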

Thanks for the help!

Pat

RE: Hive Error on medium sized dataset

Posted by "Christopher, Pat" <pa...@hp.com>.
I removed the part of the SerDe that handled the arbitrary key/value pairs and I was able to process my entire data set.  Sadly the part I removed has all the interesting data.

I'll play more with the heap settings and see if that lets me process the key/value pairs.  Is the below the correct way to set the child heap value?

Thanks,
Pat

RE: Hive Error on medium sized dataset

Posted by "Christopher, Pat" <pa...@hp.com>.
It will be tricky to clean up the data format, as I'm operating on somewhat arbitrary key-value pairs in part of the record.  I will try to create something similar, though; it might take a bit.  Thanks.

I've tried resetting the heap size, I think.  I added the following block to my mapred-site.xml:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xm512M</value>
  </property>

Is that how I'm supposed to do that?

Thanks,
Pat
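
For reference, the flag in the block above looks like a typo: the JVM option for the maximum heap is -Xmx, and a value spelled -Xm512M would be rejected when the child JVM launches. Assuming that is the intent, the corrected property would read:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- -Xmx sets the max heap for each child task JVM -->
    <value>-Xmx512M</value>
  </property>

Note that with only 1.5GB on the VM, there is not much headroom to raise it further.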

Re: Hive Error on medium sized dataset

Posted by hadoop n00b <ne...@gmail.com>.
We typically get this error while running complex queries on our 4-node setup when the child JVM runs out of heap size. I'd be interested in what the experts have to say about this error.

Re: Hive Error on medium sized dataset

Posted by Ajo Fod <aj...@gmail.com>.
Any chance you can convert the data to a tab-separated text file and try the same query?

It may not be the SerDe, but it would be good to isolate it as a potential source of the problem.
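
For example, something like this would bypass the custom SerDe entirely (hypothetical table and column names; adjust to the real schema):

  CREATE TABLE medium_table_tsv (col1 STRING, col2 STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

  LOAD DATA LOCAL INPATH '/tmp/medium.tsv' INTO TABLE medium_table_tsv;

  SELECT count(1) FROM medium_table_tsv;

If the count runs over the full ~330k rows this way, the SerDe is the prime suspect; if it still fails, it points at the cluster or heap settings instead.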

-Ajo.
