Posted to user@hive.apache.org by Nathanial Thelen <na...@natethelen.com> on 2013/09/03 23:03:35 UTC

Problems with 0.11, count(DISTINCT), and NPE

I am running Hive in EMR, and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment.  The only difference between staging and production is the amount of traffic we get, so the data set is much smaller.  We are not using any custom code.

I have greatly simplified the query down to the bare minimum that will cause the error:

SELECT
    count(DISTINCT ag.adGroupGuid) as groups,
    count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid

This will return the following before any Map Reduce jobs start:

FAILED: NullPointerException null

Looking in the Hive log at /mnt/var/log/apps/hive_0110.log, I see this error:

2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1
2013-09-03 18:09:19,800 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
	at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

The same error also happens if I do an INNER JOIN to adclick, FYI.

I have checked that there are not any null values for any of the columns referenced in the query. 

Making almost any change to the query results in it running successfully.  Here are some changes I have tried:

Removed JOIN to adgroup:

SELECT
    count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
    adview av
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

WORKS:
561	6

Removed JOIN to adclick:

SELECT
    count(DISTINCT ag.adGroupGuid) as groups,
    count(DISTINCT av.adViewGuid) as ads
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid;

WORKS:
543	561

Removing DISTINCT from any of the 3 counts:

SELECT
    count(DISTINCT ag.adGroupGuid) as groups,
    count(DISTINCT av.adViewGuid) as ads,
    count(ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

WORKS:
543	561	7


SELECT
    count(ag.adGroupGuid) as groups,
    count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

WORKS:
562	561	6


SELECT
    count(DISTINCT ag.adGroupGuid) as groups,
    count(av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

WORKS:
543	562	6 

I am not exactly sure what to do next.  Thoughts?
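
A possible workaround (untested sketch) would be to compute each distinct
count in its own subquery and CROSS JOIN the single-row results, so that no
single query block carries more than one DISTINCT:

-- Each subquery returns one row; CROSS JOIN just glues the three rows together.
SELECT g.groups, a.ads, c.uniqueClicks
FROM
    (SELECT count(DISTINCT ag.adGroupGuid) as groups
     FROM adgroup ag
     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid) g
    CROSS JOIN
    (SELECT count(DISTINCT av.adViewGuid) as ads
     FROM adgroup ag
     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid) a
    CROSS JOIN
    (SELECT count(DISTINCT ac.adViewGuid) as uniqueClicks
     FROM adgroup ag
     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid) c;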

Nate

Re: Inner Map key and value separators

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Ah….
While my BeeHive gently weeps!

Thanks

sanjay



Re: Inner Map key and value separators

Posted by Dean Wampler <de...@gmail.com>.
Unfortunately, I believe there's no way to do this. 

Sent from my rotary phone. 



Inner Map key and value separators

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi guys

I have to load data into the following data type in Hive:

map <string,map<string,string>>

Is there a way to define custom SEPARATORS (while creating the table) for
- Inner map collection item
- Inner map key

Delimiters for 2nd-level maps are \004 and \005, per this:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201009.mbox/%3C4F6B25AFFFCAFE44B6259A412D5F9B101C07A390@ExchMBX104.netflix.com%3E
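
For concreteness, here is a sketch of the kind of DDL I mean (the table and
column names are made up); as far as I can tell, ROW FORMAT DELIMITED only
exposes the top-level delimiters:

-- Sketch only: table and column names are hypothetical.
-- COLLECTION ITEMS / MAP KEYS cover the outer map; the inner map
-- appears to fall back to the defaults \004 (entries) and \005 (key/value).
CREATE TABLE nested_map_example (
    id    string,
    attrs map<string,map<string,string>>
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE;

So, if I am reading the defaults right, a single attrs cell on disk would be
laid out like k1\003ik1\005v1\004ik2\005v2, with \002 between outer entries.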

Thanks

sanjay


Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Yin Huai <hu...@gmail.com>.
Yeah... when hive.auto.convert.join.noconditionaltask=false, we cannot
merge multiple map-joins into a single Map phase. I think that trunk
should work when hive.auto.convert.join.noconditionaltask=true.
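
So once you are on a build with the fix, it should be fine to turn the
setting back on:

set hive.auto.convert.join.noconditionaltask=true;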

Thanks,

Yin



Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Nathanial Thelen <na...@natethelen.com>.
Doing:

> set hive.auto.convert.join.noconditionaltask=false;

makes it work (though it runs far more MapReduce jobs than it should).  When I get some time I will test against the latest trunk.

Thanks,
Nate




Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Yin Huai <hu...@gmail.com>.
Based on the log, it may also be related to
https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not
very optimized way), can you try "set hive.auto.convert.join.noconditionaltask=false;"?
If you still get the error, give "set hive.auto.convert.join=false;" a try
(it turns off map-join auto conversion, so you will fall back to reduce-side joins).
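
For example, the whole session would look roughly like this (the query is
the one from the original report; the second setting is only needed if the
first one is not enough):

set hive.auto.convert.join.noconditionaltask=false;
-- If the NPE still shows up, disable map-join auto conversion entirely:
-- set hive.auto.convert.join=false;

SELECT
    count(DISTINCT ag.adGroupGuid) as groups,
    count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;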

Thanks,

Yin



Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Ashutosh Chauhan <ha...@apache.org>.
Not sure about EMR. Your best bet is to ask on EMR forums.

Thanks,
Ashutosh



Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Nathanial Thelen <na...@natethelen.com>.
Is there a way to run a patch on EMR?

Thanks,
Nate



Re: Problems with 0.11, count(DISTINCT), and NPE

Posted by Ashutosh Chauhan <ha...@apache.org>.
A fix in a very related area was checked into trunk today:
https://issues.apache.org/jira/browse/HIVE-5129. That will likely fix your issue.
Can you try the latest trunk?

Ashutosh

