Posted to user@hive.apache.org by Nathanial Thelen <na...@natethelen.com> on 2013/09/03 23:03:35 UTC
Problems with 0.11, count(DISTINCT), and NPE
I am running Hive in EMR, and since upgrading to 0.11 from 0.8.1.8 I have been getting NullPointerExceptions (NPE) for certain queries in our staging environment. The only difference between stage and production is the amount of traffic we get, so the data set is much smaller. We are not using any custom code.
I have greatly simplified the query down to the bare minimum that will cause the error:
SELECT
count(DISTINCT ag.adGroupGuid) as groups,
count(DISTINCT av.adViewGuid) as ads,
count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
This will return the following before any MapReduce jobs start:
FAILED: NullPointerException null
Looking in the Hive log at /mnt/var/log/apps/hive_0110.log, I see this error:
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 length: 94324 file count: 20 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: 142609 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 length: 65519 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889)) - Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: 205096 file count: 20 directory count: 1
2013-09-03 18:09:19,800 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is applicable
2013-09-03 18:09:19,801 INFO org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386)) - FAILED: NullPointerException null
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
The same error also happens if I do an INNER JOIN to adclick, FYI.
I have checked that there are not any null values for any of the columns referenced in the query.
Making almost any change to the query results in it running successfully. Here are some I have tried:
Removed JOIN to adgroup:
SELECT
count(DISTINCT av.adViewGuid) as ads,
count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
adview av
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
WORKS:
561 6
Removed JOIN to adclick:
SELECT
count(DISTINCT ag.adGroupGuid) as groups,
count(DISTINCT av.adViewGuid) as ads
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid;
WORKS:
543 561
Removing DISTINCT from any of the 3 counts:
SELECT
count(DISTINCT ag.adGroupGuid) as groups,
count(DISTINCT av.adViewGuid) as ads,
count(ac.adViewGuid) as uniqueClicks
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
WORKS:
543 561 7
SELECT
count(ag.adGroupGuid) as groups,
count(DISTINCT av.adViewGuid) as ads,
count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
WORKS:
562 561 6
SELECT
count(DISTINCT ag.adGroupGuid) as groups,
count(av.adViewGuid) as ads,
count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM
adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
WORKS:
543 562 6
I am not exactly sure what to do next. Thoughts?
Nate
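One general rewrite that is sometimes used to sidestep evaluating several count(DISTINCT)s in a single aggregation is to compute each distinct count in its own one-row subquery and cross-join the results. This is a sketch against the tables above, not something suggested in this thread, and it is untested against the 0.11 planner bug:

```sql
-- Each one-row subquery computes a single distinct count; the CROSS JOINs
-- then stitch the three scalars into one row.
SELECT g.groups, a.ads, c.uniqueClicks
FROM
  (SELECT count(DISTINCT ag.adGroupGuid) AS groups
   FROM adgroup ag
   INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid) g
CROSS JOIN
  (SELECT count(DISTINCT av.adViewGuid) AS ads
   FROM adgroup ag
   INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid) a
CROSS JOIN
  (SELECT count(DISTINCT ac.adViewGuid) AS uniqueClicks
   FROM adgroup ag
   INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
   LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid) c;
```

Each subquery runs as its own job, so this trades one query for three scans of the join, but no single aggregation has to handle more than one DISTINCT.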
Re: Inner Map key and value separators
Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Ah….
While my BeeHive gently weeps!
Thanks
sanjay
CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
Re: Inner Map key and value separators
Posted by Dean Wampler <de...@gmail.com>.
Unfortunately, I believe there's no way to do this.
Sent from my rotary phone.
Inner Map key and value separators
Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi guys
I have to load data into the following data type in hive
map <string,map<string,string>>
Is there a way to define custom SEPARATORS (while creating the table) for
- Inner map collection item
- Inner map key
The delimiters for 2nd-level maps default to \004 and \005, per this thread:
http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/201009.mbox/%3C4F6B25AFFFCAFE44B6259A412D5F9B101C07A390@ExchMBX104.netflix.com%3E
Thanks
sanjay
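For reference, a table with this type might be declared roughly as follows (a sketch; the table and column names are hypothetical). Hive's ROW FORMAT DELIMITED clause only lets you set the top-level field, collection-item, and map-key delimiters; deeper nesting levels fall back to the fixed \004 and \005 bytes described above:

```sql
-- Hypothetical table holding a two-level map. Only the outer delimiters
-- are configurable here; the inner map still uses \004 (between entries)
-- and \005 (between key and value) by default.
CREATE TABLE nested_map_example (
  id    STRING,
  attrs MAP<STRING, MAP<STRING, STRING>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY '\002'  -- entries of the outer map
  MAP KEYS TERMINATED BY '\003';         -- keys of the outer map
```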
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Yin Huai <hu...@gmail.com>.
Yeah... when hive.auto.convert.join.noconditionaltask=false, we cannot merge multiple map joins into a single map phase. I think that trunk should work when hive.auto.convert.join.noconditionaltask=true.
Thanks,
Yin
On Wed, Sep 4, 2013 at 4:03 PM, Nathanial Thelen <na...@natethelen.com> wrote:
> Doing:
>
> set hive.auto.convert.join.noconditionaltask=false;
>
>
> makes it work (though it runs far more MapReduce jobs than it should).
> When I get some time I will test against the latest trunk.
>
> Thanks,
> Nate
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Nathanial Thelen <na...@natethelen.com>.
Doing:
> set hive.auto.convert.join.noconditionaltask=false;
makes it work (though it runs far more MapReduce jobs than it should). When I get some time I will test against the latest trunk.
Thanks,
Nate
On Sep 3, 2013, at 6:09 PM, Yin Huai <hu...@gmail.com> wrote:
> Based on the log, it may also be related to https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not very optimized way), can you try "set hive.auto.convert.join.noconditionaltask=false;"? If you still get the error, give "set hive.auto.convert.join=false;" a try (it will turn off map-join auto-conversion, so you will use a reduce-side join).
>
> Thanks,
>
> Yin
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Yin Huai <hu...@gmail.com>.
Based on the log, it may also be related to https://issues.apache.org/jira/browse/HIVE-4927. To make it work (in a not very optimized way), can you try "set hive.auto.convert.join.noconditionaltask=false;"? If you still get the error, give "set hive.auto.convert.join=false;" a try (it will turn off map-join auto-conversion, so you will use a reduce-side join).
Thanks,
Yin
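Concretely, the suggested workaround amounts to running something like this before the failing query (a sketch of a Hive session; only the two set commands come from this thread):

```sql
-- First attempt: keep map-join auto-conversion, but stop Hive from
-- merging multiple map joins into one task (a less optimized plan).
SET hive.auto.convert.join.noconditionaltask=false;

-- If the NPE persists, disable map-join auto-conversion entirely and
-- fall back to reduce-side joins:
-- SET hive.auto.convert.join=false;
```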
On Tue, Sep 3, 2013 at 6:03 PM, Ashutosh Chauhan <ha...@apache.org> wrote:
> Not sure about EMR. Your best bet is to ask on EMR forums.
>
> Thanks,
> Ashutosh
>
>
> On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen <na...@natethelen.com> wrote:
>
>> Is there a way to run a patch on EMR?
>>
>> Thanks,
>> Nate
>>
>> On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <ha...@apache.org>
>> wrote:
>>
>> A fix in a very related area has been checked into trunk today:
>> https://issues.apache.org/jira/browse/HIVE-5129. Likely that will fix
>> your issue.
>> Can you try latest trunk?
>>
>> Ashutosh
>>>
>>> I have checked that there are not any null values for any of the columns
>>> referenced in the query.
>>>
>>> Making almost any change to the query results in it running successfully.
>>> Here are some I have tried:
>>>
>>> Removed JOIN to adgroup:
>>>
>>> SELECT
>>> count(DISTINCT av.adViewGuid) as ads,
>>> count(DISTINCT ac.adViewGuid) as uniqueClicks
>>> FROM
>>> adview av
>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>>>
>>> WORKS:
>>> 561 6
>>>
>>> Removed JOIN to adclick:
>>>
>>> SELECT
>>> count(DISTINCT ag.adGroupGuid) as groups,
>>> count(DISTINCT av.adViewGuid) as ads
>>> FROM
>>> adgroup ag
>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid;
>>>
>>> WORKS:
>>> 543 561
>>>
>>> Removing DISTINCT from any of the 3 counts:
>>>
>>> SELECT
>>> count(DISTINCT ag.adGroupGuid) as groups,
>>> count(DISTINCT av.adViewGuid) as ads,
>>> count(ac.adViewGuid) as uniqueClicks
>>> FROM
>>> adgroup ag
>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>>>
>>> WORKS:
>>> 543 561 7
>>>
>>>
>>> SELECT
>>> count(ag.adGroupGuid) as groups,
>>> count(DISTINCT av.adViewGuid) as ads,
>>> count(DISTINCT ac.adViewGuid) as uniqueClicks
>>> FROM
>>> adgroup ag
>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>>>
>>> WORKS:
>>> 562 561 6
>>>
>>>
>>> SELECT
>>> count(DISTINCT ag.adGroupGuid) as groups,
>>> count(av.adViewGuid) as ads,
>>> count(DISTINCT ac.adViewGuid) as uniqueClicks
>>> FROM
>>> adgroup ag
>>> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>>> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>>>
>>> WORKS:
>>> 543 562 6
>>>
>>> I am not exactly sure what to do next. Thoughts?
>>>
>>> Nate
>>>
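One workaround the thread does not test, but which matches the author's observation that queries carrying a single DISTINCT aggregate succeed, is to split the failing query into three single-DISTINCT queries over the same join. This is a sketch, not something suggested in the thread; the table and column names are taken from the original query:

```sql
-- Hypothetical workaround: each query keeps the original join intact but
-- carries only one DISTINCT aggregate, the shape the author reports works.
SELECT count(DISTINCT ag.adGroupGuid) as groups
FROM adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

SELECT count(DISTINCT av.adViewGuid) as ads
FROM adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;

SELECT count(DISTINCT ac.adViewGuid) as uniqueClicks
FROM adgroup ag
INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
```

This trades one job for three, but because each query preserves the full join it should return exactly the three numbers the original query would have.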
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Ashutosh Chauhan <ha...@apache.org>.
Not sure about EMR. Your best bet is to ask on the EMR forums.
Thanks,
Ashutosh
On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen <na...@natethelen.com> wrote:
> Is there a way to run a patch on EMR?
>
> Thanks,
> Nate
>
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Nathanial Thelen <na...@natethelen.com>.
Is there a way to run a patch on EMR?
Thanks,
Nate
On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <ha...@apache.org> wrote:
> A fix in a closely related area was checked into trunk today: https://issues.apache.org/jira/browse/HIVE-5129
> That will likely fix your issue. Can you try the latest trunk?
>
> Ashutosh
>
>
Re: Problems with 0.11, count(DISTINCT), and NPE
Posted by Ashutosh Chauhan <ha...@apache.org>.
A fix in a closely related area was checked into trunk today:
https://issues.apache.org/jira/browse/HIVE-5129
That will likely fix your issue. Can you try the latest trunk?
Ashutosh
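If rebuilding Hive from trunk is not practical on EMR, another standard rewrite (a sketch, not something suggested in the thread) replaces a DISTINCT aggregate with a GROUP BY subquery, so the plan carries no distinct aggregates at all. Shown here for one of the three counts, using the names from the original query:

```sql
-- Hypothetical rewrite: count distinct adViewGuid values without a
-- DISTINCT aggregate by deduplicating in a GROUP BY subquery first.
SELECT count(*) as ads
FROM (
  SELECT av.adViewGuid
  FROM adgroup ag
  INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
  GROUP BY av.adViewGuid
) t;
```

The same transformation can be applied to each of the other two counts; it may also reduce reducer skew, since GROUP BY distributes keys where a single count(DISTINCT) funnels them to one reducer.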