Posted to issues@hive.apache.org by "Zoltan Haindrich (JIRA)" <ji...@apache.org> on 2017/10/19 15:21:00 UTC

[jira] [Commented] (HIVE-17722) Execution of selectDistinctStar.q breaks stats in optimize_nullscan.q

    [ https://issues.apache.org/jira/browse/HIVE-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211192#comment-16211192 ] 

Zoltan Haindrich commented on HIVE-17722:
-----------------------------------------

The following happens:

* a select is executed that scans 1 partition of {{srcpart}}
* the statistics for that single partition are cached at the metastore (the caching key doesn't contain the partition it belongs to)
* a select then comes in for the full {{srcpart}} table (4 partitions)
* its statistics are served from the cached 1-partition information
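
The sequence above can be modeled with a toy cache. This is a minimal sketch in plain Python, not Hive's metastore code: the class {{ToyAggrStatsCache}}, its key of (db, table, column), and the per-partition row counts are all assumptions made up for illustration.

```python
# Toy model of the reported behaviour; NOT Hive's actual cache implementation.
# Assumption: the cache key is (db, table, column) and carries no partition info.

class ToyAggrStatsCache:
    def __init__(self):
        self._entries = {}

    def get_or_compute(self, db, table, column, partitions, compute):
        # The requested partitions are deliberately left out of the key.
        key = (db, table, column)
        if key not in self._entries:
            self._entries[key] = compute(partitions)
        return self._entries[key]


# Hypothetical row counts per srcpart partition (illustrative numbers only).
ROWS = {
    "ds=2008-04-08/hr=11": 500,
    "ds=2008-04-08/hr=12": 500,
    "ds=2008-04-09/hr=11": 500,
    "ds=2008-04-09/hr=12": 500,
}


def aggregate_rows(partitions):
    """Aggregate the row count over the given partitions."""
    return sum(ROWS[p] for p in partitions)


cache = ToyAggrStatsCache()
one_partition = ["ds=2008-04-08/hr=11"]
all_partitions = sorted(ROWS)

# First query scans one partition; its aggregate is cached.
first = cache.get_or_compute("default", "srcpart", "key", one_partition, aggregate_rows)
# Second query asks for all four partitions but hits the same key.
second = cache.get_or_compute("default", "srcpart", "key", all_partitions, aggregate_rows)

print(first, second, aggregate_rows(all_partitions))
```

In this model the second lookup returns the value computed for the single partition, while a fresh aggregation over all four partitions gives a different number.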

issue reproduction qtest:
{code}
-- set hive.metastore.aggregate.stats.cache.enabled=false;
set hive.cbo.enable=false;
-- executing this will make everything fine...
-- explain extended
-- select * from (select key from src where false) a left outer join (select key from srcpart limit 0) b on a.key=b.key;

set hive.cbo.enable=true;

-- from: selectDistinctStar.q
set hive.mapred.mode=nonstrict;
set hive.explain.user=false;
SELECT distinct *
FROM src1 x JOIN src y ON (x.key = y.key) 
JOIN srcpart z ON (x.value = z.value and z.ds='2008-04-08' and z.hr=11);
-- from: optimize_nullscan.q
set hive.mapred.mode=nonstrict;
set hive.cbo.enable=false;
explain extended
select * from (select key from src where false) a left outer join (select key from srcpart limit 0) b on a.key=b.key;
{code}

I feel that the cache key should contain a hash of the partition it belongs to, instead of trying to guess some value; I think estimation should be done on the compiler side.
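
One way to picture the suggested key is sketched below. It is only an illustration of the idea in Python, not a patch against Hive: the helper names {{partition_set_hash}} and {{cache_key}}, the choice of SHA-256, and the partition name strings are assumptions.

```python
import hashlib


def partition_set_hash(partitions):
    """Hash a set of partition names (hypothetical helper).

    Names are de-duplicated and sorted first, so the result does not
    depend on the order in which the partitions are listed.
    """
    joined = "\n".join(sorted(set(partitions)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def cache_key(db, table, column, partitions):
    """Build a cache key that also identifies the partition set."""
    return (db, table, column, partition_set_hash(partitions))


one_partition = ["ds=2008-04-08/hr=11"]
all_partitions = [
    "ds=2008-04-08/hr=11",
    "ds=2008-04-08/hr=12",
    "ds=2008-04-09/hr=11",
    "ds=2008-04-09/hr=12",
]

key_one = cache_key("default", "srcpart", "key", one_partition)
key_all = cache_key("default", "srcpart", "key", all_partitions)
key_all_reordered = cache_key("default", "srcpart", "key", list(reversed(all_partitions)))
```

With such a key, a request for one partition and a request for all four map to different entries, while the same partition set listed in a different order maps to the same entry. The trade-off is that only exact partition-set matches can reuse a cached aggregate.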

setting: {{set hive.metastore.aggregate.stats.cache.enabled=false;}} makes the problem go away.

[~prasanth_j]: it seems to me that you worked on caching aggregate stats; what do you think: is this a real problem, or should we just add the missing set?

> Execution of selectDistinctStar.q breaks stats in optimize_nullscan.q
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17722
>                 URL: https://issues.apache.org/jira/browse/HIVE-17722
>             Project: Hive
>          Issue Type: Bug
>          Components: Test
>            Reporter: Zoltan Haindrich
>
> {code}
> M_OPTS+=" -q -T9 -Dmaven.surefire.plugin.version=2.20.1"
> M_OPTS+=" -Pitests -DskipSparkTests"
> M_OPTS+=" -Dtest=TestMiniLlapLocalCliDriver"
> M_OPTS+=" -pl itests/qtest"
> M_OPTS+=" install"
> #fail
> mvn $M_OPTS -Dqfile=selectDistinctStar.q,optimize_nullscan.q
> #pass
> mvn $M_OPTS -Dqfile=optimize_nullscan.q
> mvn $M_OPTS -Dqfile=selectDistinctStar.q
> {code}
> my guess is that something has "happened" to the sacred src table...or that view might cause some trouble?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)