You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2018/05/03 20:33:00 UTC
[jira] [Comment Edited] (HIVE-18395) Using stats for aggregate query on Acid/MM is off even with "hive.compute.query.using.stats" is true.

    [ https://issues.apache.org/jira/browse/HIVE-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463006#comment-16463006 ] 

Sergey Shelukhin edited comment on HIVE-18395 at 5/3/18 8:32 PM:
-----------------------------------------------------------------

Some small comments:
1) Seems to include some generated config in the patch.
2) TABLE_PARAMS_TXN - can we normalize this and not rely on JSON object? Since this will only be used for stats I assume, and stats only have one parameter (JSON string), we can store it as proper SQL data
[~ekoifman] you might want to look at the new tables schema since it's based on what ACID state is sufficient to map stats to a snapshot.
3) {noformat}
+      if (SessionState.get().getTxnMgr() != null && isTransactional) {
+        request.setTxnid(SessionState.get().getTxnMgr().getCurrentTxnId());
+      }
{noformat}
Is it enough to store transaction ID? Transaction ID will invalidate stats for every other table with every transaction, right?
Also the doc calls for storing entire state information about when the stats were written - otherwise you cannot tell based on read-time snapshot if the stats are valid, for example in case of two inserts, where both can write stats, but both stats are invalid because the inserts don't see each other. 
4) {noformat}
+        /*
         if (isFullAcid && !work.isTargetRewritten()) {
           // Don't bother with aggregation in this case, it will probably be invalid.
           parameters.remove(statType);
           continue;
         }
+        */
{noformat}
Aggregation for full ACID will still be invalid.
5) This doesn't seem to affect basic stats in all the places they are updated. However, basic stats are also used for count(*) type queries (numRows). 
I sent a patch separately that has some changes for basic stats... Might make sense to also modify these places in this patch. Doesn't have to be the same change I made as long as it is sufficient to add txn info to basic stats.

It might help to exclude the generated code for review. I sometimes use the script like this, where $1 is base branch and $2 is file name:
{noformat}
rm -f  ~/patches/$2.nogen.patch
for f in `git diff $1 --name-only | grep -v "gen-" | grep -v "\/gen\/"`
do
  git diff $1 -- $f >> ~/patches/$2.nogen.patch
{noformat}


was (Author: sershe):
Some small comments:
1) Seems to include some generated config in the patch.
2) TABLE_PARAMS_TXN - can we normalize this and not rely on JSON object? Since this will only be used for stats I assume, and stats only have one parameter.
[~ekoifman] you might want to look at the new tables schema since it's based on what ACID state is sufficient to map stats to a snapshot.
3) {noformat}
+      if (SessionState.get().getTxnMgr() != null && isTransactional) {
+        request.setTxnid(SessionState.get().getTxnMgr().getCurrentTxnId());
+      }
{noformat}
Is it enough to store transaction ID? Transaction ID will invalidate stats for every other table with every transaction, right?
Also the doc calls for storing entire state information about when the stats were written - otherwise you cannot tell based on read-time snapshot if the stats are valid, for example in case of two inserts, where both can write stats, but both stats are invalid because the inserts don't see each other. 
4) {noformat}
+        /*
         if (isFullAcid && !work.isTargetRewritten()) {
           // Don't bother with aggregation in this case, it will probably be invalid.
           parameters.remove(statType);
           continue;
         }
+        */
{noformat}
Aggregation for full ACID will still be invalid.
5) This doesn't seem to affect basic stats in all the places they are updated. However, basic stats are also used for count(*) type queries (numRows). 
I sent a patch separately that has some changes for basic stats... Might make sense to also modify these places in this patch. Doesn't have to be the same change I made as long as it is sufficient to add txn info to basic stats.

It might help to exclude the generated code for review. I sometimes use the script like this, where $1 is base branch and $2 is file name:
{noformat}
rm -f  ~/patches/$2.nogen.patch
for f in `git diff $1 --name-only | grep -v "gen-" | grep -v "\/gen\/"`
do
  git diff $1 -- $f >> ~/patches/$2.nogen.patch
{noformat}

> Using stats for aggregate query on Acid/MM is off even with "hive.compute.query.using.stats" is true.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18395
>                 URL: https://issues.apache.org/jira/browse/HIVE-18395
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
>         Attachments: HIVE-18395.01.preview
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)