Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/15 09:27:40 UTC

[GitHub] [hudi] cocopc opened a new issue #1736: [SUPPORT]query by hive-on-mr get the wrong result

cocopc opened a new issue #1736:
URL: https://github.com/apache/hudi/issues/1736


   Env:
   Hive 2.1.1
   Hudi: 0.5.2
   Spark: 2.4.5
   
   MOR table with upsert operations: querying with Spark SQL returns the correct result, but querying with Hive-on-MR returns the wrong result.
   My Table Info:
   Table Name: user
   Record Key: distinct_id
   
   SQL: select distinct_id, count(1) from user group by distinct_id order by distinct_id desc limit 10
   Query with Spark: the result is correct.
   +-----------+--------+
   |distinct_id|count(1)|
   +-----------+--------+
   |   51819928|       1|
   |   51819908|       1|
   |   51819791|       1|
   |   51819580|       1|
   |   51819136|       1|
   |   51819001|       1|
   |   51818734|       1|
   |   51818645|       1|
   |   51818417|       1|
   |   51818329|       1|
   +-----------+--------+
   
   Query with Hive: the result is wrong. The count should be 1 for each distinct_id, because distinct_id is the record key and upserts should be merged into a single record.
   +--------------+-----+--+
   | distinct_id  | c1  |
   +--------------+-----+--+
   | 51819928     | 8   |
   | 51819908     | 22  |
   | 51819791     | 7   |
   | 51819580     | 11  |
   | 51819136     | 9   |
   | 51819001     | 24  |
   | 51818734     | 9   |
   | 51818645     | 23  |
   | 51818417     | 22  |
   | 51818329     | 26  |
   +--------------+-----+--+
   
   Query with Hive: select * from user where distinct_id='51819928';
   This query returns only one row, which is correct. So strange!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bhasudha commented on issue #1736: [SUPPORT]query by hive-on-mr get the wrong result

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1736:
URL: https://github.com/apache/hudi/issues/1736#issuecomment-644439683


   It looks like Hudi's input format is not being picked up in Hive. Are you using beeline to query Hive? Could you share your beeline command?
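   One way to check this from the same beeline session is sketched below, assuming the table is named user as in the report. In Hive, "set <key>;" with no value echoes the current setting (on Hive 2.x the default for hive.input.format is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat unless it has been overridden), and "describe formatted" lists the InputFormat registered for the table, which should be a Hudi class if Hive sync created it. This is a diagnostic sketch, not output from the thread.

   ```sql
   -- Print the input format Hive will use to plan splits in this session.
   set hive.input.format;

   -- Show table metadata; look at the "InputFormat:" line.
   -- The name is backtick-quoted in case `user` clashes with a Hive keyword.
   describe formatted `user`;
   ```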
   





[GitHub] [hudi] cocopc closed issue #1736: [SUPPORT]query by hive-on-mr get the wrong result

cocopc closed issue #1736:
URL: https://github.com/apache/hudi/issues/1736


   





[GitHub] [hudi] cocopc commented on issue #1736: [SUPPORT]query by hive-on-mr get the wrong result

cocopc commented on issue #1736:
URL: https://github.com/apache/hudi/issues/1736#issuecomment-644490377


   set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
   Setting this solved the problem.
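   For reference, the workaround above as a complete beeline session snippet. The "set" statement only applies to the current session, so it has to be issued again on each new connection. The query is the one from the original report, with the table name backtick-quoted as a precaution; the expectation of one row per record key comes from the reporter saying the setting solved the problem, not from output posted in the thread.

   ```sql
   -- Workaround reported in this thread: have Hive plan splits with Hudi's
   -- combine input format instead of the session's current hive.input.format.
   -- Scope: current session only.
   set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;

   -- Re-run the original aggregation; with the record key merged correctly,
   -- each distinct_id is expected to have count(1) = 1.
   select distinct_id, count(1)
   from `user`
   group by distinct_id
   order by distinct_id desc
   limit 10;
   ```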

