Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/09 09:54:10 UTC

[GitHub] [hudi] bvaradar edited a comment on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

bvaradar edited a comment on issue #2338:
URL: https://github.com/apache/hudi/issues/2338#issuecomment-757125096


   @so-lazy:
   
   When you query through the Spark datasource (not just a single file), are you able to see unique records?
   
   val df = spark.read.format("hudi").load("hdfs://hadoop01:9000/hudi/cars/carsdata/inf_car_bin/*")
   ....
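
A minimal sketch of that uniqueness check, assuming `df` was loaded as above. It groups rows by Hudi's `_hoodie_record_key` metadata column and keeps only keys that occur more than once. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object HudiDuplicateCheck {
  // Hypothetical helper: returns each record key that appears in more than one row,
  // together with the number of rows carrying that key (column "count")
  def duplicateKeys(df: DataFrame): DataFrame =
    df.groupBy(col("_hoodie_record_key"))
      .count()
      .filter(col("count") > 1)
}
```

Calling `HudiDuplicateCheck.duplicateKeys(df).show(20, false)` and getting no rows back would mean every record key returned by the datasource query is unique.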
   
   Also, are you passing the config (spark.sql.hive.convertMetastoreParquet=false) when you launch Spark? https://hudi.apache.org/docs/querying_data.html#spark-sql
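
If the session is created programmatically rather than through launch flags, the same setting can be supplied when building the `SparkSession`. This is a sketch under assumptions: the object and application names are hypothetical, and Hive support (`enableHiveSupport()`) is assumed to be added the same way as in the existing setup.

```scala
import org.apache.spark.sql.SparkSession

object HudiSparkSession {
  // Build (or reuse) a session with Hive's built-in Parquet conversion turned off,
  // so Hive-registered Hudi tables are not read with Spark's native Parquet reader
  def create(): SparkSession =
    SparkSession.builder()
      .appName("hudi-query") // hypothetical application name
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .getOrCreate()
}
```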
   
   Also, I see you have spaces around the "=" sign (set spark.sql.hive.convertMetastoreParquet = false;). Try removing them. Please also enable INFO logging, run the select/group-by query, and attach the logs if the problem persists.
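
In a running session (for example the `spark` object that `spark-shell` provides), both suggestions can also be applied from Scala before re-running the query. A sketch only: the object and method names are hypothetical, and `setLogLevel` adjusts the driver-side log level for the current application.

```scala
import org.apache.spark.sql.SparkSession

object HudiDebugSettings {
  // Hypothetical helper: prepare an existing session for debugging the duplicate issue
  def enable(spark: SparkSession): Unit = {
    // Print INFO-level messages (including Hudi's) in the driver log
    spark.sparkContext.setLogLevel("INFO")
    // Same setting as the SQL statement, written without spaces around "="
    spark.sql("set spark.sql.hive.convertMetastoreParquet=false")
  }
}
```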


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org