Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/04 19:19:02 UTC

[GitHub] [hudi] parisni opened a new issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

parisni opened a new issue #4506:
URL: https://github.com/apache/hudi/issues/4506


   When using an embedded Derby database (the Spark default), Hive sync fails silently.
   
   The code below should create a Hive table (hoodie.datasource.hive_sync.enable=true). However, this does not happen.
   
   If I use a real Hive metastore (e.g. with a Postgres backend), then it works as expected.
   
   This happens with at least 0.10.0 and 0.9.0.
   
   ```python
   sc.setLogLevel("WARN")
   dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
   inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(
       dataGen.generateInserts(10)
   )
   from pyspark.sql.functions import expr
   
   df = spark.read.json(spark.sparkContext.parallelize(inserts, 10)).withColumn(
       "part", expr("'foo'")
   )
   
   tableName = "test_hudi_pyspark_local"
   basePath = "/tmp/{}".format(tableName)
   
   hudi_options = {
       "hoodie.table.name": tableName,
       "hoodie.datasource.write.recordkey.field": "uuid",
       "hoodie.datasource.write.partitionpath.field": "part",
       "hoodie.datasource.write.table.name": tableName,
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.precombine.field": "ts",
       "hoodie.upsert.shuffle.parallelism": 2,
       "hoodie.insert.shuffle.parallelism": 2,
       "hoodie.datasource.hive_sync.database": "default",
       "hoodie.datasource.hive_sync.table": tableName,
       "hoodie.datasource.hive_sync.mode": "hms",
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.partition_fields": "part",
       "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
       "index.global.enabled": "true",
       "hoodie.index.type": "GLOBAL_BLOOM",
   }
   (df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath))
   ```
   
   
   After the write, the table list is empty:
   
   ```
   >>> spark.sql("show tables").show(1000,False)
   +--------+---------+-----------+
   |database|tableName|isTemporary|
   +--------+---------+-----------+
   +--------+---------+-----------+
   ```
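
The same check can be scripted. This is a minimal sketch, assuming an active `spark` session as in the snippet above; `table_is_registered` is a hypothetical helper name, not part of Hudi or Spark, and the case-insensitive comparison assumes the metastore stores table names lowercased.

```python
def table_is_registered(spark, database, table):
    """Return True if `table` is listed in `database` by the session catalog.

    `spark` is expected to be an active SparkSession. Names are compared
    case-insensitively (assumption: the metastore lowercases table names).
    """
    wanted = table.lower()
    return any(t.name.lower() == wanted for t in spark.catalog.listTables(database))
```

With the reproduction above, `table_is_registered(spark, "default", tableName)` would be expected to return `False` while the sync is not taking effect.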
   
   
   Here is the Spark session config:
   
   ```
   >>> spark.sql("set").filter("key rlike 'metastore|jdo'").show(1000,False)
   +------------------------------------------------------------------+---------------------------------------+
   |key                                                               |value                                  |
   +------------------------------------------------------------------+---------------------------------------+
   |spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes|false                                  |
   |spark.hadoop.hive.metastore.local                                 |false                                  |
   |spark.hadoop.hive.metastore.schema.verification                   |false                                  |
   |spark.hadoop.hive.metastore.schema.verification.record.version    |false                                  |
   |spark.hadoop.javax.jdo.option.ConnectionDriverName                |org.apache.derby.jdbc.EmbeddedDriver   |
   |spark.hadoop.javax.jdo.option.ConnectionURL                       |jdbc:derby:memory:myInMemDB;create=true|
   +------------------------------------------------------------------+---------------------------------------+
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1039633556


   @parisni: can you respond to Sagar above?
   





[GitHub] [hudi] nsivabalan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1012269577


   CC @codope 
   Not sure if this will help, but did you set the catalog implementation to hive when you launched Spark?
   ```
   spark.sql.catalogImplementation=hive
   ```
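
For a PySpark session, one way to apply that setting could look like the sketch below. `hive_catalog_conf` and `build_session` are hypothetical helper names introduced for illustration. Note that `spark.sql.catalogImplementation` is a static setting, so it only takes effect if it is supplied before the session is first created.

```python
def hive_catalog_conf(extra=None):
    """Build Spark conf entries that select the Hive catalog."""
    conf = {"spark.sql.catalogImplementation": "hive"}
    conf.update(extra or {})
    return conf


def build_session(app_name, conf):
    """Create a SparkSession from a conf dict (requires pyspark)."""
    # Imported lazily so that hive_catalog_conf has no pyspark dependency.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would be along the lines of `spark = build_session("hudi-hive-sync", hive_catalog_conf())`, run in a fresh process so that no earlier session is reused.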





[GitHub] [hudi] xushiyan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1013853416


   @parisni can you try removing this setting?
   
   |spark.hadoop.javax.jdo.option.ConnectionURL                       |jdbc:derby:memory:myInMemDB;create=true|
   
   By default, Hive creates a local metastore under `metastore_db/`.
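
If the session settings are assembled in code, dropping that one key could be sketched as follows. How the setting was originally supplied is not stated in the issue, so the dict-based approach and the helper name `without_connection_url` are assumptions for illustration only.

```python
CONNECTION_URL_KEY = "spark.hadoop.javax.jdo.option.ConnectionURL"


def without_connection_url(conf):
    """Return a copy of `conf` without the javax.jdo ConnectionURL override.

    The input dict is left untouched; all other settings are kept as-is.
    """
    return {key: value for key, value in conf.items() if key != CONNECTION_URL_KEY}
```

The remaining entries can then be passed to the session builder as usual, leaving the metastore location to Hive's default.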





[GitHub] [hudi] codope commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
codope commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1017413379


   > can you try remove this setting?
   > 
   > |spark.hadoop.javax.jdo.option.ConnectionURL |jdbc:derby:memory:myInMemDB;create=true|
   > 
   > by default, hive creates local metastore under `metastore_db/`
   
   @parisni Did you get a chance to try out this suggestion? 
   I would also check the `spark.sql.warehouse.dir` Spark config, and set `hive.metastore.warehouse.dir` in hive-site.xml to the same value.
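
A quick way to compare the two values might look like this sketch. `warehouse_dirs_match` is a hypothetical helper; it only normalizes surrounding whitespace and trailing slashes, and does not resolve URI schemes such as `file:` or `hdfs:`.

```python
def warehouse_dirs_match(spark_warehouse_dir, hive_warehouse_dir):
    """Compare two warehouse locations, ignoring trailing slashes.

    Returns False if either value is missing.
    """
    if spark_warehouse_dir is None or hive_warehouse_dir is None:
        return False

    def normalize(path):
        return path.strip().rstrip("/")

    return normalize(spark_warehouse_dir) == normalize(hive_warehouse_dir)
```

The Spark side can be read with `spark.conf.get("spark.sql.warehouse.dir")`; the Hive side would be the value copied from hive-site.xml.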





[GitHub] [hudi] parisni commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1012311943


   > not sure if this will help. but did you set catalog impl to hive when
   > you launched spark ?
   > ```
   > spark.sql.catalogImplementation=hive
   > ```
   
   yeah, no effect sadly. still populating/using the spark json 
   





[GitHub] [hudi] parisni commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1022583779


   @codope I tried them without success on my side, using the reproducible example I crafted in the OP.





[GitHub] [hudi] nsivabalan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1047399870


   Hey @parisni: can you share the logs? If you got it resolved, feel free to close the issue. Otherwise, it would be nice if you could share more logs for Sagar to triage.





[GitHub] [hudi] codope commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
codope commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1030870343


   @parisni What do you mean by "fails silently"? Did you see any hive sync log in the console? Can you share the log statements after `df.write`?





[GitHub] [hudi] nsivabalan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1017019367


   CC @codope 





[GitHub] [hudi] nsivabalan commented on issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4506:
URL: https://github.com/apache/hudi/issues/4506#issuecomment-1067591056


   Closing this due to no activity. Feel free to reach out if you are looking for further assistance!





[GitHub] [hudi] nsivabalan closed issue #4506: [SUPPORT] Hive Sync fails silently with embedded derby hive metastore

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4506:
URL: https://github.com/apache/hudi/issues/4506


   

