Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/24 10:28:49 UTC

[GitHub] [hudi] calleo opened a new issue #3713: [SUPPORT] Cannot read from Hudi table if same job created it

calleo opened a new issue #3713:
URL: https://github.com/apache/hudi/issues/3713


   **Describe the problem you faced**
   
   I am trying to get Hudi + Hive sync to run in a docker container. The end-goal is to have a docker image which can be used to run tests locally on a dev machine.
   
   To start with, I created a simple job that writes to a Hudi table and then reads it back. When I run it, the read fails with the error: 
   
   `ERROR XSDB6: Another instance of Derby may have already booted the database /opt/bitnami/spark/metastore_db`
   
   The only way I can get it to work is to write and read in separate Spark jobs, or to start Derby in a separate process and connect to it over JDBC.
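
The second workaround above (Derby running in its own process and reached over JDBC) comes down to Spark configuration. The Python sketch below only builds the `--conf` entries; it is not the reporter's actual setup. It assumes:

* a Derby Network Server is listening on its default port 1527 in a separate container;
* the Derby client JAR is on Spark's classpath.

The host name `derby` and the helper `derby_network_metastore_conf` are hypothetical.

```python
# Hypothetical helper: builds Spark --conf entries that point the Hive
# metastore at a Derby Network Server instead of an embedded Derby database.
# Assumes the server runs in a separate process/container and that the Derby
# client JAR is on Spark's classpath.


def derby_network_metastore_conf(host: str = "derby", port: int = 1527,
                                 database: str = "metastore_db") -> dict:
    # The "spark.hadoop." prefix copies each key into the Hadoop
    # configuration that Spark's Hive support reads.
    return {
        "spark.sql.catalogImplementation": "hive",
        # Client/server URL. Hive's embedded default is
        # jdbc:derby:;databaseName=metastore_db;create=true
        "spark.hadoop.javax.jdo.option.ConnectionURL":
            f"jdbc:derby://{host}:{port}/{database};create=true",
        # Network client driver rather than org.apache.derby.jdbc.EmbeddedDriver
        "spark.hadoop.javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.ClientDriver",
    }


if __name__ == "__main__":
    # Print the entries as spark-submit arguments
    for key, value in derby_network_metastore_conf().items():
        print(f"--conf {key}={value}")
```

Because only the server process opens the database files, each JVM that connects through the client driver avoids booting its own copy of `metastore_db`.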
   
   It looks like the Hive sync functionality locks the database and does not release it until the process ends (stopping the Spark session does not help).
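
For context, a write-then-read job like the one described here usually passes options such as the following to the Hudi datasource. This is a hedged Python sketch that uses Hudi 0.9-era `hoodie.datasource.*` option keys. The table name, database and field names (`id`, `ts`, `dt`) and the helper name are hypothetical; the reporter's actual job is in the linked repository.

Setting `hoodie.datasource.hive_sync.use_jdbc` to `false` is an assumption here. It makes Hive sync use the Hive metastore client instead of a HiveServer2 JDBC URL. Without a remote metastore configured, that client opens the local metastore database directly.

```python
# Hypothetical helper: Hudi datasource options for a write with Hive sync
# enabled, using Hudi 0.9-era option keys. Field names are placeholders.


def hudi_hive_sync_options(table: str, database: str = "default") -> dict:
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.precombine.field": "ts",
        "hoodie.datasource.write.partitionpath.field": "dt",
        # Register/update the table in the Hive metastore after each commit
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": "dt",
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        # Assumed: use the metastore client rather than a HiveServer2 JDBC URL
        "hoodie.datasource.hive_sync.use_jdbc": "false",
    }


# Assumed PySpark usage (not executed here):
#   opts = hudi_hive_sync_options("trips")
#   df.write.format("hudi").options(**opts).mode("overwrite").save(path)
#   spark.read.format("hudi").load(path).show()

if __name__ == "__main__":
    for key, value in hudi_hive_sync_options("trips").items():
        print(f"{key} = {value}")
```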
   
   Is there any way to get around this? 
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   I created a Docker image and a script that demonstrate the issue.
   
   1. Clone https://github.com/calleo/hudi-spark-standalone
   2. Run `make run_fail_1`
   3. Run `make run_fail_2`
   
   **Expected behavior**
   
   It should be possible to read from a Hudi table that was previously created by the same Spark job.
   
   **Environment Description**
   
   * Hudi version : 0.9
   
   * Spark version : 3.1.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : Local
   
   * Running on Docker? (yes/no) : yes
   
   **Additional context**
   
   
   
   **Stacktrace**
   
   ```
   Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /opt/bitnami/spark/metastore_db.
   	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
   	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
   	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
   	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
   	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
   	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
   	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.store.access.RAMAccessManager$5.run(Unknown Source)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at org.apache.derby.impl.store.access.RAMAccessManager.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
   	at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.db.BasicDatabase$5.run(Unknown Source)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at org.apache.derby.impl.db.BasicDatabase.bootServiceModule(Unknown Source)
   	at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
   	at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
   	at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
   	at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
   	at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
   	at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
   	at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at org.apache.derby.impl.jdbc.EmbedConnection.startPersistentService(Unknown Source)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3713:
URL: https://github.com/apache/hudi/issues/3713


   



[GitHub] [hudi] codope commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3713:
URL: https://github.com/apache/hudi/issues/3713#issuecomment-929284948


   @calleo The Hive sync tool does not hold a lock. Internally, `HoodieHiveClient` does open a connection, but it is closed as soon as the sync completes. I think that in the Docker setup, the metastore's Derby database is running in embedded mode, which does not allow reading and writing within the same process. Can you try setting up MariaDB for the metastore in a separate container?
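
Here is a hedged Python sketch of that suggestion: it points the metastore at MariaDB in a separate container instead of embedded Derby. It assumes:

* a MariaDB container reachable as `mariadb`;
* a `metastore` database with `hive`/`hive` credentials;
* MySQL Connector/J on Spark's classpath (MariaDB speaks the MySQL wire protocol).

The host, database name, credentials and the helper name are all hypothetical.

The two schema-related settings are shortcuts for a throwaway local test environment. For anything longer-lived, the metastore schema is normally initialized with Hive's `schematool`.

```python
# Hypothetical helper: Spark --conf entries for a Hive metastore backed by
# MariaDB in a separate container. Assumes MySQL Connector/J on the classpath.


def mariadb_metastore_conf(host: str = "mariadb", port: int = 3306,
                           database: str = "metastore",
                           user: str = "hive", password: str = "hive") -> dict:
    prefix = "spark.hadoop."  # forwarded into the Hadoop/Hive configuration
    hive_conf = {
        "javax.jdo.option.ConnectionURL":
            f"jdbc:mysql://{host}:{port}/{database}",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
        # Local-testing conveniences: let DataNucleus create the schema and
        # skip the metastore schema version check
        "datanucleus.schema.autoCreateAll": "true",
        "hive.metastore.schema.verification": "false",
    }
    conf = {prefix + key: value for key, value in hive_conf.items()}
    conf["spark.sql.catalogImplementation"] = "hive"
    return conf


if __name__ == "__main__":
    for key, value in mariadb_metastore_conf().items():
        print(f"--conf {key}={value}")
```

With the metastore in its own database server, the Spark session and the Hive sync step act as ordinary clients. Neither one boots the database files itself.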


[GitHub] [hudi] nsivabalan commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3713:
URL: https://github.com/apache/hudi/issues/3713#issuecomment-997299018


   @calleo: did you get a chance to try out the Docker setup that Sagar pointed out earlier? It has all the components (Spark, Hive, Presto, etc.) needed for testing. Let us know if you have any updates, or feel free to close this out if the Docker setup worked for you. 



[GitHub] [hudi] codope commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3713:
URL: https://github.com/apache/hudi/issues/3713#issuecomment-930112068


   > The end-goal is to have a docker image which can be used to run tests locally on a dev machine.
   
   For this, I would suggest giving our ready-made [docker setup](https://hudi.apache.org/docs/docker_demo/) a try.



[GitHub] [hudi] nsivabalan commented on issue #3713: [SUPPORT] Cannot read from Hudi table created by same Spark job

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3713:
URL: https://github.com/apache/hudi/issues/3713#issuecomment-1008550188


   Closing this due to inactivity. Feel free to re-open if need be; we would be happy to help.

