Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/29 09:13:27 UTC

[GitHub] [hudi] parisni opened a new issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

parisni opened a new issue #3731:
URL: https://github.com/apache/hudi/issues/3731


   hudi 0.9.0, spark3.1
   
   To experiment with OCC, I set up these local tools:
   - local hive metastore
   - pyspark script
   - run concurrently with xargs
   
   Sometimes it works as expected (mostly with 2 concurrent processes). But with 4 processes I randomly get one of these stack traces:
   
   Type 1 error:
   
   ```
    : org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object LockResponse(lockid:255, state:WAITING)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:82)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
   
   ```
   
   Type 2 error:
   
   ```
    : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20210921153357
    at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:62)
    at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46)
    Caused by: java.lang.IllegalArgumentException
   ```
   
   Type 3 error:
   
   ```
    /tmp/test_hudi_pyspark_local/.hoodie/20210921151138.commit.requested
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:544)
    at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createFileInMetaPath(HoodieActiveTimeline.java:505)
    Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/test_hudi_pyspark_local/.hoodie/20210921151138.commit.requested
   ``` 
   
   
   
   Steps to reproduce:
   
   Python script:
   
   ```python
   ## The idea is to generate a random partition per run.
   ## Runs start after a short random delay, added to understand why I got the error on the same commit timestamp,
   ## but this is not actually needed.
   ## The final count should be COUNT=(NB+1) * 10, where NB is the number of concurrent Spark jobs.
   
   from pyspark.sql import SparkSession
   
   import pyspark
   from numpy import random
   from time import sleep
   
   sleeptime = random.uniform(2, 5)
   print("sleeping for:", sleeptime, "seconds")
   sleep(sleeptime)
   conf = pyspark.SparkConf()
   spark_conf = [
       (
           "spark.jars.packages",
           "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0,org.apache.spark:spark-avro_2.12:3.1.2",
       ),
       ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
       ("spark.hadoop.hive.metastore.uris", "thrift://localhost:9083"),
       ("spark.hadoop.javax.jdo.option.ConnectionUserName", "hive"),
       ("spark.hadoop.javax.jdo.option.ConnectionPassword", "hive"),
       ("spark.hadoop.hive.server2.thrift.url", "jdbc:hive2://localhost:10000"),
   ]
   conf.setAll(spark_conf)
   spark = (
       SparkSession.builder.appName("test-hudi-hive-sync")
       .config(conf=conf)
       .enableHiveSupport()
       .getOrCreate()
   )
   sc = spark.sparkContext
   
   # Create a table
   sc.setLogLevel("ERROR")
   dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
   inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(
       dataGen.generateInserts(10)
   )
   from pyspark.sql.functions import expr
   
   df = (
       spark.read.json(spark.sparkContext.parallelize(inserts, 10))
       # One partition per run!
       .withColumn("part", expr(f"'foo{sleeptime}'"))
       .withColumn("id", expr("row_number() over(partition by 1 order by 1)"))
   )
   
   
   databaseName = "default"
   tableName = "test_hudi_pyspark_local"
   basePath = f"/tmp/{tableName}"
   
   hudi_options = {
       "hoodie.table.name": tableName,
       "hoodie.datasource.write.recordkey.field": "uuid",
       "hoodie.datasource.write.partitionpath.field": "part",
       "hoodie.datasource.write.table.name": tableName,
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.precombine.field": "ts",
       "hoodie.upsert.shuffle.parallelism": 2,
       "hoodie.insert.shuffle.parallelism": 2,
       # For hive sync metastore
       "hoodie.datasource.hive_sync.database": databaseName,
       "hoodie.datasource.hive_sync.table": tableName,
       "hoodie.datasource.hive_sync.mode": "jdbc",
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.partition_fields": "part",
       "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
       # For concurrency write locks with hive metastore
       "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
       "hoodie.cleaner.policy.failed.writes": "LAZY",
       "hoodie.write.lock.provider": "org.apache.hudi.hive.HiveMetastoreBasedLockProvider",
       "hoodie.write.lock.hivemetastore.database": databaseName,
       "hoodie.write.lock.hivemetastore.table": tableName,
       "hoodie.write.lock.wait_time_ms": "12000",
       "hoodie.write.lock.num_retries": "4",
       "hoodie.embed.timeline.server": "false",
       "hoodie.datasource.write.commitmeta.key.prefix": "deltastreamer.checkpoint.key",
   }
   
   (df.write.format("hudi").options(**hudi_options).mode("append").save(basePath))
   print(
       "@@@@@@@@@@@@@@@@ COUNT={} @@@@@@@@@@@@@@@@@@".format(
           spark.read.format("hudi").load(basePath).count()
       )
   )
   ```
   
   Bash script:
   ```
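   #!/usr/bin/env bash
   NB=$1
   # Start from a clean table.
   rm -rf /tmp/test_hudi_pyspark_local/
   # The first run creates the table.
   python3 concurrent.py
   # Then launch NB writers in parallel.
   seq 1 "$NB" | xargs -n 1 -P "$NB" python3 concurrent.py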
   
   ```
   
   Run it:
   ```
   ./concurrent.sh 4
   ```
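
To tally which of the three error types each writer hits, the xargs launcher could be replaced by a small Python driver. This is only a sketch: the labels and the `classify` and `run_writers` helpers are hypothetical names, the markers are substrings copied from the stack traces above, and `concurrent.py` is assumed to be the script shown earlier.

```python
import subprocess
import sys
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Substrings taken from the three stack traces above; the labels are illustrative.
# FileAlreadyExistsException is checked before HoodieUpsertException because a
# wrapped trace could contain both.
ERROR_MARKERS = [
    ("type 1: lock not acquired", "HoodieLockException"),
    ("type 3: commit file already exists", "FileAlreadyExistsException"),
    ("type 2: upsert failed", "HoodieUpsertException"),
]


def classify(output: str) -> str:
    """Return the label of the first known error marker found in a writer's output."""
    for label, marker in ERROR_MARKERS:
        if marker in output:
            return label
    return "ok"


def run_one(script: str) -> str:
    """Run one writer to completion and classify its combined stdout and stderr."""
    result = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return classify(result.stdout + result.stderr)


def run_writers(nb: int, script: str = "concurrent.py") -> Counter:
    """Start nb writers in parallel (one thread per process) and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=nb) as pool:
        return Counter(pool.map(run_one, [script] * nb))
```

Calling `run_writers(4)` after the initial table-creating run would return a `Counter` keyed by outcome label, which makes it easier to see how often each error shows up across repeated runs.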
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1009428809


   I think I found the issue: hoodie.write.lock.wait_time_ms. Can you set hoodie.write.lock.wait_time_ms to about 5 seconds (5000)? You have set it too high. 
   





[GitHub] [hudi] jdattani commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
jdattani commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-942110461


   @parisni @nsivabalan I see you've mentioned you're using hudi 0.9.0 on spark3.1. Is that compatible? 
   https://issues.apache.org/jira/browse/HUDI-1869
   
   
   





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1002956250


   > Also, are these different processes in the same JVM, or on different servers/nodes altogether?
   
   They are independent JVM processes: multiple spark-submit runs in parallel with xargs.





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1018634072


   @parisni: Did you get a chance to try it out? Let us know if you have any updates in this regard. 





[GitHub] [hudi] nsivabalan edited a comment on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-997298457


   @parisni: Glad to know most of the issues are gone. 
   What type of lock provider are you using?
   Also, are these different processes in the same JVM, or on different servers/nodes altogether? 








[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1008549934


   What kind of lock provider are you using? 





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1039649123


   @parisni: Are there any updates in this regard? 
   





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-930043315


   I dug into the source code and found nothing about partition-level locks. Apparently the lock mechanism works at the database+table level only.
   
   https://github.com/apache/hudi/blob/f52cb32f5f0c9e31b5addf29adbb886ca6d167dd/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/TransactionManager.java#L52
   https://github.com/apache/hudi/blob/f52cb32f5f0c9e31b5addf29adbb886ca6d167dd/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java
   https://github.com/apache/hudi/blob/f52cb32f5f0c9e31b5addf29adbb886ca6d167dd/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveMetastoreBasedLockProvider.java#L104
   
   Also, according to the source, the Hive Metastore lock provider implementation SHALL have a ZooKeeper. This is not mentioned in the documentation.
   https://github.com/apache/hudi/blob/f52cb32f5f0c9e31b5addf29adbb886ca6d167dd/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveMetastoreBasedLockProvider.java#L68
   
   > This feature is currently experimental and requires either Zookeeper or HiveMetastore to acquire locks.





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-948732989


   @nsivabalan I am still getting random errors: I am not able to make OCC work correctly, even when each writer deals with a dedicated partition (which should not trigger any error).





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1008727425


   This is the Hive provider, as mentioned in the OP.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org






[GitHub] [hudi] parisni edited a comment on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni edited a comment on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-964051750


   I tried the ZooKeeper implementation. Again, random errors arise. This only applies when there are more than 2 concurrent writers.
   I also tested #3824. This does not solve the issue (my random sleep time avoids timestamp collisions anyway).
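
For readers who want to repeat the ZooKeeper test, the writer options from the original script could be switched over roughly as follows. This is a sketch, not the configuration actually used in this comment (which was not posted): the host, port, lock key and base path values are assumptions for a local ZooKeeper, and `use_zookeeper_lock` is a hypothetical helper.

```python
# Assumed values for a local ZooKeeper; adjust host, port and paths to your setup.
zookeeper_lock_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "localhost",
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "test_hudi_pyspark_local",
    "hoodie.write.lock.zookeeper.base_path": "/hudi_locks",
}


def use_zookeeper_lock(options: dict) -> dict:
    """Return a copy of options with the Hive Metastore lock keys dropped
    and the ZooKeeper lock settings applied on top."""
    kept = {
        key: value
        for key, value in options.items()
        if not key.startswith("hoodie.write.lock.hivemetastore.")
    }
    return {**kept, **zookeeper_lock_options}
```

With the script from the original post, this would be used as `df.write.format("hudi").options(**use_zookeeper_lock(hudi_options))`.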
   
   





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-997298457


   Glad to know most of the issues are gone. 
   What type of lock provider are you using?





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-942310291


   @jdattani AFAIK, only Spark SQL features are broken on 3.1.





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1047401320


   @parisni: If you were able to get it resolved, feel free to close out the issue. If not, let us know. If we triage any bugs, we can try to fix them once and for all in 0.11 and solidify multi-writer support for Hudi. 





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-991853688


   Can you please try 0.10.0 and let us know how it goes? 





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-992283484


   With 0.10.0, things look better: I only get one kind of concurrency error.
   When running 4 concurrent writers, each into its own partition, I consistently get
   the error below for 2 of them.
   
   From my config, I expect the lock to be able to wait a long time before
   failing. Here, the whole process runs for 25 seconds:
   
   ```python
       "hoodie.write.lock.wait_time_ms": "120000",
       "hoodie.write.lock.wait_time_ms_between_retry": "2000",
       "hoodie.write.lock.num_retries": "100",
   ```
   
   
   > org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object LockResponse(lockid:27, state:WAITING)
   >	at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:82)
   >	at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
   >	at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:192)
   >	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
   >	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:633)
   >	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:284)
   >	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
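
As a rough sanity check on that expectation, the sketch below computes how long a writer would keep retrying under one simple reading of the lock settings quoted above: `num_retries` failed attempts, each followed by a `wait_time_ms_between_retry` pause. That reading is an assumption and may not match Hudi's actual retry loop; `retry_sleep_budget_ms` is a hypothetical helper.

```python
def retry_sleep_budget_ms(num_retries: int, wait_between_retry_ms: int) -> int:
    """Total time spent sleeping between attempts if every attempt fails.

    Assumed model: one pause of wait_between_retry_ms after each failed attempt.
    The time spent inside each attempt is ignored, so this is a lower bound
    under that model.
    """
    return num_retries * wait_between_retry_ms


budget_ms = retry_sleep_budget_ms(num_retries=100, wait_between_retry_ms=2000)
print(budget_ms / 1000, "seconds")  # 200.0 seconds
```

Under that assumption, the writers would be expected to keep retrying for at least 200 seconds, well beyond the 25 seconds the whole run lasted, which is what makes the `state:WAITING` failure surprising.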
   
   





[GitHub] [hudi] nsivabalan closed issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3731:
URL: https://github.com/apache/hudi/issues/3731


   





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-968189581


   We have two fixes on this end: one is [millisec granularity](https://github.com/apache/hudi/pull/3824) and the other is [rolling back concurrent writers](https://github.com/apache/hudi/pull/3956). 





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-935132854


   Can you set these configs as well?
   hoodie.write.lock.wait_time_ms_between_retry=2000
   hoodie.write.lock.hivemetastore.uris= 
   





[GitHub] [hudi] parisni commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-964051750


   I tried the ZooKeeper implementation. Again, random errors arise. This only applies when there are more than 2 concurrent writers.
   
   








[GitHub] [hudi] nsivabalan edited a comment on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1009428809


   I think I found the issue. Can you set hoodie.write.lock.wait_time_ms to about 5 seconds (5000)? You have set it too high. 
   





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-968190251


   For the transaction manager timing out, can you try adding retries?
   hoodie.write.lock.client.num_retries
   hoodie.write.lock.client.wait_time_ms_between_retry
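
A minimal sketch of adding these two settings to the writer options from the original script. Only the key names come from this comment; the values are illustrative assumptions, and `with_client_retries` is a hypothetical helper.

```python
# Illustrative values only; tune them to how long a commit usually holds the lock.
client_retry_options = {
    "hoodie.write.lock.client.num_retries": "10",
    "hoodie.write.lock.client.wait_time_ms_between_retry": "2000",
}


def with_client_retries(options: dict) -> dict:
    """Return a copy of the writer options with the client-side retry keys added."""
    return {**options, **client_retry_options}
```

The result can be passed to the writer in place of `hudi_options`, for example `.options(**with_client_retries(hudi_options))`.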
   
   








[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1073074305


   @parisni: I will go ahead and close this issue out. If you hit any more issues, feel free to create a new issue with details. 
   thanks! 





[GitHub] [hudi] nsivabalan commented on issue #3731: [SUPPORT] Concurrent write (OCC) on distinct partitions random errors

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3731:
URL: https://github.com/apache/hudi/issues/3731#issuecomment-1002289918


   @parisni: Do you have any updates for us?

