Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/12 09:10:29 UTC

[GitHub] [hudi] hemanth-gowda-12 opened a new issue, #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

hemanth-gowda-12 opened a new issue, #7654:
URL: https://github.com/apache/hudi/issues/7654

   **Describe the problem you faced**
   Trying to replicate a distributed system via a test running Hudi Java Client in OCC mode.
   [link](https://github.com/hemanth-gowda-12/ApacheHudiOccTest/blob/main/occ/src/test/java/org/example/HudiOccTest.java)
   
   I'm running into starvation while waiting for locks with just 3 writers, which mimic 3 distributed machines. The performance doesn't seem practical the way I'm testing it, and I'm trying to understand how to optimize it.
   The starvation occurs with both the ZooKeeper and FS lock providers, but it is more prominent with ZK, since there are multiple requests for locks, which results in infinite starvation.
   
   TL;DR: Run the test below. After a few writes, the client goes into a starvation phase, remains idle doing no work, and eventually fails with the exception below:
   `org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object`   
   
   **To Reproduce**
   Run the test [here](https://github.com/hemanth-gowda-12/ApacheHudiOccTest/blob/main/occ/src/test/java/org/example/HudiOccTest.java) and look at the logs and the occ/tmp/hudiTest dir for the test table.
   
   Steps to reproduce the behavior:
   1. Just run the test to reproduce the starvation using the FS lock provider.
   2. To reproduce the ZooKeeper starvation scenario, comment out lines 151-156 and uncomment lines 160-168.
   3. Delete the occ/tmp directory and re-run the test.
   4. Install Docker and run `docker run -d  --name zookeeper  -p 2181:2181  jplock/zookeeper`
   5. The test will hang due to starvation after a few seconds of running. You can inspect the ZooKeeper locks that are held and never released, as shown below.
   6. Download the ZooKeeper client and run `sh /opt/zookeeper-3.7.1-bin/bin/zkCli.sh -server 127.0.0.1:2181`
   7. After the client connects, run `ls /test/test_table`
   
   **Expected behavior**
   The test completes with reasonable performance. The test generates records with keys in the range 0-99, 10 times over. Each partition should see 1 insert and 9 updates happening in parallel.
   
   OCC mode having reasonable performance using the Java Client to support high throughput writes/updates.
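
For reference, the shape of the workload described above can be sketched without Hudi at all. The class and method names here are hypothetical and are not taken from the linked test; the sketch only counts how many writes each key receives when keys 0-99 are written 10 times in parallel, treating the first write per key as the insert and the rest as updates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class WorkloadSketch {
    static final int NUM_KEYS = 100;  // keys 0-99, as described in the issue
    static final int NUM_ROUNDS = 10; // every key is written 10 times

    /** Returns, per key, how many writes it received across all rounds. */
    public static Map<Integer, AtomicInteger> run() {
        Map<Integer, AtomicInteger> writesPerKey = new ConcurrentHashMap<>();
        // Rounds run in parallel to mimic concurrent writers. A real test would
        // call the Hudi write client here instead of bumping a counter.
        IntStream.range(0, NUM_ROUNDS).parallel().forEach(round ->
            IntStream.range(0, NUM_KEYS).forEach(key ->
                writesPerKey.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet()));
        return writesPerKey;
    }

    public static void main(String[] args) {
        Map<Integer, AtomicInteger> writes = run();
        int inserts = writes.size(); // first write per key
        int updates = writes.values().stream().mapToInt(c -> c.get() - 1).sum();
        System.out.println("inserts=" + inserts + " updates=" + updates);
    }
}
```

With these constants the sketch reports 100 inserts and 900 updates, which is the 1 insert plus 9 updates per key that the test is expected to complete.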
   
   **Environment Description**
   
   * Hudi version : 0.12.2
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : Local FS
   
   * Running on Docker? (yes/no) : No
   
   
   **Stacktrace**
   
   The client runs for a while and then starves at this point in the log:
   `2023-01-12 00:59:03,814 [INFO  ] ConnectionStateManager - State change: CONNECTED
   2023-01-12 00:59:09,199 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 00:59:09,739 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:00:04,821 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:00:10,215 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:00:10,756 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:01:05,839 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:01:11,235 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:01:11,771 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:02:06,856 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:02:12,255 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:02:12,789 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:03:07,875 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:03:13,272 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table
   2023-01-12 01:03:13,802 [INFO  ] ZookeeperBasedLockProvider - ACQUIRING lock atZkBasePath = /test, lock key = test_table`
   
   It eventually fails with an error
   `org.apache.hudi.exception.HoodieLockException: Unable to acquire lock, lock object `
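
The pattern in the log (repeated `ACQUIRING` attempts followed by a failure) is what bounded lock acquisition looks like when the current holder never releases. The sketch below is only an illustration of that symptom using `java.util.concurrent.locks.ReentrantLock` as a stand-in; it is not Hudi's lock provider code, and the class name, retry count, and wait time are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockStarvationSketch {
    /** Tries the lock up to maxRetries times, waiting up to waitMs on each attempt. */
    public static boolean acquireWithRetries(ReentrantLock lock, int maxRetries, long waitMs) {
        try {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                System.out.println("ACQUIRING lock, attempt " + attempt);
                if (lock.tryLock(waitMs, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return false; // a caller would report this as "Unable to acquire lock"
    }

    /** One thread holds the lock without releasing it while a second writer retries. */
    public static boolean secondWriterAcquires() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            held.countDown();
            try {
                finished.await(); // keep holding until the second writer gives up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.setDaemon(true); // never keep the JVM alive because of the holder
        holder.start();
        try {
            held.await();
            boolean acquired = acquireWithRetries(lock, 3, 50);
            finished.countDown();
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second writer acquired lock: " + secondWriterAcquires());
    }
}
```

Here the second writer exhausts its retries and gives up, which matches the reported symptom but says nothing about why the lock is not being released in the actual test.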
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1381576992

   @yihua @nsivabalan Any ideas? It seems ZK runs into a deadlock.




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396481587

   > Cool, do we need a patch for this issue? Can you file a JIRA issue then, @fengjian428?
   
   https://issues.apache.org/jira/browse/HUDI-5583




[GitHub] [hudi] danny0405 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396453511

   Cool, do we need a patch for this issue? Can you file a JIRA issue then, @fengjian428?




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396728910

   @fengjian428, sure. Please feel free to include the test code. Thanks for the quick fix. I tried it out and did not come across deadlocks anymore with the FS-based lock provider.
   However, I bumped up the writers to 100 and ran into this.
   
   `2023-01-19 02:04:40,411 [INFO  ] HoodieMergeHandle - Merging new data into oldPath /Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020209727.parquet, as newPath /Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020440202.parquet
   2023-01-19 02:04:40,412 [INFO  ] DirectWriteMarkers - Creating Marker Path=/Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/.temp/20230119020440202/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020440202.parquet.marker.MERGE
   
   org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check
   
   	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:820)
   	at org.apache.hudi.client.HoodieJavaWriteClient.upsert(HoodieJavaWriteClient.java:109)
   	at org.example.HudiOccTest.lambda$HudiTest$2(HudiOccTest.java:213)
   	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
   	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
   	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
   	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
   	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
   	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
   	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
   	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
   Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path /Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest
   	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:807)
   	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:818)
   	... 12 more
   Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230119020417275.commit
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:310)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.lambda$getCommitMetadataStream$2(HoodieActiveTimeline.java:349)
   	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
   	at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:361)
   	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:503)
   	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
   	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
   	at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
   	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getLastCommitMetadataWithValidSchema(HoodieActiveTimeline.java:321)
   	at org.apache.hudi.common.table.TableSchemaResolver.getLatestCommitMetadataWithValidSchema(TableSchemaResolver.java:491)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:225)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:199)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:139)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaWithoutMetadataFields(TableSchemaResolver.java:192)
   	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:804)
   	... 13 more
   Caused by: java.io.FileNotFoundException: File file:/Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230119020417275.commit does not exist`




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396397279

   @hemanth-gowda-12 try removing everything under the table path before you start the OCC test.
   I didn't hit the second issue after doing so.
   ![image](https://user-images.githubusercontent.com/4403474/213349452-eeb59b47-2ca6-47f5-bd33-17d0d620b347.png)
   




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1397757982

   @fengjian428, I see the same issue after adding the above line. Will we need to create a Jira for it? Again, thanks a lot for the effort.
   
   ```
   org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check
   
   	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:820)
   	at org.apache.hudi.client.HoodieJavaWriteClient.upsert(HoodieJavaWriteClient.java:109)
   	at org.example.HudiOccTest.lambda$HudiTest$2(HudiOccTest.java:220)
   	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
   	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
   	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
   	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
   	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
   	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
   	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
   	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
   Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest
   	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:807)
   	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:818)
   	... 12 more
   Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230119154516823.commit
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824)
   ```




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1387661535

   @fengjian428 I tried out using the master branch. 
   - I'm still seeing the deadlock on ZookeeperProvider.
   - On the FS based lock provider, I'm not seeing a deadlock, instead the below error. Maybe that's a different issue. 
   `Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /Users/h0b00m9/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230118113512946.commit
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824)`




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1386801332

   @danny0405 is this relevant to https://github.com/apache/hudi/pull/7509 ?




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "hemanth-gowda-12 (via GitHub)" <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1400747010

   > @fengjian428 One additional thing, doing a tree on the output dir, I see duplicates on upserts sometimes. (Looking at the different file groups under partition 8).
   > 
   > ```
   > ├── 79
   > │   └── test
   > │       └── 72383114-10ea-48dc-95c8-95862d87cea1-0_0-0-0_20230119154610592.parquet
   > ├── 8
   > │   └── test
   > │       ├── 37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154430261.parquet
   > │       ├── 37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154612937.parquet
   > │       ├── d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154442286.parquet
   > │       └── d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154612937.parquet
   > ```
   > 
   > Please let me know if you think I should create a different issue for that problem.
   
   Hi @yihua, thanks for picking up the issue. Do you think I need to raise a separate issue / Jira for the above?




[GitHub] [hudi] danny0405 commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1401380302

   I think we do; there seem to be some issues with the fs view refresh.




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1397816372

   @fengjian428 One additional thing,
   doing a tree on the output dir, I see duplicates on upserts sometimes. (Looking at the different file groups under partition 8).
   
   ```
   ├── 79
   │   └── test
   │       └── 72383114-10ea-48dc-95c8-95862d87cea1-0_0-0-0_20230119154610592.parquet
   ├── 8
   │   └── test
   │       ├── 37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154430261.parquet
   │       ├── 37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154612937.parquet
   │       ├── d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154442286.parquet
   │       └── d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154612937.parquet
   
   ```
   
   Please let me know if you think I should create a different issue for that problem.
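
To make the listing above easier to read, the base file names can be grouped by file ID. This is a minimal sketch that assumes Hudi's usual base file naming of `<fileId>_<writeToken>_<instantTime>.parquet`; the class and method names are hypothetical, and the sketch only parses names, so it cannot tell whether the same record key actually exists in both file groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FileGroupSketch {
    /** Groups base file names by file ID, collecting the instant time of each file. */
    public static Map<String, List<String>> groupByFileId(List<String> baseFileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : baseFileNames) {
            String stem = name.substring(0, name.lastIndexOf('.')); // drop ".parquet"
            String[] parts = stem.split("_"); // fileId, writeToken, instantTime
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[2]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // File names copied from partition 8 in the listing above.
        List<String> partition8 = List.of(
            "37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154430261.parquet",
            "37160938-ccc6-4125-b3a5-8671b88be605-0_0-0-0_20230119154612937.parquet",
            "d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154442286.parquet",
            "d0df26b9-81cb-45be-8b6c-eefb972ba6cf-0_0-0-0_20230119154612937.parquet");
        groupByFileId(partition8).forEach((fileId, instants) ->
            System.out.println(fileId + " -> " + instants));
    }
}
```

For partition 8 this yields two file groups, each with two base files, and both groups have a file written at instant `20230119154612937`. Confirming duplicate records would still require reading the Parquet files.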




[GitHub] [hudi] nsivabalan closed issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan closed issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode
URL: https://github.com/apache/hudi/issues/7654




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396788939

   @hemanth-gowda-12 try adding this line:
   `.withFileSystemViewConfig(FileSystemViewStorageConfig.newBuilder().withStorageType(FileSystemViewStorageType.MEMORY).build())`
   I suspect it is caused by RemoteFileSystemView having a problem with multiple writers.
   I'm not sure it can be fixed quickly; it may be a by-design issue.
   




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396376945

   @hemanth-gowda-12 I wrote a fix for the first issue: https://github.com/apache/hudi/pull/7704. In my tests with the file lock it solves the issue; you can also try it out. You shouldn't use withFileSystemLockExpire, since it is only meant for edge cases.
   
   For the second issue, I haven't found the root cause yet and will continue to investigate.




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1386730896

   @hemanth-gowda-12 if you do not set withFileSystemLockExpire, the file lock also gets stuck, right?




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1397981689

   > @fengjian428 , sure.. Please feel free to include the test code. Thanks for the quick fix.. I tried it out a snapshot built from your branch clean_deadlock, did not come across deadlocks anymore with FS based lock provider.
   > 
   > However, I bumped up the writers to 100 and ran into the below. **I have not used FileSystemLockExpire.**
   > 
   > ```
   > 2023-01-19 02:04:40,411 [INFO  ] HoodieMergeHandle - Merging new data into oldPath /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020209727.parquet, as newPath /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020440202.parquet
   > 2023-01-19 02:04:40,412 [INFO  ] DirectWriteMarkers - Creating Marker Path=/Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/.temp/20230119020440202/44/test/5dd8fea5-cf44-4432-b775-01cb67d1250d-0_0-0-0_20230119020440202.parquet.marker.MERGE
   > 
   > org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check
   > 
   > 	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:820)
   > 	at org.apache.hudi.client.HoodieJavaWriteClient.upsert(HoodieJavaWriteClient.java:109)
   > 	at org.example.HudiOccTest.lambda$HudiTest$2(HudiOccTest.java:213)
   > 	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   > 	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   > 	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
   > 	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
   > 	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
   > 	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
   > 	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
   > 	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
   > 	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
   > 	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
   > Caused by: org.apache.hudi.exception.HoodieException: Failed to read schema/check compatibility for base path /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest
   > 	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:807)
   > 	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:818)
   > 	... 12 more
   > Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from /Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230119020417275.commit
   > 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:824)
   > 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:310)
   > 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.lambda$getCommitMetadataStream$2(HoodieActiveTimeline.java:349)
   > 	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
   > 	at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:361)
   > 	at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:503)
   > 	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:488)
   > 	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
   > 	at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:150)
   > 	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   > 	at java.base/java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:543)
   > 	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getLastCommitMetadataWithValidSchema(HoodieActiveTimeline.java:321)
   > 	at org.apache.hudi.common.table.TableSchemaResolver.getLatestCommitMetadataWithValidSchema(TableSchemaResolver.java:491)
   > 	at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:225)
   > 	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:199)
   > 	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:139)
   > 	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaWithoutMetadataFields(TableSchemaResolver.java:192)
   > 	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:804)
   > 	... 13 more
   > Caused by: java.io.FileNotFoundException: File file:/Users/xxxx/IdeaProjects/ApacheHudiOccTest/occ/tmp/hudiTest/.hoodie/20230119020417275.commit does not exist
   > ```
   Do you mean to increase the numHudiWriteClients to 100?
   




[GitHub] [hudi] hemanth-gowda-12 commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "hemanth-gowda-12 (via GitHub)" <gi...@apache.org>.
hemanth-gowda-12 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1409226000

   @nsivabalan, the above sounds great. It would be good to better understand the scope and level of effort before committing to this. I'll move this conversation to a Slack thread and set up a time to sync, if that sounds good.




[GitHub] [hudi] nsivabalan commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1419930475

   sure. thanks! 




[GitHub] [hudi] nsivabalan commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1407256160

   I don't think we added multi-writer support to java client yet. https://issues.apache.org/jira/browse/HUDI-4774
   




[GitHub] [hudi] nsivabalan commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1407256246

   If you are interested in contributing, let us know. We can assist/guide you if need be.




[GitHub] [hudi] danny0405 commented on issue #7654: [SUPPORT] Deadlock on Hudi Java Client in OCC mode

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1401382132

   I think maybe two fixes are related here:
   
   one is for clean deadlock: #7739
   another is for the fs view of multi-writer: #7738 




[GitHub] [hudi] fengjian428 commented on issue #7654: [SUPPORT] Starvation on Hudi Java Client in OCC mode

Posted by GitBox <gi...@apache.org>.
fengjian428 commented on issue #7654:
URL: https://github.com/apache/hudi/issues/7654#issuecomment-1396391593

   > @danny0405 is this relevant to #7509 ?
   
   Sorry, this shouldn't be relevant to that PR.

