Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/04 12:12:44 UTC

[GitHub] [hudi] QuChunhe opened a new issue, #6584: [SUPPORT]Hudi Java client don't support Multi Writing

QuChunhe opened a new issue, #6584:
URL: https://github.com/apache/hudi/issues/6584

   The Hudi Java client doesn't support multi-writer mode, and throws the error: "Cannot resolve conflicts for overlapping writes"
   
   1. Hudi version: 0.12.0, Aliyun OSS file system, Flink 1.13.6, and Hudi sink parallelism 2
   
   2. The Hudi Java client configuration is as follows:
   
   
      public HudiSink setZookeeperLock(String zkQuorum) {
       if (Objects.isNull(zkQuorum)) {
         return this;
       }
       writeConcurrencyMode = WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL;
       hoodieFailedWritesCleaningPolicy = HoodieFailedWritesCleaningPolicy.LAZY;
       hoodieLockConfig = HoodieLockConfig.newBuilder()
           .withZkQuorum(zkQuorum)
           .withZkBasePath("/" + databasePrefix)
           .withZkLockKey(tableName)
           .withZkConnectionTimeoutInMs(50L * 1000L)
           .withZkSessionTimeoutInMs(1500 * 1000L)
           .withRetryWaitTimeInMillis(60L * 1000L)
           .build();
       return this;
     }
   
   
   @Override
     public void open(Configuration parameters) throws Exception {
       super.open(parameters);
       org.apache.hadoop.conf.Configuration hadoopConf = new org.apache.hadoop.conf.Configuration();
       String tablePath = warehousePath + databasePrefix + "/" + tableName;
       // initialize the table, if not done already
       Path path = new Path(tablePath);
       FileSystem fs = FSUtils.getFs(tablePath, hadoopConf);
   
       String schema = Util.getStringFromResource("/" + databasePrefix + "."
           + tableName + ".schema");
   
       // Create the write client to write some records in
       HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
           .withPath(tablePath)
           .withSchema(schema)
           .forTable(tableName)
           .withAutoCommit(true)
           .withTableServicesEnabled(true)
           .withEmbeddedTimelineServerEnabled(true)
           .withMarkersType(MarkerType.TIMELINE_SERVER_BASED.name())
           .withRollbackUsingMarkers(true)
           .withDeleteParallelism(parallelism)
           .withParallelism(parallelism, parallelism)
           .withFinalizeWriteParallelism(parallelism)
           .withRollbackParallelism(parallelism / 2)
           .withWriteBufferLimitBytes(32 * 1024 * 1024)
           .withWriteConcurrencyMode(writeConcurrencyMode)
           .withLockConfig(hoodieLockConfig)
           //.withEngineType(EngineType.SPARK)
           .withCleanConfig(HoodieCleanConfig.newBuilder()
               .withAutoClean(true)
               .withFailedWritesCleaningPolicy(hoodieFailedWritesCleaningPolicy)
               .withAsyncClean(false)
               .build())
           .withStorageConfig(
               HoodieStorageConfig.newBuilder()
                   .parquetWriteLegacyFormat("false")
                   .build())
           .withMetadataConfig(
               HoodieMetadataConfig.newBuilder()
                   .withAsyncClean(false)
                   .withAsyncIndex(false)
                   .enable(true)
                   .build())
           .withIndexConfig(
               HoodieIndexConfig.newBuilder()
                   .withIndexType(IndexType.BLOOM)
                   .build())
           .withArchivalConfig(HoodieArchivalConfig.newBuilder()
               .archiveCommitsWith(40, 60)
               .build())
           .withCompactionConfig(
               HoodieCompactionConfig.newBuilder()
                   .withCompactionLazyBlockReadEnabled(true)
                   .build())
           .build();
       client = new HoodieJavaWriteClient<>(new HoodieJavaEngineContext(hadoopConf), cfg);
   
     }
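For reference, the builder calls above map onto Hudi's multi-writer configuration keys. A minimal properties sketch of the same setup (the ZooKeeper hosts are placeholders, since they are masked in the log below; base path and lock key are taken from the log):

```properties
# Optimistic concurrency control for multi-writer (values illustrative)
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host-1,zk-host-2,zk-host-3
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.base_path=/gs_ods.db
hoodie.write.lock.zookeeper.lock_key=ods_i_rcc_robot_state_report_v2
```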
   
   
   3. I have configured HoodieFailedWritesCleaningPolicy.LAZY and the HoodieLockConfig. The ZooKeeper log is as follows:
   
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:java.io.tmpdir=/tmp
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:java.compiler=<NA>
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:os.name=Linux
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:os.arch=amd64
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:os.version=4.19.91-25.8.al7.x86_64
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:user.name=hadoop
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:user.home=/home/hadoop
   2022-09-04 19:27:30,947 INFO  org.apache.zookeeper.ZooKeeper                               [] - Client environment:user.dir=/mnt/disk4/yarn/nm-local-dir/usercache/root/appcache/application_1661840256099_1332/container_e01_1661840256099_1332_01_000002
   2022-09-04 19:27:30,948 INFO  org.apache.zookeeper.ZooKeeper                               [] - Initiating client connection, connectString=*:2181,*:2181,*:2181 sessionTimeout=1500000 watcher=org.apache.curator.ConnectionState@18582146
   2022-09-04 19:27:30,967 WARN  org.apache.zookeeper.ClientCnxn                              [] - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/mnt/disk3/yarn/nm-local-dir/usercache/root/appcache/application_1661840256099_1332/jaas-6909589205217539821.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
   2022-09-04 19:27:30,968 INFO  shadow.gs.org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider [] - ACQUIRING lock atZkBasePath = /gs_ods.db, lock key = ods_i_rcc_robot_state_report_v2
   2022-09-04 19:27:30,969 INFO  org.apache.zookeeper.ClientCnxn                              [] - Opening socket connection to server master-1-3.c-ac534405d725db11.cn-shanghai.emr.aliyuncs.com/*:2181
   2022-09-04 19:27:30,969 INFO  org.apache.zookeeper.ClientCnxn                              [] - Socket connection established to master-1-3.c-ac534405d725db11.cn-shanghai.emr.aliyuncs.com/*:2181, initiating session
   2022-09-04 19:27:30,975 INFO  org.apache.zookeeper.ClientCnxn                              [] - Session establishment complete on server master-1-3.c-ac534405d725db11.cn-shanghai.emr.aliyuncs.com/*:2181, sessionid = 0x3008f4f8a0391be, negotiated timeout = 360000
   
   4. The errors are as follows:
   
   shadow.gs.org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220904192833325
           at shadow.gs.org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:64) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.JavaInsertCommitActionExecutor.execute(JavaInsertCommitActionExecutor.java:47) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.HoodieJavaCopyOnWriteTable.insert(HoodieJavaCopyOnWriteTable.java:111) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.HoodieJavaCopyOnWriteTable.insert(HoodieJavaCopyOnWriteTable.java:82) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:126) ~[robot-stream.jar:?]
           at com.robot.gs.sink.HudiSink.invoke(HudiSink.java:132) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:205) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639) ~[robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) [robot-stream.jar:?]
           at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623) [robot-stream.jar:?]
           at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779) [robot-stream.jar:?]
           at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) [robot-stream.jar:?]
           at java.lang.Thread.run(Thread.java:750) [?:1.8.0_332]
   Caused by: shadow.gs.org.apache.hudi.exception.HoodieWriteConflictException: java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes
           at shadow.gs.org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy.resolveConflict(SimpleConcurrentFileWritesConflictResolutionStrategy.java:102) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.client.utils.TransactionUtils.lambda$resolveWriteConflictIfAny$0(TransactionUtils.java:85) ~[robot-stream.jar:?]
           at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_332]
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_332]
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_332]
           at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) ~[?:1.8.0_332]
           at shadow.gs.org.apache.hudi.client.utils.TransactionUtils.resolveWriteConflictIfAny(TransactionUtils.java:79) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:189) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:175) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseJavaCommitActionExecutor.java:345) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.execute(BaseJavaCommitActionExecutor.java:126) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.execute(BaseJavaCommitActionExecutor.java:68) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:57) ~[robot-stream.jar:?]
           ... 19 more
   Caused by: java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes
           at shadow.gs.org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy.resolveConflict(SimpleConcurrentFileWritesConflictResolutionStrategy.java:102) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.client.utils.TransactionUtils.lambda$resolveWriteConflictIfAny$0(TransactionUtils.java:85) ~[robot-stream.jar:?]
           at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_332]
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_332]
           at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742) ~[?:1.8.0_332]
           at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) ~[?:1.8.0_332]
           at shadow.gs.org.apache.hudi.client.utils.TransactionUtils.resolveWriteConflictIfAny(TransactionUtils.java:79) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:189) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:175) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseJavaCommitActionExecutor.java:345) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.execute(BaseJavaCommitActionExecutor.java:126) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseJavaCommitActionExecutor.execute(BaseJavaCommitActionExecutor.java:68) ~[robot-stream.jar:?]
           at shadow.gs.org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:57) ~[robot-stream.jar:?]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan closed issue #6584: [SUPPORT]Hudi Java client don't support Multi Writing

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6584: [SUPPORT]Hudi Java client don't support Multi Writing
URL: https://github.com/apache/hudi/issues/6584




[GitHub] [hudi] nsivabalan commented on issue #6584: [SUPPORT]Hudi Java client don't support Multi Writing

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6584:
URL: https://github.com/apache/hudi/issues/6584#issuecomment-1236387535

   Yes, we don't have multi-writer support yet with the Java client:
   https://issues.apache.org/jira/browse/HUDI-4774
   
   Let us know if you have the bandwidth to take a stab at it.
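   Until that JIRA lands, a common mitigation is to ensure only one writer commits to the table at a time (e.g. Hudi sink parallelism 1). Within a single JVM the idea can be sketched with a plain single-threaded executor; `SingleWriterQueue` and the commit-time strings below are hypothetical illustrations, not Hudi API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: funnel every write through one single-threaded
// executor so commits run strictly one after another and never overlap.
// Note this only serializes writers inside one JVM; cross-process
// multi-writer for the Java client is what HUDI-4774 tracks.
public class SingleWriterQueue {
  private final ExecutorService writer = Executors.newSingleThreadExecutor();
  private final List<String> committed = new ArrayList<>(); // commit order log

  // Submit a write task; the single thread guarantees no two tasks overlap.
  public Future<String> submit(String commitTime, Callable<String> writeTask) {
    return writer.submit(() -> {
      String result = writeTask.call(); // e.g. client.insert(records, commitTime)
      committed.add(commitTime);        // record the commit order
      return result;
    });
  }

  public List<String> committedOrder() {
    return committed;
  }

  public void close() {
    writer.shutdown();
  }
}
```

   The same effect is usually achieved more simply by setting the sink operator's parallelism to 1, at the cost of write throughput.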
   

