Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/08/22 03:11:15 UTC

[GitHub] [hudi] gtwuser opened a new issue, #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

gtwuser opened a new issue, #6463:
URL: https://github.com/apache/hudi/issues/6463

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   - YES
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   While writing incremental data concurrently we are getting the `error` below. I also noticed in [HUDI-2641](https://issues.apache.org/jira/browse/HUDI-2641) that this was fixed in version 0.10.0, and we are using 0.10.1 (`hudi-spark3.1.2-bundle_2.12-0.10.1.jar` with `spark-avro_2.12-3.1.2.jar`):
   ```bash
   Caused by: java.lang.IllegalArgumentException
   	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:466)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:528)
   	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:115)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:162)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:82)
   	at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:56)
   	... 45 more
   ```
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. `append` or `overwrite` data to a Hudi table concurrently (a sketch of the pattern follows below)
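   
   A minimal sketch of the concurrent-write pattern involved (hypothetical path, options, and `dataframes` list, mirroring the `ThreadPoolExecutor` approach described later in this thread); with no lock provider configured, the overlapping commits can race on the Hudi timeline:
   ```python
   from concurrent.futures import ThreadPoolExecutor
   
   def write_chunk(df):
       # each thread appends to the same Hudi table path concurrently
       (df.write.format("hudi")
          .option("hoodie.table.name", "my_table")                 # hypothetical table name
          .option("hoodie.datasource.write.operation", "upsert")
          .mode("append")
          .save("s3://my-bucket/my_table/"))                       # hypothetical base path
   
   with ThreadPoolExecutor(max_workers=4) as executor:
       # dataframes: a list of Spark DataFrames prepared elsewhere
       list(executor.map(write_chunk, dataframes))
   ```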
   
   **Expected behavior**
   We expect the writes to the table to succeed with no exceptions or errors.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   We are running this Hudi merge via AWS Glue jobs, using the jars below:
   ```bash
   1. calcite-core-1.16.0.jar
   2. hudi-spark3.1.2-bundle_2.12-0.10.1.jar
   3. spark-avro_2.12/3.1.2/spark-avro_2.12-3.1.2.jar
   ```
   **Stacktrace**
   
   ```
   2022-08-21 03:47:44,696 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
     File "/tmp/upsert-delete.py", line 267, in <module>
       main()
     File "/tmp/upsert-delete.py", line 254, in main
       for result in executor.map(start_merging, df_prefix_map_list):
     File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
       yield fs.pop().result()
     File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 428, in result
       return self.__get_result()
     File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 384, in __get_result
       raise self._exception
     File "/usr/lib64/python3.7/concurrent/futures/thread.py", line 57, in run
       result = self.fn(*self.args, **self.kwargs)
     File "/tmp/upsert-delete.py", line 246, in start_merging
       set_delete_markers(moids_df, combined_conf)
     File "/tmp/upsert-delete.py", line 128, in set_delete_markers
       .mode('append') \
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
       self._jwrite.save()
     File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
       return f(*a, **kw)
     File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o1573.save.
   : org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220821034051823
   	at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:63)
   	at org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:46)
   	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:119)
   	at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:103)
   	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:160)
   	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.IllegalArgumentException
   	at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:466)
   	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:528)
   	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:115)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:162)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:82)
   	at org.apache.hudi.table.action.commit.AbstractWriteHelper.write(AbstractWriteHelper.java:56)
   	... 45 more
   
   
   ```
   
   




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1247370168

   Let us know if you are still looking for any assistance. If not, we can go ahead and close out the issue.




[GitHub] [hudi] gtwuser commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
gtwuser commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1249488123

   I will be trying it soon; I got busy with other tasks and will update you, thanks.




[GitHub] [hudi] nsivabalan closed issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
URL: https://github.com/apache/hudi/issues/6463




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1251130987

   Will go ahead and close the issue for now, but feel free to open a new issue once you have some updates. Let me know if you are OK with that; or, if you wish to keep the issue open, I am good with that too.




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1229255631

   Can you post the contents of `.hoodie`?
   Also, was there any chance of multiple writers in your environment unintentionally (i.e. without configuring lock configurations)?
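   
   For reference, a small sketch (hypothetical bucket and table prefix) of one way to pull the `.hoodie` timeline file names from S3 with boto3:
   ```python
   import boto3
   
   s3 = boto3.client("s3")
   # hypothetical bucket/prefix; point this at the table's base path
   resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="my_table/.hoodie/")
   for obj in resp.get("Contents", []):
       print(obj["Key"])  # .requested / .inflight / .commit files show the timeline state
   ```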




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1247368648

   Yes, you can find the configs that need to be set for a Zookeeper-based lock or a Hive-metastore-based lock here:
   https://hudi.apache.org/docs/concurrency_control
   
   We also have a DynamoDB-based lock if you are interested.
   
   
   




[GitHub] [hudi] gtwuser commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
gtwuser commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1221745662

   @umehrot2 @nsivabalan  any pointers on the above issue ? 




[GitHub] [hudi] gtwuser commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
gtwuser commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1249487584

   Thanks for sharing this info. Please correct me if I'm wrong, but AFAIU, when we are writing in a multi-threaded environment we must use the locks you shared, right? And would that address the actual bug for which I opened this issue?




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1247369694

   Common configs required for any lock provider:
   ```
   hoodie.write.concurrency.mode=optimistic_concurrency_control
   hoodie.cleaner.policy.failed.writes=LAZY
   hoodie.write.lock.provider=<lock-provider-classname>
   ```
   
   Configs for Zookeeper-based lock:
   ```
   hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
   hoodie.write.lock.zookeeper.url
   hoodie.write.lock.zookeeper.port
   hoodie.write.lock.zookeeper.lock_key
   hoodie.write.lock.zookeeper.base_path
   ```
   
   Configs for Hive metastore based lock:
   ```
   hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
   hoodie.write.lock.hivemetastore.database
   hoodie.write.lock.hivemetastore.table
   ```
   
   DynamoDB-based lock:
   ```
   hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
   hoodie.write.lock.dynamodb.table
   hoodie.write.lock.dynamodb.partition_key
   hoodie.write.lock.dynamodb.region
   hoodie.write.lock.dynamodb.endpoint_url
   hoodie.write.lock.dynamodb.billing_mode
   ```
   Plus the AWS credential configs for the DynamoDB lock (optional when credentials come from the environment):
   ```
   hoodie.aws.access.key
   hoodie.aws.secret.key
   hoodie.aws.session.token
   ```
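   
   A minimal PySpark sketch (hypothetical table name, S3 path, and DynamoDB lock-table values) showing how these options could be passed on each concurrent write; it only illustrates wiring the configs above together, not a drop-in job:
   ```python
   hudi_options = {
       "hoodie.table.name": "my_table",                          # hypothetical
       "hoodie.datasource.write.recordkey.field": "id",          # hypothetical
       "hoodie.datasource.write.precombine.field": "ts",         # hypothetical
       "hoodie.datasource.write.operation": "upsert",
       # required for any multi-writer setup
       "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
       "hoodie.cleaner.policy.failed.writes": "LAZY",
       # DynamoDB-based lock provider (no Zookeeper needed, e.g. on AWS Glue)
       "hoodie.write.lock.provider": "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
       "hoodie.write.lock.dynamodb.table": "hudi-locks",         # hypothetical lock table
       "hoodie.write.lock.dynamodb.partition_key": "my_table",
       "hoodie.write.lock.dynamodb.region": "us-west-2",         # hypothetical region
   }
   
   df.write.format("hudi").options(**hudi_options).mode("append").save("s3://my-bucket/my_table/")  # hypothetical path
   ```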
   
   




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1237642561

   @gtwuser : gentle ping. 




[GitHub] [hudi] gtwuser commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
gtwuser commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1237680757

   @nsivabalan sorry for the delay in response. Sorry, I am not sure what is meant by multi-writers, but we `removed` the concurrent writes to the table, which were done using the `ThreadPoolExecutor` concept in Python (https://docs.python.org/3/library/concurrent.futures.html). And yes, that was without any lock configuration.
   
   I was checking that there are two suggested ways to achieve this: via Zookeeper or via the Hive metastore.
   It would be really kind of you to explain: since we don't have Zookeeper available beforehand and are working in an AWS Glue environment, what can we do to achieve the lock? Or can you share one example of achieving this via Hive?
   
   Also, since I deleted all the records and recreated them sequentially, I have lost the `.hoodie` contents. I will try to simulate the same issue again and post the content ASAP.




[GitHub] [hudi] nsivabalan commented on issue #6463: [SUPPORT]Caused by: java.lang.IllegalArgumentException at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6463:
URL: https://github.com/apache/hudi/issues/6463#issuecomment-1251129870

   Yes, if there are multiple writers writing concurrently to the same Hudi table, locks are mandatory. If not, you could see data loss, data duplication, and other failures.
   

