Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/17 16:04:13 UTC

[GitHub] [hudi] data-storyteller opened a new issue #4621: [SUPPORT] Integ tests are failing for HUDI

data-storyteller opened a new issue #4621:
URL: https://github.com/apache/hudi/issues/4621


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : latest (master)
   
   * Spark version : 2.4.7
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : Yes
   
   
   **Additional context**
   Running the integ test on the docker setup. The tests are failing with the following stacktrace.
   Command:
   `docker exec -i adhoc-2 /bin/bash spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 --conf spark.task.cpus=1 --conf spark.executor.cores=1 --conf spark.task.maxFailures=100 --conf spark.memory.fraction=0.4  --conf spark.rdd.compress=true  --conf spark.kryoserializer.buffer.max=2000m --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true  --conf spark.sql.hive.convertMetastoreParquet=false  --conf spark.driver.maxResultSize=12g --conf spark.executor.heartbeatInterval=120s --conf spark.network.timeout=600s --conf spark.yarn.max.executor.failures=10 --conf spark.sql.catalogImplementation=hive --conf spark.driver.extraClassPath=/var/demo/jars/* --conf spark.executor.extraClassPath=/var/demo/jars/* --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob  /opt/$HUDI_JAR_NAME --source-ordering-field test_suite_source_ordering_field --target-base-path /user/hive/warehouse/hudi-int
 eg-test-suite/output --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input --target-table table1 --props file:/var/hoodie/ws/docker/demo/config/test-suite/$PROP_FILE --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider --source-class org.apache.hudi.utilities.sources.AvroDFSSource --input-file-size 125829120 --workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/$YAML_NAME --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator --table-type $TABLE_TYPE --compact-scheduling-minshare 1 $EXTRA_SPARK_ARGS --clean-input --clean-output`
   
   
   **Stacktrace**
   
   ```
   
   22/01/17 06:36:13 INFO DagNode: Configs : {"name":"a89cea37-7224-4f36-8c00-90306ddf6172","record_size":1000,"repeat_count":1,"num_partitions_insert":1,"num_records_insert":300,"config":"third_insert"}
    22/01/17 06:36:13 INFO DagNode: Inserting input data a89cea37-7224-4f36-8c00-90306ddf6172
    22/01/17 06:36:13 INFO HoodieTestSuiteJob: Using DFSTestSuitePathSelector, checkpoint: Option{val=2} sourceLimit: 9223372036854775807 lastBatchId: 2 nextBatchId: 3
    00:09  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
    00:10  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
    00:12  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
    22/01/17 06:36:16 INFO DagScheduler: Finished executing a89cea37-7224-4f36-8c00-90306ddf6172
    22/01/17 06:36:16 WARN DagScheduler: Executing node "first_hive_sync" :: {"queue_name":"adhoc","engine":"mr","name":"994a5035-0362-4c9a-a7d7-e47397f2b113","config":"first_hive_sync"}
    22/01/17 06:36:16 INFO DagNode: Executing hive sync node
    22/01/17 06:36:19 INFO DagScheduler: Finished executing 994a5035-0362-4c9a-a7d7-e47397f2b113
    22/01/17 06:36:19 WARN DagScheduler: Executing node "first_validate" :: {"name":"3f562e32-b7d8-4d96-a977-44b6b876c333","validate_hive":false,"config":"first_validate"}
    22/01/17 06:36:19 WARN DagNode: Validation using data from input path /user/hive/warehouse/hudi-integ-test-suite/input/*/*
    22/01/17 06:36:21 INFO ValidateDatasetNode: Validate data in target hudi path /user/hive/warehouse/hudi-integ-test-suite/output/*/*/*
    22/01/17 06:36:21 ERROR DagScheduler: Exception executing node
    java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
        at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
        ... 11 more
    22/01/17 06:36:21 INFO DagScheduler: Forcing shutdown of executor service, this might kill running tasks
    22/01/17 06:36:21 ERROR HoodieTestSuiteJob: Failed to run Test Suite
    java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
        at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
        at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
        at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
        ... 6 more
    Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
        ... 11 more
    Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to run Test Suite
        at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:208)
        at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
        at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
        ... 13 more
    Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
        at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
        at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
        ... 6 more
    Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
        ... 11 more

    [Container] 2022/01/17 06:36:22 Command did not exit successfully sh run-intig-test.sh 2022-01-17 MERGE_ON_READ cow-long-running-example.yaml exit status 1
   
   ```
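
   The root cause is visible in the final `Caused by`: Spark could not resolve the short data source name `hudi`. Spark's `DataSource.lookupDataSource` first consults the `DataSourceRegister` service registry on the classpath and, failing that, tries to load `<name>.DefaultSource` as a fully qualified class name, which is why the trace bottoms out in `ClassNotFoundException: hudi.DefaultSource`. A minimal Python sketch of that fallback logic (the registry dict here is illustrative; Spark's real registry is populated from `META-INF/services` entries in the hudi-spark bundle jar):

   ```python
   def lookup_data_source(name: str, registered: dict) -> str:
       """Mimic the shape of Spark's DataSource.lookupDataSource:
       resolve a short name via the service registry, otherwise fall
       back to treating '<name>.DefaultSource' as a class name."""
       if name in registered:
           return registered[name]
       fallback = name + ".DefaultSource"
       # Spark attempts a real classloader lookup here; when that also
       # fails, the fallback class name surfaces in the exception.
       raise LookupError(
           "Failed to find data source: %s (tried class %s)" % (name, fallback)
       )

   # With the Hudi bundle on the classpath the short name resolves:
   registry = {"hudi": "org.apache.hudi.DefaultSource"}
   print(lookup_data_source("hudi", registry))  # org.apache.hudi.DefaultSource

   # Without it, the lookup fails the same way the stacktrace shows:
   try:
       lookup_data_source("hudi", {})
   except LookupError as e:
       print(e)  # Failed to find data source: hudi (tried class hudi.DefaultSource)
   ```

   So failures of this shape are typically classpath-side — the bundle jar placed on `spark.driver.extraClassPath`/`spark.executor.extraClassPath` is missing or does not contain the datasource classes — rather than a problem in the workload yaml.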
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan closed issue #4621: [SUPPORT] Integ tests are failing for HUDI

xushiyan closed issue #4621:
URL: https://github.com/apache/hudi/issues/4621


   





[GitHub] [hudi] nsivabalan edited a comment on issue #4621: [SUPPORT] Integ tests are failing for HUDI

nsivabalan edited a comment on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1019330391


   I was able to reproduce with latest master. Will investigate further.





[GitHub] [hudi] nsivabalan commented on issue #4621: [SUPPORT] Integ tests are failing for HUDI

nsivabalan commented on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1017852564


   @data-storyteller : I tested the integ test bundle on latest master and it is all good. I have attached logs in the linked JIRA.
   Is your env Spark 2 or Spark 3?
   





[GitHub] [hudi] data-storyteller commented on issue #4621: [SUPPORT] Integ tests are failing for HUDI

data-storyteller commented on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1025794753


   Thanks @nsivabalan for the fix. This issue is resolved now.





[GitHub] [hudi] xushiyan commented on issue #4621: [SUPPORT] Integ tests are failing for HUDI

xushiyan commented on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1015074689


   Moving to https://issues.apache.org/jira/browse/HUDI-3262 for work tracking.





[GitHub] [hudi] xushiyan commented on issue #4621: [SUPPORT] Integ tests are failing for HUDI

xushiyan commented on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1017869502


   @data-storyteller @nsivabalan I see the param `--packages org.apache.spark:spark-avro_2.11:2.4.0`, while the Spark version mentioned is 2.4.7. @data-storyteller, can you align these versions? Both should be 2.4.7.
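
   One way to prevent that drift is to derive the spark-avro coordinate from a single Spark version variable instead of hardcoding it. This is a sketch only; `SPARK_VERSION=2.4.7` and the Scala 2.11 suffix are assumptions matching the environment reported above:

   ```shell
   # Keep spark-avro pinned to the same version as the Spark runtime
   # (2.4.7 per the environment section) so the two cannot drift apart.
   SPARK_VERSION="2.4.7"
   SPARK_AVRO_PKG="org.apache.spark:spark-avro_2.11:${SPARK_VERSION}"
   echo "$SPARK_AVRO_PKG"   # org.apache.spark:spark-avro_2.11:2.4.7
   ```

   The spark-submit command would then pass `--packages "$SPARK_AVRO_PKG"` in place of the hardcoded 2.4.0 coordinate.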





[GitHub] [hudi] nsivabalan commented on issue #4621: [SUPPORT] Integ tests are failing for HUDI

nsivabalan commented on issue #4621:
URL: https://github.com/apache/hudi/issues/4621#issuecomment-1019330391


   I tried an integ test suite job and the docker demo with latest master, and things are working fine.

