Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/15 16:48:41 UTC

[GitHub] [hudi] rajgowtham24 opened a new issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

rajgowtham24 opened a new issue #1835:
URL: https://github.com/apache/hudi/issues/1835


   Hi all,
   
   I'm new to Hudi and would like to use Delta Streamer with JSON source files that are available in my S3 bucket.
   
   Below is the command I'm using to run it.
   
   Source File(Json Format)
   
   {"empno":"8006","ename":"stuart","job":"salesman","hiredate":"2020-01-01 00:00:00"}
   
   Code
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer `ls /usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar` 
   --table-type COPY_ON_WRITE 
   --source-class org.apache.hudi.utilities.sources.JsonDFSSource 
   --target-base-path s3://gowtham_km/hudi/target> --target-table emp
   --hoodie-conf hoodie.datasource.write.recordkey.field=empno,hoodie.deltastreamer.source.dfs.root=s3://gowtham_km/hudi/source> 
   --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer 
   --payload-class org.apache.hudi.payload.AWSDmsAvroPayload 
   --props file:/usr/lib/hudi/hudi_utilities/delta-streamer-config/dfs-source.properties  
   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaProvider
   
   Error
   Exception in thread "main" java.io.IOException: Could not load schema provider class org.apache.hudi.utilities.schema.SchemaProvider
           at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:101)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:364)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:95)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:89)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
           at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:99)
           ... 16 more
   Caused by: java.lang.NoSuchMethodException: org.apache.hudi.utilities.schema.SchemaProvider.<init>(org.apache.hudi.common.util.TypedProperties, org.apache.spark.api.java.JavaSparkContext)
           at java.lang.Class.getConstructor0(Class.java:3110)
           at java.lang.Class.getConstructor(Class.java:1853)
           at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
           ... 18 more
   20/07/15 15:51:36 INFO ShutdownHookManager: Shutdown hook called
   20/07/15 15:51:36 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-51f5cf1d-db65-4c2b-853e-8e64c0666648
   20/07/15 15:51:36 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-c2584053-3620-48dc-9380-43318af38392
   
   
   Expectation
   To start learning, I would like to load the JSON file into the target table. Later I will add the continuous option so that new files are loaded into the target table automatically.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar closed issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1835:
URL: https://github.com/apache/hudi/issues/1835


   





[GitHub] [hudi] bvaradar commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1835:
URL: https://github.com/apache/hudi/issues/1835#issuecomment-659885576


   This looks like a setup issue. Are you by any chance loading multiple versions of Hudi in your Spark environment? You can try the local Docker-based demo to see what you are missing: https://hudi.apache.org/docs/docker_demo.html





[GitHub] [hudi] rajgowtham24 commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
rajgowtham24 commented on issue #1835:
URL: https://github.com/apache/hudi/issues/1835#issuecomment-666299251


   Hi Balaji, since the issue is marked as jar-mismatch, let me know which other version I can try.





[GitHub] [hudi] bvaradar commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1835:
URL: https://github.com/apache/hudi/issues/1835#issuecomment-667035480


   The marking is based on my suspicion about the root cause. I have not seen this error arise in any other situation.
   
   The integration tests that cover this code path work fine, and we are not seeing this issue elsewhere.
   
   I looked at your attached logs once again. I see two references to the jar (one with a version number and one without):
   
   /usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar
   
   /usr/lib/hudi/hudi-utilities-bundle.jar
   
   Maybe try not passing "ls /usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar" in your spark-submit and see whether the error goes away?
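
Before changing the command, it may help to check whether the two jar names refer to different files at all, since an unversioned name can be a symlink to the versioned jar. The following is a minimal sketch, assuming the jars live under /usr/lib/hudi as in the paths above. The function name `list_hudi_bundles` is made up for illustration and is not part of Hudi or EMR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hudi): print the distinct real files
# behind every hudi-utilities-bundle*.jar entry in a directory.
# The default directory is an assumption taken from the paths in this thread.
list_hudi_bundles() {
  local dir="${1:-/usr/lib/hudi}"
  local jar
  for jar in "$dir"/hudi-utilities-bundle*.jar; do
    # Skip the unexpanded glob pattern when nothing matches.
    [ -e "$jar" ] || continue
    # readlink -f resolves symlinks, so two names for one file collapse.
    readlink -f "$jar"
  done | sort -u
}
```

Calling `list_hudi_bundles /usr/lib/hudi` prints one line per distinct jar file. More than one line would suggest that two different bundle builds are installed side by side; a single line would suggest the two names point at the same file.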
   
   





[GitHub] [hudi] rajgowtham24 commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
rajgowtham24 commented on issue #1835:
URL: https://github.com/apache/hudi/issues/1835#issuecomment-660100564


   Thanks for the reply, Balaji. I have spun up AWS EMR v5.30.1 with Hudi and am using it to run the code. I'm loading only one version of Hudi into Spark, and I verified this in the application log file as well.
   
   
   20/07/17 13:06:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-10-80-69-179.eu-west-1.compute.internal:4041
   20/07/17 13:06:04 INFO SparkContext: Added JAR file:/usr/lib/hudi/hudi-utilities-bundle.jar at spark://ip-10-80-69-179.eu-west-1.compute.internal:37251/jars/hudi-utilities-bundle.jar with timestamp 1594991164091
   20/07/17 13:06:04 INFO Utils: Using initial executors = 100, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
   
   Let me know if I need to look into anything else for the setup issue.
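
For a longer log, the same check can be done by filtering for `Added JAR` lines like the one quoted above. This is a rough sketch, assuming the driver log has been saved to a local file; the function name `added_jars` is hypothetical. Jars that reach the classpath by other means (for example through cluster-level classpath settings) may not appear in these lines, so an empty or single-entry result does not rule out a second copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct jar paths that a Spark driver log
# reports in "Added JAR" lines, e.g.
#   INFO SparkContext: Added JAR file:/path/x.jar at spark://host:port/jars/x.jar ...
added_jars() {
  local log="$1"
  # Keep only the path between "Added JAR " and " at ", dropping an
  # optional "file:" prefix, then de-duplicate.
  sed -n 's/.*Added JAR \(file:\)\{0,1\}\([^ ]*\) at .*/\2/p' "$log" | sort -u
}
```

Running `added_jars driver.log` (where `driver.log` is a placeholder file name) prints each registered jar path once, which makes a second Hudi bundle easy to spot.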
   





[GitHub] [hudi] bvaradar commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1835:
URL: https://github.com/apache/hudi/issues/1835#issuecomment-691745661


   @rajgowtham24 : Closing this issue. Please reopen if you are still having issues.

