Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/24 09:14:47 UTC

[GitHub] [hudi] Ayan07 opened a new issue #3530: Unable to query Hudi table using Presto Cli

Ayan07 opened a new issue #3530:
URL: https://github.com/apache/hudi/issues/3530


   I am using the Presto CLI to query Hudi tables.
   The actual data is stored in S3.
   
   I am able to run queries such as: show schemas and show create table <table-name>
   
   However, I get the following error when trying to run 'select * from schema.table_name':
   
   _org.apache.hudi.hadoop.HoodieROTablePathFilter	Error checking path :s3a://<s3-bucket-name>/.hoodie, under folder: null
   java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found_
   
   I have my presto cluster and hive metastore service running on Kubernetes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-905117668


   Probably, an older version of Presto is being used and hudi-presto-bundle jar is required in the classpath. Please check https://hudi.apache.org/docs/querying_data#prestodb





[GitHub] [hudi] codope commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-905177310


   `Class org.apache.hadoop.fs.s3a.S3AFileSystem not found` usually happens when there is a version mismatch of aws java sdk or hadoop jar. Can you share Spark versions and describe more about your setup? Are you using EMR spark or Apache Spark?
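One way to check whether the class named in the error is on the classpath at all is to scan the jars in the Presto plugin directory. The following is only a minimal sketch: the directory passed in is an assumption about where the jars live, and `jars_containing` is a hypothetical helper name, not part of Hudi or Presto.

```python
import zipfile
from pathlib import Path

# Path of the class inside a jar, derived from the class name in the error message.
S3A_CLASS_ENTRY = "org/apache/hadoop/fs/s3a/S3AFileSystem.class"


def jars_containing(directory, class_entry=S3A_CLASS_ENTRY):
    """Return the sorted file names of jars under `directory` that ship `class_entry`."""
    matches = []
    for jar in Path(directory).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as archive:
                if class_entry in archive.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return sorted(matches)
```

Calling `jars_containing("<presto-home>/plugin/hive-hadoop2")` with the real installation path in place of the placeholder would list the jars that provide the class. An empty result would be consistent with the `ClassNotFoundException`, while several results might point to conflicting versions.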





[GitHub] [hudi] codope commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-907087837


   @Ayan07 I am not sure whether this is due to the Spark cluster running on k8s, but I have come across this error when using Apache Spark on EMR without the right version of aws-java-sdk or the hadoop jar. See [this](https://dev.to/bytearray/using-your-own-apache-spark-hudi-versions-with-aws-emr-40a0).
   By the way, are you able to write into Hudi table in S3?





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-909065470


   @codope Yes, I am able to write into Hudi table in S3.
   I just checked the version of aws-java-sdk and hadoop jar which I have used while writing to S3.
   aws-java-sdk - 1.7.4
   hadoop jars - 2.7.3
   
   I was able to resolve the issue by following the steps below:
   
   - Downgraded Presto to 0.226 (<0.240), so that the hudi-presto-bundle jar is not present in the classpath by default.
   
   - Manually added the hudi-presto-bundle jar to the plugin/hive-hadoop2/ folder.
   
   Finally, I was able to query the Hudi tables.
   
   However, I still want to understand why it isn't working for Presto versions greater than 0.240, where adding the hudi-presto-bundle jar to the classpath is not required.
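The two workaround steps above can be sketched as a small script. This is an illustration under stated assumptions, not an official installation procedure: the 0.240 threshold is the one quoted in this thread, and the function names, the Presto home directory, and the jar path passed in are hypothetical.

```python
import shutil
from pathlib import Path

# Per this thread, Presto releases from 0.240 onward already include Hudi.
BUNDLED_SINCE = (0, 240)


def parse_version(text):
    """Turn a version string such as '0.226' into a tuple of ints: (0, 226)."""
    return tuple(int(part) for part in text.split("."))


def install_bundle_if_needed(presto_home, bundle_jar, presto_version):
    """Copy `bundle_jar` into plugin/hive-hadoop2 for Presto older than 0.240.

    Returns the destination path, or None when no copy was made.
    """
    if parse_version(presto_version) >= BUNDLED_SINCE:
        return None
    plugin_dir = Path(presto_home) / "plugin" / "hive-hadoop2"
    if not plugin_dir.is_dir():
        # Refuse to guess: a missing directory usually means a wrong presto_home.
        raise FileNotFoundError(f"Presto plugin directory not found: {plugin_dir}")
    return Path(shutil.copy2(bundle_jar, plugin_dir))
```

With Presto 0.226 the jar is copied into the plugin folder. With 0.245 the function does nothing, which matches the expectation stated in the documentation but not the behaviour reported in this issue.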





[GitHub] [hudi] vinothchandar commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-926330082


   @codope did you get a chance to reproduce this? I'd be surprised if this were failing. IIUC, you have been testing on later versions of Presto using the compile-time dependency?
   
   





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-968088257


   @codope Sure, will give it a try with the setup configs that you have suggested.





[GitHub] [hudi] nsivabalan commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-997563274


   Closing the issue as it's a jar version issue. Feel free to re-open if you run into any other issues.





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-912563318


   @codope Please let me know if you need anything else from my end.





[GitHub] [hudi] codope commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-912465012


   @Ayan07 That's a bit odd. With Presto >0.240, Hudi is a compile-time dependency, so it should have worked. If you're still getting the same `S3AFileSystem not found` issue, I still think that it might be due to incompatible versions of hadoop-aws. Nevertheless, can you share your environment details as below? I'll try to reproduce.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here, like which type of table are you querying COW/MOR?
   
   **Stacktrace**
   
   ```Add the full stacktrace of the error.```





[GitHub] [hudi] nsivabalan closed issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3530:
URL: https://github.com/apache/hudi/issues/3530


   








[GitHub] [hudi] codope commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-926519415


   @Ayan07 I could not reproduce this issue. I am using Presto 0.245 along with Spark 2.4.7, Hadoop 2.10.1, Hive 2.3.7 and my target base path is on S3. Note that I had to ensure that I am using the right version of hadoop-aws and aws-java-sdk. For my setup, the following versions worked:
   ```
   hadoop-aws-2.10.1.jar
   aws-java-sdk-1.11.970.jar
   ```
   Can you match this setup and see if you still face the issue?
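To compare a classpath against the pair of versions listed above, the jar file names can be parsed and checked. This is a rough sketch only: the expected versions are the ones reported for the Hadoop 2.10.1 setup in this comment and are not a general recommendation, and the helper names are hypothetical.

```python
import re

# Versions reported to work in the comment above (Hadoop 2.10.1 setup only).
REPORTED_WORKING = {"hadoop-aws": "2.10.1", "aws-java-sdk": "1.11.970"}

# Matches names such as hadoop-aws-2.10.1.jar or aws-java-sdk-1.11.970.jar.
JAR_NAME = re.compile(r"^(hadoop-aws|aws-java-sdk)-(\d+(?:\.\d+)+)\.jar$")


def jar_versions(file_names):
    """Map artifact name to version for the jars of interest in `file_names`."""
    versions = {}
    for name in file_names:
        match = JAR_NAME.match(name)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def differences(found, expected=REPORTED_WORKING):
    """Return {artifact: (found_version_or_None, expected_version)} where they differ."""
    return {
        artifact: (found.get(artifact), wanted)
        for artifact, wanted in expected.items()
        if found.get(artifact) != wanted
    }
```

An empty result from `differences(jar_versions(names))` means the listed jars match the setup above. A non-empty result only shows where the two setups diverge. It does not by itself prove that the divergence causes the error.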





[GitHub] [hudi] nsivabalan commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-997074560


   @Ayan07: do you have any updates on this? Did Sagar's suggestion work out?





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-917841194


   @codope any updates on this?





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-905183186


   @codope I have used a prestodb version >=0.240 in my setup.
    I went through this doc https://hudi.apache.org/docs/querying_data#prestodb, and as far as I understand, prestodb versions >=0.240 don't require the hudi-presto-bundle jar in the classpath.
    Correct me if I am wrong here.
    
    I am using Apache Spark version 3.2 and the spark cluster is running on Kubernetes.





[GitHub] [hudi] vinothchandar commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-926330990


   @Ayan07 I can explain why it's working for the lower version, though. The issue seems to happen only when instantiating from the path filter. This code and the compile-time dependency were introduced together.
   
   CC @bhasudha in case she recalls anything.





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-912374923


   @codope Any inputs on the above query?
   I am unable to query the Hudi tables when I use any of the latest versions of Presto (>0.240).
   Adding the hudi-presto-bundle jar also didn't work for the recent versions (>0.240).
   





[GitHub] [hudi] Ayan07 commented on issue #3530: Unable to query Hudi table using Presto Cli

Posted by GitBox <gi...@apache.org>.
Ayan07 commented on issue #3530:
URL: https://github.com/apache/hudi/issues/3530#issuecomment-912562994


   @codope 
   
   Please find below the environment details:
   
   - Hudi version : 0.8
   - Spark version : 2.4.4
   - Hive version : 3.1.2
   - Hadoop version : 2.7.3
   - Storage : S3
   
   The entire thing is set up and running in Kubernetes.
   
   **Additional context** 
   
   I'm trying to query COW Hudi tables.
   
   I'm able to query these tables if I downgrade Presto to 0.226 (which is less than 0.240) and then add the hudi-presto-bundle jar to the classpath, as I mentioned above.
   
   **Stacktrace**:
   
   > com.facebook.presto.spi.PrestoException: Error checking path :s3a://dp-ingestion-stage/hudi/stagenew/kubernetes-mysql.integrationtesting.dummy/.hoodie, under folder: null
   	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:128)
   	at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
   	at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
   	at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
   	at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: Error checking path :s3a://dp-ingestion-stage/hudi/stagenew/kubernetes-mysql.integrationtesting.dummy/.hoodie, under folder: null
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:221)
   	at com.facebook.presto.hive.util.HiveFileIterator.lambda$getLocatedFileStatusRemoteIterator$0(HiveFileIterator.java:103)
   	at com.google.common.collect.Iterators$5.computeNext(Iterators.java:639)
   	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
   	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
   	at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:69)
   	at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:40)
   	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
   	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
   	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
   	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295)
   	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207)
   	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162)
   	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
   	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
   	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:195)
   	at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:40)
   	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:121)
   	... 7 more
   Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
   	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
   	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
   	at org.apache.hadoop.fs.PrestoFileSystemCache.createFileSystem(PrestoFileSystemCache.java:144)
   	at org.apache.hadoop.fs.PrestoFileSystemCache.getInternal(PrestoFileSystemCache.java:103)
   	at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:62)
   	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
   	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:128)
   	... 24 more
   Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
   	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
   	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
   	... 31 more
   




