Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/05 10:44:16 UTC

[GitHub] [incubator-hudi] malanb5 opened a new issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

malanb5 opened a new issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487
 
 
   Receiving the following exception when querying, via a SparkSession, a Hive table that was set up using the steps outlined in the demo:
   
   https://hudi.apache.org/docs/docker_demo.html  
   
    **Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs**
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
    1.  Insert data following the steps outlined in the demo.
    2.  Start a SparkSession using the following:
   ```
       private static final String HUDI_SPARK_BUNDLE_JAR_FP = "/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-spark-bundle.jar";
       private static final String HADOOP_CONF_DIR = "/etc/hadoop";
       private static final String STOCK_TICKS_COW="stock_ticks_cow";
   
           SparkSession spark = SparkSession
                   .builder()
                   .appName("FeatureExtractor")
                   .config("spark.master", "local")
                   .config("spark.jars", FeatureExtractor.HUDI_SPARK_BUNDLE_JAR_FP)
                   .config("spark.driver.extraClassPath", FeatureExtractor.HADOOP_CONF_DIR)
                   .config("spark.sql.hive.convertMetastoreParquet", false)
                   .config("spark.submit.deployMode", "client")
                   .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
                   .config("spark.sql.warehouse.dir", "/user/hive/warehouse")              // on HDFS
                   .config("hive.metastore.uris", "thrift://hivemetastore:9083")    // hive metastore uri
                   .enableHiveSupport()
                   .getOrCreate();
   ```
   3.  Make a query and try to display the results: 
   ```
   Dataset<Row> all_stock_cow = spark.sql(String.format("select * from %s", STOCK_TICKS_COW));
   all_stock_cow.show();
   ```
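    For context, this error typically means no `FileSystem` implementation is registered for the `hdfs` scheme on the driver's classpath. A hedged workaround (not part of the original report) is to map the scheme explicitly, either in `core-site.xml` or via the equivalent `spark.hadoop.fs.hdfs.impl` Spark setting:
    ```
    <!-- core-site.xml: explicitly map the hdfs scheme to its implementation,
         bypassing ServiceLoader-based discovery -->
    <property>
      <name>fs.hdfs.impl</name>
      <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    </property>
    ```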
   **Expected behavior**
    The contents of the table should be rendered; the filesystem should recognize the `hdfs` scheme.
   
   **Environment Description**
   
   * Hudi version : 0.5.2-incubating
   
   * Spark version : 2.4.5
   
   * Hive version : 2.4.1
   
   * Hadoop version : 2.7.3
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : yes
   
   **Additional context**
   Maven pom file:
   ```
    <build>
    <plugins>
           <plugin>
               <artifactId>maven-assembly-plugin</artifactId>
               <configuration>
                   <archive>
                       <manifest>
                           <addClasspath>true</addClasspath>
                           <mainClass>com.mlpipelines.FeatureExtractor</mainClass>
                       </manifest>
                   </archive>
                   <descriptorRefs>
                       <descriptorRef>jar-with-dependencies</descriptorRef>
                   </descriptorRefs>
               </configuration>
   
           </plugin>
       <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-jar-plugin</artifactId>
       </plugin>
       </plugins>
       </build>
   
       <dependencies>
           <dependency>
               <groupId>org.apache.hadoop</groupId>
               <artifactId>hadoop-hdfs</artifactId>
               <version>2.7.3</version>
               <scope>provided</scope>
           </dependency>
   
           <dependency>
               <groupId>org.apache.spark</groupId>
               <artifactId>spark-sql_2.11</artifactId>
               <version>2.4.5</version>
           </dependency>
           <dependency>
               <groupId>org.apache.spark</groupId>
               <artifactId>spark-mllib_2.11</artifactId>
               <version>2.4.5</version>
           </dependency>
   
           <dependency>
               <groupId>org.apache.spark</groupId>
               <artifactId>spark-hive_2.11</artifactId>
               <version>2.4.1</version>
           </dependency>
   
           <dependency>
               <groupId>org.apache.hudi</groupId>
               <artifactId>hudi-hive</artifactId>
               <version>0.5.2-incubating</version>
           </dependency>
   
           <dependency>
               <groupId>org.apache.hudi</groupId>
               <artifactId>hudi-spark_2.12</artifactId>
               <version>0.5.2-incubating</version>
           </dependency>
   
           <dependency>
               <groupId>org.apache.hudi</groupId>
               <artifactId>hudi</artifactId>
               <version>0.5.2-incubating</version>
               <type>pom</type>
           </dependency>
       </dependencies>
   ```
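    Worth noting about the pom above: `jar-with-dependencies` flattens every jar into one, and when multiple jars ship the same `META-INF/services` file (which Hadoop uses to discover `FileSystem` implementations) only one copy survives, which can silently drop the `hdfs` registration. A minimal JDK-only sketch (hypothetical class name, not from the thread) that lists how many registration files are visible on the classpath:
    ```java
    import java.io.IOException;
    import java.net.URL;
    import java.util.Enumeration;

    public class ServiceScan {
        public static void main(String[] args) throws IOException {
            // Hadoop discovers FileSystem implementations via this service file.
            // A fat jar that keeps only one copy of it can silently lose the
            // entry for org.apache.hadoop.hdfs.DistributedFileSystem.
            String resource = "META-INF/services/org.apache.hadoop.fs.FileSystem";
            Enumeration<URL> urls =
                    ServiceScan.class.getClassLoader().getResources(resource);
            int count = 0;
            while (urls.hasMoreElements()) {
                System.out.println("found: " + urls.nextElement());
                count++;
            }
            if (count == 0) {
                System.out.println("no FileSystem service registrations on the classpath");
            }
        }
    }
    ```
    If the fat jar yields fewer registrations than the unbundled classpath, the assembly step merged the service files away; the maven-shade-plugin's `ServicesResourceTransformer` concatenates them instead.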
   
   **Stacktrace**
   
   ```
   Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
           at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
           at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
           at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
           at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
           at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
           at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
           at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
           at org.apache.hudi.hadoop.InputPathHandler.parseInputPaths(InputPathHandler.java:98)
           at org.apache.hudi.hadoop.InputPathHandler.<init>(InputPathHandler.java:58)
           at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:73)
           at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
           at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
           at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.immutable.List.foreach(List.scala:381)
           at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
           at scala.collection.immutable.List.map(List.scala:285)
           at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:84)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.sql.execution.SQLExecutionRDD.getPartitions(SQLExecutionRDD.scala:44)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
           at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
           at scala.Option.getOrElse(Option.scala:121)
           at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
           at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1184)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
           at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1182)
           at org.apache.spark.mllib.feature.IDF.fit(IDF.scala:54)
           at org.apache.spark.ml.feature.IDF.fit(IDF.scala:92)
           at com.github.malanb5.mlpipelines.FeatureExtractor.main(Fe
   ```
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] lamber-ken commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
lamber-ken commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609472838
 
 
   Hi, it works fine in my local env. Steps:
   
    1. Add the `spark-hive` dependency
   ```
   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-hive_${scala.binary.version}</artifactId>
     <version>2.4.4</version>
     <scope>compile</scope>
    </dependency>
    ```

    ```
    SparkSession spark = SparkSession
       .builder()
       .appName("FeatureExtractor")
       .config("spark.master", "local")
       .config("spark.sql.hive.convertMetastoreParquet", false)
       .config("spark.submit.deployMode", "client")
       .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
       .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
       .config("hive.metastore.uris", "thrift://hivemetastore:9083")
       .enableHiveSupport()
       .getOrCreate();
   
   spark.sql("select * from stock_ticks_cow").show(100);
   ```
   
    2. Build a fat jar
   ```
   mvn package
   ```
   
    3. Copy to the `adhoc-1` container
   ```
   docker cp test.jar  adhoc-1:/opt/spark
   ```
   
    4. Run test.jar
   ```
   bin/spark-submit \
   --class com.TestExample \
   --executor-memory 1G \
   --total-executor-cores 2 \
   test.jar
   ```
   
   
   
   


[GitHub] [incubator-hudi] malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609480782
 
 
   @lamber-ken Thank you for the help.  I updated the version of Hive.  Now I'm getting the following:
   
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
           at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
           at com.github.malanb5.mlpipelines.FeatureExtractor.main(FeatureExtractor.java:59
   ```
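   For reference, `enableHiveSupport()` fails this way when the spark-hive classes are absent from the runtime classpath, e.g. when the jar is launched with plain `java` instead of `spark-submit`. A small diagnostic sketch (hypothetical class name; the probed class is one of those Spark 2.4 itself checks for before enabling Hive support):
   ```java
   public class HiveCheck {
       public static void main(String[] args) {
           // Spark 2.4's enableHiveSupport() verifies this class (among others)
           // is loadable before building a Hive-enabled session.
           try {
               Class.forName("org.apache.spark.sql.hive.HiveSessionStateBuilder");
               System.out.println("spark-hive is on the classpath");
           } catch (ClassNotFoundException e) {
               System.out.println("spark-hive is missing from the classpath");
           }
       }
   }
   ```
   Running this with the same classpath as the failing application shows whether the `spark-hive` dependency actually made it into the runtime environment.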


[GitHub] [incubator-hudi] malanb5 edited a comment on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
malanb5 edited a comment on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609496962
 
 
   Posted this on Stack Overflow, hopefully this will help others:
   
   https://stackoverflow.com/a/61050495/8366477


[GitHub] [incubator-hudi] malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609496962
 
 
   https://stackoverflow.com/a/59823742/8366477


[GitHub] [incubator-hudi] malanb5 closed issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
malanb5 closed issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487
 
 
   


[GitHub] [incubator-hudi] lamber-ken commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
lamber-ken commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609550970
 
 
   You're always welcome : )


[GitHub] [incubator-hudi] malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

Posted by GitBox <gi...@apache.org>.
malanb5 commented on issue #1487: [SUPPORT] Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
URL: https://github.com/apache/incubator-hudi/issues/1487#issuecomment-609493874
 
 
   I was running this directly through the JVM rather than through the spark-submit script, which loads the Spark classes. Thanks for the help again @lamber-ken
