Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2015/03/16 13:09:44 UTC

problems with Parquet in Spark 1.3.0

Hi,

I am storing Parquet files in OpenStack Swift and accessing those files
from Spark.

This worked perfectly in Spark versions prior to 1.3.0, but in 1.3.0 I am
getting the error shown below.
Is there some configuration I missed? I am not sure where this error comes
from. Does Spark 1.3.0 require Parquet files to be accessed via "file://"?
I will be glad to dig into this in case it's a bug, but I would like to know
whether this is intentional in Spark 1.3.0.
(I can access the swift:// namespace from SparkContext; only sqlContext
has this issue.)

Thanks,
Gil Vernik.

scala> val parquetFile = sqlContext.parquetFile("swift://ptest.localSwift12/SF311new3.parquet")

java.lang.IllegalArgumentException: Wrong FS: swift://ptest.localSwift12/SF311new3.parquet, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
        at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
        at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
        at $iwC$$iwC$$iwC.<init>(<console>:32)
        at $iwC$$iwC.<init>(<console>:34)
        at $iwC.<init>(<console>:36)
        at <init>(<console>:38)
        at .<init>(<console>:42)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)

Re: problems with Parquet in Spark 1.3.0

Posted by Gil Vernik <GI...@il.ibm.com>.
I just noticed this one:

https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039


I verified it, and it resolves my issues with Parquet and the swift://
namespace.
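
For readers wondering why a swift:// path can fail with "expected: file:///", here is a minimal, self-contained sketch of the kind of scheme check that the stack trace points to. It is a simplified model only, not Hadoop's or Spark's actual code, and it does not reproduce the linked patch. ModelFs, fsFor and WrongFsDemo are hypothetical names made up for this illustration. In the real Hadoop API the two ways of obtaining a file system are roughly FileSystem.get(conf), which returns the default file system, and path.getFileSystem(conf), which picks one based on the path's own scheme.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical stand-in for a Hadoop FileSystem: bound to one URI scheme.
final case class ModelFs(uri: URI) {
  // Simplified model of the check behind "Wrong FS": a path that carries
  // a scheme must match the scheme of the file system qualifying it.
  def makeQualified(path: String): URI = {
    val p = URI.create(path)
    val scheme = p.getScheme
    if (scheme != null && !scheme.equalsIgnoreCase(uri.getScheme))
      throw new IllegalArgumentException(s"Wrong FS: $path, expected: $uri")
    if (scheme == null) uri.resolve(p) else p
  }
}

object WrongFsDemo {
  // Assumption: the default file system is the local one, as in the trace.
  val defaultFs: ModelFs = ModelFs(URI.create("file:///"))

  // Choose the file system from the path itself; fall back to the default
  // only when the path has no scheme.
  def fsFor(path: String): ModelFs = {
    val p = URI.create(path)
    if (p.getScheme == null) defaultFs
    else {
      val authority = Option(p.getAuthority).getOrElse("")
      ModelFs(URI.create(s"${p.getScheme}://$authority/"))
    }
  }

  def main(args: Array[String]): Unit = {
    val path = "swift://ptest.localSwift12/SF311new3.parquet"
    // Qualifying a swift:// path against the default FS fails in this model.
    println(Try(defaultFs.makeQualified(path)).isFailure)
    // Qualifying it against a file system derived from the path succeeds.
    println(fsFor(path).makeQualified(path))
  }
}
```

The design point is that the file system is derived from each path rather than assumed to be the default one, so paths on swift://, hdfs:// or file:// can each be qualified against a matching file system.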




