Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2015/03/16 13:09:44 UTC
problems with Parquet in Spark 1.3.0
Hi,
I am storing Parquet files in OpenStack Swift and accessing those files from
Spark. This works perfectly in Spark versions prior to 1.3.0, but in 1.3.0 I
get the error shown below.
Is there some configuration I missed? I am not sure where this error comes
from. Does Spark 1.3.0 require Parquet files to be accessed via "file://"?
I will be glad to dig into this in case it's a bug, but I would like to know
whether this is intentional in Spark 1.3.0.
(I can access the swift:// namespace from SparkContext; only sqlContext has
this issue.)
Thanks,
Gil Vernik.
scala> val parquetFile = sqlContext.parquetFile("swift://ptest.localSwift12/SF311new3.parquet")
java.lang.IllegalArgumentException: Wrong FS: swift://ptest.localSwift12/SF311new3.parquet, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
        at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:251)
        at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:370)
        at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:522)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC.<init>(<console>:34)
at $iwC.<init>(<console>:36)
at <init>(<console>:38)
at .<init>(<console>:42)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
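For context on what "Wrong FS" means here: Hadoop's FileSystem.checkPath rejects any path whose scheme does not match the FileSystem instance it is checked against, so qualifying a swift:// path against the default local ("file") FileSystem throws exactly this exception. The snippet below is a simplified, hypothetical sketch of that scheme check using only java.net.URI (it is not Hadoop's actual implementation; checkPath is the assumed name carried over from the trace):

```scala
import java.net.URI

// Simplified sketch of the scheme check behind "Wrong FS": a FileSystem
// only accepts paths whose scheme matches its own.
def checkPath(fsScheme: String, path: URI): Unit = {
  val pathScheme = path.getScheme
  if (pathScheme != null && pathScheme != fsScheme)
    throw new IllegalArgumentException(
      s"Wrong FS: $path, expected: $fsScheme:///")
}

val swiftPath = new URI("swift://ptest.localSwift12/SF311new3.parquet")

// Checking against the default local FileSystem ("file") fails,
// reproducing the error in the stack trace above:
// checkPath("file", swiftPath)  // IllegalArgumentException: Wrong FS

// Checking against a FileSystem resolved from the path's own scheme
// succeeds:
checkPath("swift", swiftPath)
```

This is why the failure is specific to sqlContext: code that resolves the FileSystem from the path itself (as SparkContext does) never hits the mismatch, while code that qualifies paths against the default FileSystem does.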
Re: problems with Parquet in Spark 1.3.0
Posted by Gil Vernik <GI...@il.ibm.com>.
I just noticed this one:
https://issues.apache.org/jira/browse/SPARK-6351
https://github.com/apache/spark/pull/5039
I verified it, and it resolves my issues with Parquet and the swift://
namespace.
From: Gil Vernik/Haifa/IBM@IBMIL
To: dev <de...@spark.apache.org>
Date: 16/03/2015 02:11 PM
Subject: problems with Parquet in Spark 1.3.0