Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2014/12/03 18:25:13 UTC

[SQL] Wildcards in SQLContext.parquetFile?

Hi folks,

I'm wondering if someone has successfully used wildcards with a parquetFile
call?

I saw this thread and it makes me think no?
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCACA1tWLjcF-NtXj=PqPQM3xK4aJ0JiTXJHMdQbOjj_OJyBohpQ@mail.gmail.com%3E

I have a set of parquet files that are partitioned by key. I'd like to
issue a query to read in a subset of the files, based on a directory
wildcard (the wildcard will be a little more specific than * but this is to
show the issue):

This call works fine:

sc.textFile("hdfs:///warehouse/hive/*/*/*.parquet").first
res4: String = PAR1????? L??????? ?\??????? ,????????????
,????????????????a??aL????????0?x????????U???e??

but this doesn't

scala> val parquetFile =
sqlContext.parquetFile("hdfs:///warehouse/hive/*/*/*.parquet").first
java.io.FileNotFoundException: File
hdfs://cdh4-14822-nn/warehouse/hive/*/*/*.parquet does not exist

Re: [SQL] Wildcards in SQLContext.parquetFile?

Posted by Michael Armbrust <mi...@databricks.com>.
It won't work until this is merged:
https://github.com/apache/spark/pull/3407
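
Until then, one possible workaround is to expand the wildcard yourself with the Hadoop FileSystem API and union the per-file results. The following is only a sketch, not something taken from the pull request: it assumes a spark-shell session where sc and sqlContext already exist, a Spark 1.x release where parquetFile returns a SchemaRDD, and that every matched file has the same schema. The names pattern, files and parquetData are made up for illustration.

```scala
import org.apache.hadoop.fs.{FileStatus, Path}

// Assumption: running inside spark-shell, so `sc` and `sqlContext` are defined.
val pattern = new Path("hdfs:///warehouse/hive/*/*/*.parquet")
val fs = pattern.getFileSystem(sc.hadoopConfiguration)

// Let Hadoop expand the glob; globStatus can return null, so guard with Option.
val files: Array[String] =
  Option(fs.globStatus(pattern))
    .getOrElse(Array.empty[FileStatus])
    .map(_.getPath.toString)

require(files.nonEmpty, s"no files match $pattern")

// Load each concrete path separately and union the results.
// Assumption: all files share one schema, otherwise unionAll will not line up.
val parquetData = files.map(p => sqlContext.parquetFile(p)).reduce(_ unionAll _)

parquetData.first
```

Each file is planned as its own relation here, so this may get slow when the wildcard matches a very large number of files.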
