Posted to user@spark.apache.org by Marius Soutier <mp...@gmail.com> on 2014/09/24 16:46:38 UTC

parquetFile and wildcards

Hello,

sc.textFile and similar methods support wildcards in their paths, but apparently sqlc.parquetFile() does not. I always receive "File /file/to/path/*/input.parquet does not exist". Is this normal or a bug? Is there a workaround?

Thanks
- Marius



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: parquetFile and wildcards

Posted by Marius Soutier <mp...@gmail.com>.
Thank you, that works!

On 24.09.2014, at 19:01, Michael Armbrust <mi...@databricks.com> wrote:

> This behavior is inherited from the Parquet input format that we use. You could list the files manually and pass them as a comma-separated list.


Re: parquetFile and wildcards

Posted by Nicholas Chammas <ni...@gmail.com>.
SPARK-3928: Support wildcard matches on Parquet files
<https://issues.apache.org/jira/browse/SPARK-3928>

On Wed, Sep 24, 2014 at 2:14 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> We could certainly do this. The comma-separated support is something I
> added.

Re: parquetFile and wildcards

Posted by Michael Armbrust <mi...@databricks.com>.
We could certainly do this. The comma-separated support is something I
added.

On Wed, Sep 24, 2014 at 10:20 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Does it make sense for us to open a JIRA to track enhancing the Parquet
> input format to support wildcards? Or is this something outside of Spark's
> control?
>
> Nick

Re: parquetFile and wildcards

Posted by Nicholas Chammas <ni...@gmail.com>.
Does it make sense for us to open a JIRA to track enhancing the Parquet
input format to support wildcards? Or is this something outside of Spark's
control?

Nick

On Wed, Sep 24, 2014 at 1:01 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> This behavior is inherited from the Parquet input format that we use. You
> could list the files manually and pass them as a comma-separated list.

Re: parquetFile and wildcards

Posted by Michael Armbrust <mi...@databricks.com>.
This behavior is inherited from the Parquet input format that we use. You
could list the files manually and pass them as a comma-separated list.
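
A minimal sketch of that workaround, assuming the files live on a local
filesystem that Python's standard glob module can see (for HDFS or S3 the
listing step would need the corresponding filesystem API instead). The helper
name expand_parquet_paths and the demo directory layout are hypothetical and
only serve to illustrate the idea:

```python
import glob
import os
import tempfile


def expand_parquet_paths(pattern):
    """Expand a glob pattern and join the matches into one comma-separated string."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no files match " + pattern)
    return ",".join(matches)


# Demo on a throwaway local directory tree (hypothetical layout).
root = tempfile.mkdtemp()
for part in ("a", "b"):
    os.makedirs(os.path.join(root, part))
    # Empty placeholder files; real Parquet data would go here.
    open(os.path.join(root, part, "input.parquet"), "w").close()

paths = expand_parquet_paths(os.path.join(root, "*", "input.parquet"))
print(paths)

# With a Spark 1.x SQLContext named sqlc (as in the question), the expanded
# string would then be passed in place of the wildcard path:
# data = sqlc.parquetFile(paths)
```

Expanding the pattern up front also gives an early, readable error when
nothing matches, instead of the "does not exist" message from the input format.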

On Wed, Sep 24, 2014 at 7:46 AM, Marius Soutier <mp...@gmail.com> wrote:

> Hello,
>
> sc.textFile and similar methods support wildcards in their paths, but apparently
> sqlc.parquetFile() does not. I always receive "File
> /file/to/path/*/input.parquet does not exist". Is this normal or a bug? Is
> there a workaround?
>
> Thanks
> - Marius