Posted to user@spark.apache.org by Marius Soutier <mp...@gmail.com> on 2014/09/24 16:46:38 UTC
parquetFile and wildcards
Hello,
sc.textFile and so on support wildcards in their path, but apparently sqlc.parquetFile() does not. I always receive "File /file/to/path/*/input.parquet does not exist". Is this normal or a bug? Is there a workaround?
Thanks
- Marius
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: parquetFile and wildcards
Posted by Marius Soutier <mp...@gmail.com>.
Thank you, that works!
On 24.09.2014, at 19:01, Michael Armbrust <mi...@databricks.com> wrote:
> This behavior is inherited from the Parquet input format that we use. You could list the files manually and pass them as a comma-separated list.
Re: parquetFile and wildcards
Posted by Nicholas Chammas <ni...@gmail.com>.
SPARK-3928: Support wildcard matches on Parquet files
<https://issues.apache.org/jira/browse/SPARK-3928>
On Wed, Sep 24, 2014 at 2:14 PM, Michael Armbrust <mi...@databricks.com>
wrote:
> We could certainly do this. The comma-separated support is something I
> added.
Re: parquetFile and wildcards
Posted by Michael Armbrust <mi...@databricks.com>.
We could certainly do this. The comma-separated support is something I
added.
On Wed, Sep 24, 2014 at 10:20 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:
> Does it make sense for us to open a JIRA to track enhancing the Parquet
> input format to support wildcards? Or is this something outside of Spark's
> control?
>
> Nick
Re: parquetFile and wildcards
Posted by Nicholas Chammas <ni...@gmail.com>.
Does it make sense for us to open a JIRA to track enhancing the Parquet
input format to support wildcards? Or is this something outside of Spark's
control?
Nick
On Wed, Sep 24, 2014 at 1:01 PM, Michael Armbrust <mi...@databricks.com>
wrote:
> This behavior is inherited from the Parquet input format that we use. You
> could list the files manually and pass them as a comma-separated list.
Re: parquetFile and wildcards
Posted by Michael Armbrust <mi...@databricks.com>.
This behavior is inherited from the Parquet input format that we use. You
could list the files manually and pass them as a comma-separated list.
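
A minimal sketch of that workaround in Python, assuming the Parquet files sit on a local or locally mounted filesystem so that the standard glob module can see them. The helper name expand_to_comma_separated is hypothetical and not part of Spark. For HDFS or other remote stores the listing would have to go through the Hadoop FileSystem API instead, which this sketch does not attempt.

```python
import glob


def expand_to_comma_separated(pattern):
    """Expand a wildcard pattern into a comma-separated string of paths.

    Hypothetical helper: it only sees files on the local filesystem.
    """
    # Sort so the resulting string is deterministic across runs.
    paths = sorted(glob.glob(pattern))
    if not paths:
        # Fail early with the pattern, rather than passing an empty string on.
        raise FileNotFoundError("no files match pattern: " + pattern)
    return ",".join(paths)


# Usage with the SQLContext from the original question (not executed here;
# sqlc is assumed to be an existing SQLContext):
#   paths = expand_to_comma_separated("/file/to/path/*/input.parquet")
#   data = sqlc.parquetFile(paths)
```

Raising an error on an empty match is a design choice made here so that a mistyped pattern is reported before anything is handed to Spark. Note that a path which itself contains a comma would not survive this encoding.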
On Wed, Sep 24, 2014 at 7:46 AM, Marius Soutier <mp...@gmail.com> wrote:
> Hello,
>
> sc.textFile and so on support wildcards in their path, but apparently
> sqlc.parquetFile() does not. I always receive "File
> /file/to/path/*/input.parquet does not exist". Is this normal or a bug? Is
> there a workaround?
>
> Thanks
> - Marius