Posted to user@spark.apache.org by Daniel Siegmann <da...@velos.io> on 2014/12/24 00:33:05 UTC

Escape commas in file names

I am trying to load a Parquet file which has a comma in its name. Yes, this
is a valid file name in HDFS. However, sqlContext.parquetFile interprets
this as a comma-separated list of parquet files.

Is there any way to escape the comma so it is treated as part of a single
file name?
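
The behaviour described above can be reproduced without Spark. The sketch below is an illustration only: `split_paths` is a hypothetical stand-in for any API that treats one string as a comma-separated list of paths, not Spark's actual implementation, and the HDFS path is made up.

```python
# Illustration only: `split_paths` is a hypothetical stand-in for an API
# that treats a single string as a comma-separated list of paths.
# It is NOT Spark's actual implementation.
def split_paths(path_string):
    """Split a comma-delimited path string into individual paths."""
    return [p for p in path_string.split(",") if p]


# One file whose name legitimately contains a comma (made-up path).
single_file = "hdfs:///data/events_2014,12.parquet"

paths = split_paths(single_file)
print(paths)  # ['hdfs:///data/events_2014', '12.parquet']
```

The single file is reported as two paths, neither of which exists, which matches the symptom in the question.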

-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

54 W 40th St, New York, NY 10018
E: daniel.siegmann@velos.io W: www.velos.io

Re: Escape commas in file names

Posted by Daniel Siegmann <da...@velos.io>.
Thanks for the replies. Hopefully this will not be too difficult to fix.

Why not support multiple paths by overloading the parquetFile method to
take a collection of strings? That way we would not need to choose a
delimiter at all.
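
A minimal sketch of that idea, in plain Python rather than against Spark itself: `normalize_paths` is a hypothetical helper, not an existing Spark method, and the paths are made up. It assumes a lone string should always mean exactly one path.

```python
# Hypothetical sketch: `normalize_paths` is not a Spark API. It shows how an
# entry point that accepts a collection of paths needs no delimiter.
def normalize_paths(paths):
    """Return a list of paths; a lone string is one path and is never split."""
    if isinstance(paths, str):
        return [paths]
    return list(paths)


# Made-up example paths; the first one contains a comma.
one = normalize_paths("hdfs:///data/events_2014,12.parquet")
many = normalize_paths([
    "hdfs:///data/events_2014,12.parquet",
    "hdfs:///data/events_2015.parquet",
])
print(one)        # ['hdfs:///data/events_2014,12.parquet']
print(len(many))  # 2
```

Because each path is its own element, a comma inside a file name carries no special meaning and no escaping rule is required.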

On Thu, Dec 25, 2014 at 3:46 AM, Cheng, Hao <ha...@intel.com> wrote:

> I’ve created a JIRA issue for this:
> https://issues.apache.org/jira/browse/SPARK-4967
>
> I guess the original intent was to support scanning multiple Parquet file
> paths, which are passed internally as a single comma-separated string.
> However, I could not find any public example saying that
> sqlContext.parquetFile supports multiple Parquet files, so we need to think
> about how to support multiple paths in some other way.
>
> Cheng Hao
>

RE: Escape commas in file names

Posted by "Cheng, Hao" <ha...@intel.com>.
I’ve created a JIRA issue for this: https://issues.apache.org/jira/browse/SPARK-4967

I guess the original intent was to support scanning multiple Parquet file paths, which are passed internally as a single comma-separated string. However, I could not find any public example saying that sqlContext.parquetFile supports multiple Parquet files, so we need to think about how to support multiple paths in some other way.

Cheng Hao


From: Michael Armbrust [mailto:michael@databricks.com]
Sent: Thursday, December 25, 2014 1:01 PM
To: Daniel Siegmann
Cc: user@spark.apache.org
Subject: Re: Escape commas in file names

No, there is not.  Can you open a JIRA?



Re: Escape commas in file names

Posted by Michael Armbrust <mi...@databricks.com>.
No, there is not.  Can you open a JIRA?
