Posted to dev@spark.apache.org by Blaž Šnuderl <sn...@gmail.com> on 2015/10/06 09:02:41 UTC

Pyspark dataframe read

Hello everyone.

It seems the pyspark dataframe read is broken for reading multiple files.

sql.read.json("file1,file2") fails with java.io.IOException: No input
paths specified in job.

This used to work in Spark 1.4 and still works with sc.textFile.

Blaž

Re: Pyspark dataframe read

Posted by Koert Kuipers <ko...@tresata.com>.
i personally find the comma-separated paths feature much more important
than commas in paths (which one could argue you should avoid).

but assuming people want to keep commas as legitimate characters in paths:
https://issues.apache.org/jira/browse/SPARK-10185
https://github.com/apache/spark/pull/8416



On Tue, Oct 6, 2015 at 4:31 AM, Reynold Xin <rx...@databricks.com> wrote:

> I think the problem is that comma is actually a legitimate character for
> file name, and as a result ...
>
>
> On Tuesday, October 6, 2015, Josh Rosen <ro...@gmail.com> wrote:
>
>> Could someone please file a JIRA to track this?
>> https://issues.apache.org/jira/browse/SPARK
>>
>> On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i ran into the same thing in scala api. we depend heavily on comma
>>> separated paths, and it no longer works.
>>>
>>>
>>> On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl <sn...@gmail.com> wrote:
>>>
>>>> Hello everyone.
>>>>
>>>> It seems pyspark dataframe read is broken for reading multiple files.
>>>>
>>>> sql.read.json( "file1,file2") fails with java.io.IOException: No input
>>>> paths specified in job.
>>>>
>>>> This used to work in spark 1.4 and also still work with sc.textFile
>>>>
>>>> Blaž
>>>>
>>>
>>>
>>

Re: Pyspark dataframe read

Posted by Reynold Xin <rx...@databricks.com>.
I think the problem is that comma is actually a legitimate character for
file name, and as a result ...
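The ambiguity is easy to see with plain string splitting: a single file whose name legitimately contains commas gets shredded into bogus paths. A tiny illustration (the file name below is made up):

```python
# One legitimate file name that happens to contain commas...
path = "logs/2015,10,06.json"

# ...is shredded by naive comma-splitting into three bogus paths.
pieces = path.split(",")
print(pieces)  # ['logs/2015', '10', '06.json']
```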

On Tuesday, October 6, 2015, Josh Rosen <ro...@gmail.com> wrote:

> Could someone please file a JIRA to track this?
> https://issues.apache.org/jira/browse/SPARK
>
> On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers <koert@tresata.com> wrote:
>
>> i ran into the same thing in scala api. we depend heavily on comma
>> separated paths, and it no longer works.
>>
>>
>> On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl <snuderl@gmail.com> wrote:
>>
>>> Hello everyone.
>>>
>>> It seems pyspark dataframe read is broken for reading multiple files.
>>>
>>> sql.read.json( "file1,file2") fails with java.io.IOException: No input
>>> paths specified in job.
>>>
>>> This used to work in spark 1.4 and also still work with sc.textFile
>>>
>>> Blaž
>>>
>>
>>
>

Re: Pyspark dataframe read

Posted by Josh Rosen <ro...@gmail.com>.
Could someone please file a JIRA to track this?
https://issues.apache.org/jira/browse/SPARK

On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i ran into the same thing in scala api. we depend heavily on comma
> separated paths, and it no longer works.
>
>
> On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl <sn...@gmail.com> wrote:
>
>> Hello everyone.
>>
>> It seems pyspark dataframe read is broken for reading multiple files.
>>
>> sql.read.json( "file1,file2") fails with java.io.IOException: No input
>> paths specified in job.
>>
>> This used to work in spark 1.4 and also still work with sc.textFile
>>
>> Blaž
>>
>
>

Re: Pyspark dataframe read

Posted by Koert Kuipers <ko...@tresata.com>.
i ran into the same thing in scala api. we depend heavily on comma
separated paths, and it no longer works.


On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl <sn...@gmail.com> wrote:

> Hello everyone.
>
> It seems pyspark dataframe read is broken for reading multiple files.
>
> sql.read.json( "file1,file2") fails with java.io.IOException: No input
> paths specified in job.
>
> This used to work in spark 1.4 and also still work with sc.textFile
>
> Blaž
>