Posted to user@spark.apache.org by Diana Carroll <dc...@cloudera.com> on 2014/01/09 17:15:04 UTC

hadoop files in Python

Hello!  I'm exploring using custom input formats, which it seems I can do
in Scala using sc.newAPIHadoopFile or sc.newAPIHadoopRDD.

My question is: is it possible to do this in Python?  The Python API
doesn't have (AFAICT) the sc.hadoop* functions.

Thanks,
Diana

Re: hadoop files in Python

Posted by Josh Rosen <ro...@gmail.com>.
There's an open pull request to add support for additional Hadoop file
formats to PySpark: https://github.com/apache/incubator-spark/pull/263
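
For reference, here is a minimal sketch of what reading through a Hadoop
InputFormat can look like from Python. It assumes a later PySpark release
that exposes sc.newAPIHadoopFile; that method is not available in the
version discussed in this thread. The helper name read_with_new_api and the
choice of TextInputFormat are illustrative assumptions, not part of the
pull request.

```python
def read_with_new_api(sc, path):
    """Read `path` through a new-API Hadoop InputFormat.

    Assumption: `sc` is a pyspark.SparkContext from a release that
    provides newAPIHadoopFile. The helper name is hypothetical.
    """
    # Class names are passed as fully qualified Java class name strings.
    # TextInputFormat is used here only as a stand-in; a custom format
    # would need its class on the JVM classpath.
    return sc.newAPIHadoopFile(
        path,
        inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        keyClass="org.apache.hadoop.io.LongWritable",
        valueClass="org.apache.hadoop.io.Text",
    )
```

With TextInputFormat the result would be an RDD of (byte offset, line)
pairs. Key and value types that PySpark cannot convert on its own would
likely also need the keyConverter and valueConverter arguments.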


On Thu, Jan 9, 2014 at 8:15 AM, Diana Carroll <dc...@cloudera.com> wrote:

> Hello!  I'm exploring using custom input formats, which it seems I can do
> in Scala using sc.newAPIHadoopFile or sc.newAPIHadoopRDD.
>
> My question is: is it possible to do this in Python?  The Python API
> doesn't have (AFAICT) the sc.hadoop* functions.
>
> Thanks,
> Diana
>