Posted to user@spark.apache.org by Gary Malouf <ma...@gmail.com> on 2014/07/23 16:42:12 UTC

Workarounds for accessing sequence file data via PySpark?

I am aware that PySpark cannot currently load sequence files directly.  Are
there workarounds people are using (short of duplicating all the data to
text files) for accessing this data?

Re: Workarounds for accessing sequence file data via PySpark?

Posted by Nick Pentreath <ni...@gmail.com>.

Loading sequence files in PySpark (sequenceFile) is already in master, and
saving them is covered by a PR that is still under way
(https://github.com/apache/spark/pull/1338).

I hope that Kan will have it ready to merge in time for the 1.1 release
window (it should be; the PR just needs a final review or two).

In the meantime, you can check out master and try the sequenceFile load
support in PySpark. There are examples in the /examples project and in the
Python tests, and some documentation in /docs.
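
As a rough illustration of what trying it out might look like, here is a
minimal sketch. It assumes a Spark build from master that includes
SparkContext.sequenceFile; the helper name preview_sequence_file and the
path in the usage note are hypothetical, not taken from the Spark examples.

```python
def preview_sequence_file(sc, path, n=5):
    """Return the first n (key, value) pairs of a Hadoop SequenceFile.

    `sc` is expected to be a pyspark.SparkContext from a build that
    provides sequenceFile (assumption: master at the time of writing).
    """
    # With only a path given, PySpark infers the key and value Writable
    # classes and converts them to Python objects.
    rdd = sc.sequenceFile(path)
    # take(n) pulls just a few records to the driver rather than the
    # whole dataset.
    return rdd.take(n)
```

To use it, create a context with something like
sc = SparkContext("local[2]", "SeqFileTest") and call
preview_sequence_file(sc, "hdfs:///path/to/data.seq"), replacing the
placeholder path with your own.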


On Wed, Jul 23, 2014 at 4:42 PM, Gary Malouf <ma...@gmail.com> wrote:

> I am aware that today PySpark can not load sequence files directly.  Are
> there work-arounds people are using (short of duplicating all the data to
> text files) for accessing this data?
>