
Posted to user@spark.apache.org by Jeremy Freeman <fr...@gmail.com> on 2014/06/05 00:28:10 UTC

Re: error loading large files in PySpark 0.9.0

Hey Matei,

Wanted to let you know this issue appears to be fixed in 1.0.0. Great work!

-- Jeremy



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-loading-large-files-in-PySpark-0-9-0-tp3049p6985.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: error loading large files in PySpark 0.9.0

Posted by Nick Pentreath <ni...@gmail.com>.

Ah, looking at that InputFormat, it should just work out of the box using sc.newAPIHadoopFile ...


Would be interested to hear if it works as expected for you (in Python you'll end up with bytearray values).
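
For anyone following along, here is a rough sketch of what that call might look like from Python. Treat the specifics as assumptions rather than anything confirmed in this thread: the InputFormat class name, the "recordLength" configuration key, and the key/value Writable classes are guesses based on the linked thunder repository, and load_records and decode_record are hypothetical helper names. It also assumes the values are little-endian numbers packed back to back.

```python
import struct

# Assumed names, not confirmed in this thread: adjust to your own InputFormat.
INPUT_FORMAT = "thunder.util.io.hadoop.FixedLengthBinaryInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.BytesWritable"


def decode_record(value, type_char="d"):
    """Unpack one fixed-length record (a bytearray) into a list of numbers.

    type_char is a struct format character, e.g. "d" for float64 or "h" for
    int16. Little-endian byte order is assumed.
    """
    item_size = struct.calcsize("<" + type_char)
    count, remainder = divmod(len(value), item_size)
    if remainder:
        raise ValueError("record length is not a multiple of the item size")
    return list(struct.unpack("<%d%s" % (count, type_char), bytes(value)))


def load_records(sc, path, record_length):
    """Read fixed-length binary records through a Hadoop InputFormat.

    sc is an existing SparkContext. The "recordLength" key (bytes per record)
    is an assumed configuration name for the custom InputFormat.
    """
    rdd = sc.newAPIHadoopFile(
        path,
        INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf={"recordLength": str(record_length)},
    )
    # Values arrive in Python as bytearray; decode each one into numbers.
    return rdd.mapValues(decode_record)
```

The jar containing the InputFormat would also need to be on the Spark classpath for the call to find the class.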




N

On Fri, Jun 6, 2014 at 9:38 PM, Jeremy Freeman <fr...@gmail.com>
wrote:

> Oh cool, thanks for the heads up! Especially for the Hadoop InputFormat
> support. We recently wrote a custom Hadoop input format so we can support
> flat binary files
> (https://github.com/freeman-lab/thunder/tree/master/scala/src/main/scala/thunder/util/io/hadoop),
> and have been testing it in Scala. So I was following Nick's progress and
> was eager to check this out when ready. Will let you guys know how it goes.
> -- J

Re: error loading large files in PySpark 0.9.0

Posted by Jeremy Freeman <fr...@gmail.com>.

Oh cool, thanks for the heads up! Especially for the Hadoop InputFormat
support. We recently wrote a custom Hadoop input format so we can support
flat binary files
(https://github.com/freeman-lab/thunder/tree/master/scala/src/main/scala/thunder/util/io/hadoop),
and have been testing it in Scala. So I was following Nick's progress and
was eager to check this out when ready. Will let you guys know how it goes.

-- J



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-loading-large-files-in-PySpark-0-9-0-tp3049p7144.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: error loading large files in PySpark 0.9.0

Posted by Matei Zaharia <ma...@gmail.com>.

Ah, good to know!

By the way, in master we now have saveAsPickleFile (https://github.com/apache/spark/pull/755), and Nick Pentreath has been working on Hadoop InputFormats: https://github.com/apache/spark/pull/455. Would be good to have your input on both of those if you have a chance to try them.
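
A minimal sketch of trying the pickle file support from PySpark, assuming a build that includes it. The helper name pickle_roundtrip and its arguments are made up for illustration; saveAsPickleFile and pickleFile are the RDD and SparkContext methods.

```python
def pickle_roundtrip(sc, records, path):
    """Save Python objects as a pickle file, then load them back.

    sc is an existing SparkContext, records is a list of picklable objects,
    and path is an output directory that must not already exist.
    """
    # Write the records out as serialized Python objects.
    sc.parallelize(records).saveAsPickleFile(path)
    # Read them back into an RDD and bring the results to the driver.
    return sc.pickleFile(path).collect()
```

Calling it with something like pickle_roundtrip(sc, [1, "a", {"k": 2.0}], "/tmp/pickle-test") should hand back the same objects, which makes for a quick sanity check.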

Matei

On Jun 4, 2014, at 3:28 PM, Jeremy Freeman <fr...@gmail.com> wrote:

> Hey Matei,
> 
> Wanted to let you know this issue appears to be fixed in 1.0.0. Great work!
> 
> -- Jeremy