Posted to user@spark.apache.org by Oren Shpigel <or...@yowza3d.com> on 2015/07/23 17:14:00 UTC

Writing binary files in Spark

Hi, 
I use Spark to read binary files using SparkContext.binaryFiles(), and then
do some calculations, processing, and manipulations to get new objects (also
binary).
The next thing I want to do is write the results back to binary files on
disk.

Is there an equivalent of saveAsTextFile for binary files?
Is there any other way to save the results so they can be used outside Spark?

Thanks, 
Oren



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-binary-files-in-Spark-tp23970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Writing binary files in Spark

Posted by Oren Shpigel <or...@yowza3d.com>.
As I wrote before, the output of my pipeline is binary objects, which I
want to write directly as raw bytes rather than serializing them again.

Is that possible?

On Sat, Jul 25, 2015 at 11:28 AM Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> I believe it was added to the Python API in Spark 1.1.0:
> https://issues.apache.org/jira/browse/SPARK-1161
>
> Thanks
> Best Regards
>

Re: Writing binary files in Spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
I believe it was added to the Python API in Spark 1.1.0:
https://issues.apache.org/jira/browse/SPARK-1161

Thanks
Best Regards
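For reference, the Python method that came out of SPARK-1161 is, as far as I know, `RDD.saveAsPickleFile(path, batchSize=10)`, read back with `SparkContext.pickleFile`. It pickles records in batches and stores them in a Hadoop SequenceFile, so the files on disk are not the raw bytes of the objects. The standard-library snippet below only imitates the pickling step to show why; the payloads are made up and the SequenceFile layer is left out.

```python
import pickle

# Made-up payloads standing in for the processed binary objects held in an RDD.
payloads = [b"\x89PNG\r\n\x1a\n", b"\x00\x01\x02\x03"]

# saveAsPickleFile pickles records in batches; pickling one batch here imitates
# that step (the SequenceFile container Spark wraps around it is omitted).
batch_blob = pickle.dumps(payloads)

# The raw bytes a consumer outside Spark would expect to find on disk.
raw_concat = b"".join(payloads)

print(batch_blob == raw_concat)              # False: the blob carries pickle framing
print(pickle.loads(batch_blob) == payloads)  # True: only a pickle reader gets them back
```

Any tool outside Spark would therefore have to unwrap the SequenceFile and unpickle each batch, which is exactly what the raw-bytes requirement rules out.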

On Sat, Jul 25, 2015 at 12:06 AM, Oren Shpigel <or...@yowza3d.com> wrote:

> Sorry, I didn't mention that I'm using the Python API, which doesn't have the
> saveAsObjectFile method.
> Is there any alternative in Python?
> Also, I want to write the raw bytes of my objects into files on disk,
> not some serialization format meant to be read back into Spark.
>
> Is it possible?
> Any alternatives for that?
>
> Thanks,
> Oren
>

Re: Writing binary files in Spark

Posted by Oren Shpigel <or...@yowza3d.com>.
Sorry, I didn't mention that I'm using the Python API, which doesn't have the
saveAsObjectFile method.
Is there any alternative in Python?
Also, I want to write the raw bytes of my objects into files on disk,
not some serialization format meant to be read back into Spark.

Is it possible?
Any alternatives for that?

Thanks,
Oren
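Since the Python API offered no raw-bytes writer here, one workaround is to bypass Spark's output formats and write the files yourself from the executors. The sketch below assumes the RDD holds (name, bytes) pairs, like those returned by SparkContext.binaryFiles() after the values have been processed; `write_partition` and the output directory are hypothetical, and only the Python standard library is used.

```python
import os
import tempfile


def write_partition(records, out_dir):
    """Write each (name, payload) pair to out_dir as a file containing exactly payload.

    Hypothetical helper intended to run on an executor via rdd.foreachPartition.
    Assumes every name is unique across the RDD and that payload is a bytes object.
    """
    os.makedirs(out_dir, exist_ok=True)  # several tasks may create it concurrently; exist_ok tolerates that
    written = 0
    for name, payload in records:
        # binaryFiles keys may be full URIs such as "file:/data/in/a.bin"; keep only the file name.
        path = os.path.join(out_dir, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(payload)  # raw bytes, no pickling or Java serialization
        written += 1
    return written


if __name__ == "__main__":
    # Local check without Spark: feed an iterator shaped like one RDD partition.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out")
        n = write_partition(iter([("file:/data/in/a.bin", b"\x00\x01"), ("b.bin", b"xyz")]), out)
        print(n, sorted(os.listdir(out)))  # 2 ['a.bin', 'b.bin']
```

It would be called as `rdd.foreachPartition(lambda it: write_partition(it, "/mnt/shared/out"))`. Because `open()` writes to the local disk of whichever worker runs the task, all files only end up in one place if the directory is a mount every worker shares (e.g. NFS); writing to HDFS would need an HDFS client instead. For small results, passing `rdd.collect()` to the same function on the driver is simpler.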

On Thu, Jul 23, 2015 at 8:04 PM Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> You can look into RDD.saveAsObjectFile().
>
> Thanks
> Best Regards
>

Re: Writing binary files in Spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can look into RDD.saveAsObjectFile().

Thanks
Best Regards

On Thu, Jul 23, 2015 at 8:44 PM, Oren Shpigel <or...@yowza3d.com> wrote:

> Hi,
> I use Spark to read binary files using SparkContext.binaryFiles(), and then
> do some calculations, processing, and manipulations to get new objects
> (also
> binary).
> The next thing I want to do is write the results back to binary files on
> disk.
>
> Is there an equivalent of saveAsTextFile for binary files?
> Is there any other way to save the results so they can be used outside Spark?
>
> Thanks,
> Oren

Re: Writing binary files in Spark

Posted by Gylfi <gy...@berkeley.edu>.
Hi. 

I am not sure, but you might be able to pull it off by writing your own
custom serialization class and using saveAsObjectFile() on the RDD.

Here is an article on how to create a custom serializer to support Kryo
serialization to and from disk:
http://blog.madhukaraphatak.com/kryo-disk-serialization-in-spark/
Perhaps you can use it as a base to write a "back-to-binary" override.
Sorry for not giving a more detailed answer.

Regards, 
    Gylfi. 
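Gylfi's suggestion targets the Scala/Kryo side. A different technique, swapped in here and not taken from the linked article, is to skip Spark's serializers entirely and have each partition write a single file in a simple length-prefixed layout that any language can parse: an 8-byte big-endian length followed by the raw payload bytes. The layout and the function names `write_framed`/`read_framed` are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical framing: each record = 8-byte big-endian unsigned length + raw payload.
_LEN = struct.Struct(">Q")


def write_framed(payloads, path):
    """Write an iterable of bytes objects to path, each prefixed by its length."""
    with open(path, "wb") as f:
        for data in payloads:
            f.write(_LEN.pack(len(data)))
            f.write(data)


def read_framed(path):
    """Read back the payloads written by write_framed, in order."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if not header:
                break  # clean end of file
            if len(header) != _LEN.size:
                raise ValueError("truncated length header")
            (n,) = _LEN.unpack(header)
            data = f.read(n)
            if len(data) != n:
                raise ValueError("truncated payload")
            out.append(data)
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "part-00000.bin")
        write_framed([b"abc", b"", b"\x00\xff"], p)
        print(read_framed(p))  # [b'abc', b'', b'\x00\xff']
```

From PySpark, one possible call is `rdd.values().mapPartitionsWithIndex(lambda i, it: [write_framed(it, "/mnt/shared/out/part-%05d.bin" % i)]).count()`, where `count()` only forces the job to run and the partition index keeps file names unique. Since `open()` writes to each worker's local disk, the directory must be shared by all workers for the parts to end up together.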



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-binary-files-in-Spark-tp23970p24015.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
