Posted to user@storm.apache.org by Lochlainn Wilson <lo...@gmail.com> on 2014/01/10 04:24:36 UTC

Large binary payloads with storm

Hi all,

I am new to Storm and have been tasked with determining whether it is
feasible for us to use Apache Storm at my company. I have of course
configured the sample projects and have been poking around. One red flag
is the JSON parsing used for the "stream processing" style of passing
tuples to bolts.

I am considering using Storm with real-time image processing bolts in C++.
Packaging binary data into JSON (by escaping it) looks like it will be
slow and expensive. Is there a better way? Does anyone have experience
processing large streams of binary data through Storm?

How did it go?

Regards,

Lochlainn

Re: Large binary payloads with storm

Posted by 李家宏 <jh...@gmail.com>.
I have run into this problem as well. I am considering using Storm for
real-time IP packet processing.

Regards,

Gvain


-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Large binary payloads with storm

Posted by Ted Dunning <te...@gmail.com>.
Consider also whether you even *want* to pass large objects through your
tuples.  If this will cause many copies of the object with no modification
or reference, you might be much better off leaving your object in a static
cache and simply passing around an ID.  There are many heuristics for
managing the life of such objects, but the simplest LRU is probably
entirely sufficient since almost all objects will have a life of only a few
seconds.

Depending on your problem, this could make orders of magnitude difference
in performance (for the better) in your system.
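
A minimal sketch of the cache-and-ID idea described above, in Python. The
class name PayloadCache, its methods, and the max_items limit are all
hypothetical and are not part of Storm. It also assumes the producer and
consumer run in the same process; if they do not, a shared external store
would be needed instead.

```python
from collections import OrderedDict
import uuid


class PayloadCache:
    """Small in-process LRU cache mapping generated IDs to binary payloads."""

    def __init__(self, max_items=1000):
        self._items = OrderedDict()
        self._max_items = max_items

    def put(self, payload: bytes) -> str:
        """Store a payload and return the ID to pass through the tuple."""
        key = uuid.uuid4().hex
        self._items[key] = payload
        # Evict the least recently used entries once the cache is full.
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)
        return key

    def get(self, key: str):
        """Return the payload for an ID, or None if it was evicted."""
        payload = self._items.get(key)
        if payload is not None:
            # Mark the entry as recently used.
            self._items.move_to_end(key)
        return payload
```

A spout would call put() and emit only the returned ID; a downstream bolt
would call get() with that ID and must handle the None case, since an
entry may already have been evicted.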



Re: Large binary payloads with storm

Posted by Ruhollah Farchtchi <ru...@gmail.com>.
Yep. That's what I figured. Thanks.

-- 
Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com

Re: Large binary payloads with storm

Posted by Nathan Leung <nc...@gmail.com>.
The multilang interface uses JSON, which is a text format. Given an earlier
email (
http://mail-archives.apache.org/mod_mbox/storm-user/201401.mbox/%3CCAEN10JreBSFO-=xhNjbn9r+5+F+G=AZ8rW58qDo8x32Gd-xUkg@mail.gmail.com%3E)
the object appears to be serialized to JSON using toString, which for a
byte array yields [B@<reference>, where the [B is type information
specifying a byte array. Therefore you will have to encode to something
like base64 that can represent your binary data in a text format.
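
A minimal sketch of the base64 workaround, in Python. The helper names
encode_tuple and decode_tuple and the single-field message shape are
assumptions made for illustration; this is not the full multilang
protocol and not code from storm.py.

```python
import base64
import json


def encode_tuple(payload: bytes) -> str:
    """Base64-encode binary data so it can travel inside a JSON message."""
    text = base64.b64encode(payload).decode("ascii")
    return json.dumps({"tuple": [text]})


def decode_tuple(message: str) -> bytes:
    """Parse the JSON message and base64-decode its first field."""
    text = json.loads(message)["tuple"][0]
    return base64.b64decode(text)


raw = bytes(range(256))  # every possible byte value

# Python's json module refuses raw bytes outright, which is one way a
# text-only format fails on binary data.
try:
    json.dumps({"tuple": [raw]})
    raw_is_json_serializable = True
except TypeError:
    raw_is_json_serializable = False

round_tripped = decode_tuple(encode_tuple(raw))
```

The cost is the extra encode and decode step on each side plus the larger
message, which is the overhead discussed elsewhere in this thread.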

Re: Large binary payloads with storm

Posted by Ruhollah Farchtchi <ru...@gmail.com>.
I am using 0.9. I think the issue is that storm.py has problems
deserializing a byte array. When I encode the data as a base64 string, I
have no problems and it deserializes fine. Of course, I would like to
avoid this extra overhead if possible. All my binary objects are
relatively small, 200-300k max.
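
To put a rough number on that overhead, here is a small Python sketch. It
assumes "300k" means 300 KiB and standard padded base64; the function name
base64_size is hypothetical.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of padded base64 output: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(300 * 1024)           # 300 KiB of zero bytes as a stand-in
encoded = base64.b64encode(payload)   # what would go into the JSON string
overhead = len(encoded) / len(payload) - 1
```

Base64 emits four characters for every three input bytes, so the encoded
form is about one third larger than the raw payload, before any JSON
framing is added.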

-- 
Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com

Re: Large binary payloads with storm

Posted by 李家宏 <jh...@gmail.com>.
Hi Farchtchi,

Which Storm version are you using?
If the tuple is not serialized, then I guess there is no need to use a
JSON parser to parse the received tuple.

Regards


-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Large binary payloads with storm

Posted by Ruhollah Farchtchi <ru...@gmail.com>.
Yes, I read that in the docs. However, when receiving the byte array,
storm.py throws a JSON error when trying to parse the tuples. I didn't
have time to look into it further as I am new to Storm and Python.

-- 
Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com

Re: Large binary payloads with storm

Posted by 李家宏 <jh...@gmail.com>.
There is no need to serialize binary data yourself; just send it as is.
Since storm-0.9.0 uses the Kryo serializer for tuple values by default, I
guess we can skip this serialization step.

Regards



-- 

======================================================

Gvain

Email: jh.li.em@gmail.com

Re: Large binary payloads with storm

Posted by Jon Logan <jm...@buffalo.edu>.
You're going to run into issues if you have large tuples, because they are
buffered in memory. I would suggest moving the data to an external channel,
such as Redis, and only passing metadata through Storm.

Your other solution is to use quirky things like reflection to prevent your
application from running out of memory when tuples are buffered.



Re: Large binary payloads with storm

Posted by Ruhollah Farchtchi <ru...@gmail.com>.
I am using Storm to process small (< 100k) image files. I don't have a
real-time requirement as yet, and my bottleneck is more in the image
processing than in message passing between bolts. I am using the Clojure
DSL and the Python bolt. Everything I've put together right now is very
much a prototype, so my next steps are some further processing and
integration. Passing byte arrays didn't seem to work so well, so I have had
to encode/decode into base64, as it seems the JSON parsers on the Python
side didn't like byte arrays. I plan to go back and perhaps redo the
integration with a native C++ bolt; however, I believe that there are other
ways to do this integration as well. As with Wilson, I'm interested in
whether anyone else is using Storm to process binary payloads and what they
have found works.

Thanks,

Ruhollah

Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com

