Posted to user@ignite.apache.org by Thomas Kramer <do...@gmx.de> on 2021/11/17 09:34:36 UTC

Best way transferring large binary data to nodes

I'll need to transfer large amounts of binary data (in ~30MB chunks)
from the compute job sender to the nodes that run the compute jobs. The
data is needed in the compute jobs, but each chunk is only needed on
one node, while another chunk is needed on a different node that computes
the same job. I'm wondering what is the most performant way to do this?

a) Technically I could use byte[] objects on the sender and use them in
IgniteCallable functions. Is this an efficient way to transfer the data to
the nodes?

b) I could also first put the data into a cache on the job sender and
access the data on each node within the job. Ideally using co-location
features.

c) Is there a difference to b) if I load the data into the cache using a
DataStreamer? Would that be more efficient?

d) Of course I could also use something outside of Ignite, e.g. JeroMQ.
Is that the most efficient way to transfer the data?

Appreciate any help on this.


Re: Re: Best way transferring large binary data to nodes

Posted by Pavel Tupitsyn <pt...@apache.org>.
I think it should be fine.

With thousands of items, you may want to do additional
buffering/batching (offered by APIs like DataStreamer),
but since one item is already 30MB, this may not be necessary.

If you decide to try DataStreamer, note that the default batch size is
1024, and 1024 * 30MB = 30GB, which exceeds the maximum message size of 2GB.
See perNodeBufferSize, perThreadBufferSize [1]

[1]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteDataStreamer.html
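
The arithmetic above can be sketched in plain Java (the 1024 default and the
~2GB limit are taken from this thread; the class name and the derived "safe"
value are assumptions for illustration only):

```java
public class StreamerBufferMath {
    public static void main(String[] args) {
        long chunkBytes = 30L * 1024 * 1024;        // one ~30MB item
        long defaultBufferSize = 1024;              // default batch size per the thread
        long maxMessageBytes = Integer.MAX_VALUE;   // ~2GB message size limit

        // At the defaults, one batch would buffer 1024 * 30MB = 30GB:
        long batchBytes = defaultBufferSize * chunkBytes;
        System.out.println(batchBytes > maxMessageBytes);   // true: defaults overflow

        // Largest buffer size that keeps one batch under the ~2GB limit:
        long safeBufferSize = maxMessageBytes / chunkBytes;
        System.out.println(safeBufferSize);                 // 68
    }
}
```

A value in that range could then be set via IgniteDataStreamer#perNodeBufferSize,
per the javadoc linked above.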

On Wed, Nov 17, 2021 at 8:04 PM Kramer <do...@gmx.de> wrote:

> It will be thousands of chunks of 30MB each in a very short time. Is it
> still fine to use IgniteCompute, simplified like this, to pass the byte
> data over to the nodes?
>
>
>         try (Ignite ignite = Ignition.start())
>         {
>             final byte[] buffer = new byte[30 * 1024 * 1024];
>             int length = ignite.compute().call(new IgniteCallable<Integer>()
>             {
>                 @Override
>                 public Integer call() throws Exception
>                 {
>                     // do something with buffer
>                     // ...
>                     return buffer.length;
>                 }
>             });
>
>             System.out.println("Length: " + length);
>         }
>
>
>
> *Sent:* Wednesday, 17 November 2021 at 15:27
> *From:* "Pavel Tupitsyn" <pt...@apache.org>
> *To:* "user" <us...@ignite.apache.org>
> *Subject:* Re: Best way transferring large binary data to nodes
> If you only need to process the data, but not store it, I would suggest
> using IgniteCompute.
> Yes, sending byte[] is efficient. 30MB is not that much and should be fine.
>
> On Wed, Nov 17, 2021 at 12:34 PM Thomas Kramer <do...@gmx.de> wrote:
>

Re: Best way transferring large binary data to nodes

Posted by Kramer <do...@gmx.de>.
It will be thousands of chunks of 30MB each in a very short time. Is it still
fine to use IgniteCompute, simplified like this, to pass the byte data over to
the nodes?


        try (Ignite ignite = Ignition.start())
        {
            final byte[] buffer = new byte[30 * 1024 * 1024];

            int length = ignite.compute().call(new IgniteCallable<Integer>()
            {
                @Override
                public Integer call() throws Exception
                {
                    // do something with buffer
                    // ...
                    return buffer.length;
                }
            });

            System.out.println("Length: " + length);
        }

Sent: Wednesday, 17 November 2021 at 15:27
From: "Pavel Tupitsyn" <pt...@apache.org>
To: "user" <us...@ignite.apache.org>
Subject: Re: Best way transferring large binary data to nodes

If you only need to process the data, but not store it, I would suggest using
IgniteCompute.

Yes, sending byte[] is efficient. 30MB is not that much and should be fine.



On Wed, Nov 17, 2021 at 12:34 PM Thomas Kramer <don.tequila@gmx.de> wrote:



Re: Best way transferring large binary data to nodes

Posted by Pavel Tupitsyn <pt...@apache.org>.
If you only need to process the data, but not store it, I would suggest
using IgniteCompute.
Yes, sending byte[] is efficient. 30MB is not that much and should be fine.
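
Since each chunk is only needed on one node, the sender-side splitting can be
done in plain Java before handing each chunk to its own callable. A minimal
sketch of that splitting step (the class name and the parameterized chunk size
are assumptions for illustration; the dispatch itself would follow the
IgniteCompute example shown in this thread):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split a payload into chunks of at most chunkSize bytes; each chunk
    // would then be captured by one IgniteCallable and sent to a single node.
    public static List<byte[]> split(byte[] payload, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += chunkSize) {
            int len = Math.min(chunkSize, payload.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(payload, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 10 bytes in 4-byte chunks -> lengths 4, 4, 2
        List<byte[]> chunks = split(new byte[10], 4);
        System.out.println(chunks.size());        // 3
        System.out.println(chunks.get(2).length); // 2
    }
}
```

In production the chunk size would be the ~30MB discussed above, and each
element of the returned list would go into one compute call.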

On Wed, Nov 17, 2021 at 12:34 PM Thomas Kramer <do...@gmx.de> wrote:
