Posted to user@ignite.apache.org by woo charles <ig...@gmail.com> on 2017/04/19 01:30:14 UTC

Input data is no significant change in multi-threading

When I load data (80 tables, 10,000 records each) into a cluster with 3
server nodes (2 GB each), using multiple threads makes only a small
difference in load time
(i.e., at best a decrease from 8 s to 6.5 s when using IgniteCache).

Is this normal?

Also, I found that multithreading does not affect the data input speed of
IgniteDataStreamer.

Is that true?

Re: Input data is no significant change in multi-threading

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Woo,

addData() adds the entry to one of the local buffers according to key
affinity; each buffer is then sent to the server node that is primary for
all keys in that buffer.
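
A minimal sketch of this buffering in plain Java may make the flow easier to
follow. It is a toy model, not Ignite code: the class name, the modulo-based
primaryNode() stand-in for the affinity function, and the counters are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side, per-node buffering. Not Ignite code. */
class AffinityBufferSketch {
    private final List<List<Integer>> buffers = new ArrayList<>();
    private final int[] batchesSent;
    private final int bufferSize;

    AffinityBufferSketch(int nodeCount, int bufferSize) {
        this.bufferSize = bufferSize;
        this.batchesSent = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            buffers.add(new ArrayList<>());
        }
    }

    /** Hypothetical stand-in for the affinity function: key -> primary node. */
    int primaryNode(int key) {
        return Math.floorMod(key, buffers.size());
    }

    /** Buffers the key locally and "sends" the buffer once it is full. */
    void addData(int key) {
        int node = primaryNode(key);
        List<Integer> buffer = buffers.get(node);
        buffer.add(key);
        if (buffer.size() >= bufferSize) {
            batchesSent[node]++; // one bulk message to the primary node
            buffer.clear();
        }
    }

    int batchesSentTo(int node) {
        return batchesSent[node];
    }

    int pendingFor(int node) {
        return buffers.get(node).size();
    }
}
```

In this model a key never reaches a node other than the one primaryNode()
returns, and entries leave the client only in whole batches.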

On Thu, Apr 20, 2017 at 8:16 AM, woo charles <ig...@gmail.com>
wrote:

> When I call addData() on the streamer, the data is sent to and buffered on a
> server node. Is that correct?
> If so, is the data buffered on a random server node, or only on the one the
> client is directly connected to?


-- 
Best regards,
Andrey V. Mashenkov

Re: Input data is no significant change in multi-threading

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Woo,

That may be reasonable if you see that resource utilization on the nodes is
too low and raising the per-node buffer size has no effect (which means you
are preparing data for the nodes too slowly).
Of course, you should first check that the network isn't the bottleneck.

On Tue, Apr 25, 2017 at 10:08 AM, woo charles <ig...@gmail.com>
wrote:

> By "same data set" I mean that I split the original data into 2 parts and
> load them from 2 separate programs.
> E.g., for a data set with ids 1-100, program A loads ids 1-50 and program B
> loads ids 51-100.


-- 
Best regards,
Andrey V. Mashenkov

Re: Input data is no significant change in multi-threading

Posted by woo charles <ig...@gmail.com>.
By "same data set" I mean that I split the original data into 2 parts and
load them from 2 separate programs.
E.g., for a data set with ids 1-100, program A loads ids 1-50 and program B
loads ids 51-100.
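
The split described above can be sketched in plain Java as follows. The
class and method names are hypothetical, and the sketch only computes the id
ranges; it does not load anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits an inclusive id range into contiguous parts of near-equal size. */
class RangeSplitSketch {
    /** Each returned int[] is {from, to}, both inclusive. */
    static List<int[]> split(int first, int last, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int total = last - first + 1;
        int from = first;
        for (int i = 0; i < parts; i++) {
            // The first (total % parts) ranges get one extra id.
            int size = total / parts + (i < total % parts ? 1 : 0);
            ranges.add(new int[] {from, from + size - 1});
            from += size;
        }
        return ranges;
    }
}
```

Calling RangeSplitSketch.split(1, 100, 2) yields the ranges 1-50 and 51-100,
one for each program.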

2017-04-21 17:24 GMT+08:00 Andrey Mashenkov <an...@gmail.com>:

> Hi Woo,
>
> DataStreamer is designed to fill a cache with maximum throughput. By
> default, the streamer will not overwrite existing cache data unless the
> allowOverwrite option is set.
>
> Why do you need to input the same set of data? Why do you expect the data
> input time to change significantly with 2 programs compared to 1 if the
> data set is put twice?
> Or did I miss something?
>
> Anyway, if you do not get a speedup but you are sure you should, then the
> bottleneck has to be found first.

Re: Input data is no significant change in multi-threading

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Woo,

DataStreamer is designed to fill a cache with maximum throughput. By default,
the streamer will not overwrite existing cache data unless the allowOverwrite
option is set.
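
The effect of that option can be illustrated with a toy model in plain Java.
This is only a sketch of the behavior described above, built on a HashMap;
the class name is hypothetical and nothing here is Ignite's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the overwrite switch. Not Ignite code. */
class OverwriteSketch {
    private final Map<Integer, String> cache = new HashMap<>();
    private final boolean allowOverwrite;

    OverwriteSketch(boolean allowOverwrite) {
        this.allowOverwrite = allowOverwrite;
    }

    void addData(int key, String value) {
        if (allowOverwrite) {
            cache.put(key, value);         // replaces an existing value
        } else {
            cache.putIfAbsent(key, value); // keeps an existing value
        }
    }

    String get(int key) {
        return cache.get(key);
    }
}
```

With the switch off, loading the same key a second time leaves the first
value in place, so putting the same data set twice adds no new entries.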

Why do you need to input the same set of data? Why do you expect the data
input time to change significantly with 2 programs compared to 1 if the data
set is put twice?
Or did I miss something?

Anyway, if you do not get a speedup but you are sure you should, then the
bottleneck has to be found first.

On Fri, Apr 21, 2017 at 5:02 AM, woo charles <ig...@gmail.com>
wrote:

> If the data is buffered on the client side, the bottleneck should also be on
> the client side.
> If I use 2 programs to input the same data set, there should be a
> significant change in data input time.
> Is that right?


-- 
Best regards,
Andrey V. Mashenkov

Re: Input data is no significant change in multi-threading

Posted by woo charles <ig...@gmail.com>.
If the data is buffered on the client side, the bottleneck should also be on
the client side.
If I use 2 programs to input the same data set, there should be a significant
change in data input time.
Is that right?

2017-04-21 6:46 GMT+08:00 Dmitriy Setrakyan <ds...@apache.org>:

>
> On Wed, Apr 19, 2017 at 10:16 PM, woo charles <ig...@gmail.com>
> wrote:
>
>> When I call addData() on the streamer, the data is sent to and buffered on
>> a server node. Is that correct?
>> If so, is the data buffered on a random server node, or only on the one the
>> client is directly connected to?
>>
>
> addData() buffers the data on the client side. In fact, there are multiple
> buffers on the client side, with each buffer associated with some server
> node.
>
> Ignite will never send the data to a random node. The data is always sent
> exactly to the node where it will be cached.
>
> D.
>

Re: Input data is no significant change in multi-threading

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Wed, Apr 19, 2017 at 10:16 PM, woo charles <ig...@gmail.com>
wrote:

> When I call addData() on the streamer, the data is sent to and buffered on a
> server node. Is that correct?
> If so, is the data buffered on a random server node, or only on the one the
> client is directly connected to?
>

addData() buffers the data on the client side. In fact, there are multiple
buffers on the client side, with each buffer associated with some server
node.

Ignite will never send the data to a random node. The data is always sent
exactly to the node where it will be cached.

D.

Re: Input data is no significant change in multi-threading

Posted by woo charles <ig...@gmail.com>.
When I call addData() on the streamer, the data is sent to and buffered on a
server node. Is that correct?
If so, is the data buffered on a random server node, or only on the one the
client is directly connected to?

2017-04-19 18:33 GMT+08:00 Andrey Mashenkov <an...@gmail.com>:

> It may have an effect if you prepare data for the streamer (call addData)
> slowly and more resources can be used for that. Of course, the remote nodes
> must be able to bear the data pressure.
> Performance can increase, but usually only slightly, as the network will be
> the bottleneck.

Re: Input data is no significant change in multi-threading

Posted by Andrey Mashenkov <an...@gmail.com>.
It may have an effect if you prepare data for the streamer (call addData)
slowly and more resources can be used for that. Of course, the remote nodes
must be able to bear the data pressure.
Performance can increase, but usually only slightly, as the network will be
the bottleneck.
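
A rough way to reason about this is a two-stage model in which the load rate
is the slower of data preparation and the network. The sketch below is plain
Java with hypothetical names, and the rates used in the note after it are
made-up illustration values, not measurements from this thread.

```java
/** Two-stage bottleneck model: data preparation feeding a network link. */
class BottleneckSketch {
    /** Effective load rate in records per second. */
    static double effectiveRate(int threads, double perThreadPrepRate,
                                double networkRate) {
        // Preparation scales with threads; the network rate does not.
        return Math.min(threads * perThreadPrepRate, networkRate);
    }

    /** Seconds needed to load the given number of records. */
    static double loadSeconds(long records, int threads,
                              double perThreadPrepRate, double networkRate) {
        return records / effectiveRate(threads, perThreadPrepRate, networkRate);
    }
}
```

For 800,000 records (80 tables of 10,000), an assumed preparation rate of
100,000 records/s per thread, and an assumed network limit of 125,000
records/s, the model gives 8 s with one thread and 6.4 s with two or more.
Extra threads stop helping once the network limit is reached.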


On Wed, Apr 19, 2017 at 12:29 PM, woo charles <ig...@gmail.com>
wrote:

> Does that mean data input performance will not be affected if I use 2
> IgniteDataStreamers (2 client programs) to input data, since they use the
> same queue on the remote nodes?


-- 
Best regards,
Andrey V. Mashenkov

Re: Input data is no significant change in multi-threading

Posted by woo charles <ig...@gmail.com>.
Does that mean data input performance will not be affected if I use 2
IgniteDataStreamers (2 client programs) to input data, since they use the
same queue on the remote nodes?

2017-04-19 10:02 GMT+08:00 Andrey Mashenkov <an...@gmail.com>:

> Hi Woo,
>
> IgniteDataStreamer uses per-node buffers to make bulk cache updates, which
> show much better throughput than single updates.
> Also, IgniteDataStreamer sends jobs to remote nodes to utilize multiple
> threads on the remote nodes.
>
> In a multi-node grid, IgniteDataStreamer usually shows better results than
> single updates from multiple threads.

Re: Input data is no significant change in multi-threading

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Woo,

IgniteDataStreamer uses per-node buffers to make bulk cache updates, which
show much better throughput than single updates.
Also, IgniteDataStreamer sends jobs to remote nodes to utilize multiple
threads on the remote nodes.

In a multi-node grid, IgniteDataStreamer usually shows better results than
single updates from multiple threads.
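
One reason bulk updates help is that far fewer messages cross the network.
The plain-Java sketch below only counts messages; the class name is
hypothetical, the batch size in the note is an assumed value, and the count
ignores how records are spread across nodes.

```java
/** Counts messages needed for single updates versus batched updates. */
class BatchCountSketch {
    /** One message per record when every record is its own update. */
    static long singleUpdateMessages(long records) {
        return records;
    }

    /** Messages needed when records are grouped into fixed-size batches. */
    static long batchedMessages(long records, int batchSize) {
        return (records + batchSize - 1) / batchSize; // ceiling division
    }
}
```

For 800,000 records and an assumed batch size of 512, that is 1,563 batched
messages instead of 800,000 single updates.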



On Wed, Apr 19, 2017 at 4:30 AM, woo charles <ig...@gmail.com>
wrote:

> When I load data (80 tables, 10,000 records each) into a cluster with 3
> server nodes (2 GB each), using multiple threads makes only a small
> difference in load time
> (i.e., at best a decrease from 8 s to 6.5 s when using IgniteCache).
>
> Is this normal?
>
> Also, I found that multithreading does not affect the data input speed of
> IgniteDataStreamer.
>
> Is that true?
>


-- 
Best regards,
Andrey V. Mashenkov