Posted to user@ignite.apache.org by narges saleh <sn...@gmail.com> on 2020/01/05 23:37:34 UTC

Streamer and data loss

Hi All,

Another question regarding Ignite's streamer.
What happens to the data if the streamer node crashes before the buffer's
content is flushed to the cache? Is the client responsible for making sure
the data is persisted, or does Ignite redirect the data to another node's
streamer?

thanks.

Re: Streamer and data loss

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

If you use it in a smart way you can get performance very close to that of
a data streamer with allowOverwrite=true, I guess.

Just call putAll() with a decent number of entries belonging to the same
cache partition, from multiple threads, with non-intersecting keys of course.
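
For illustration only, a rough sketch of that suggestion (the cache name,
key range and thread count are made up, and it assumes a running cluster):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class PartitionedPutAll {
        public static void main(String[] args) throws InterruptedException {
            // Illustrative sketch: cache name and key range are made up.
            try (Ignite ignite = Ignition.start()) {
                IgniteCache<Long, String> cache = ignite.getOrCreateCache("myCache");

                // Group entries by the partition their key maps to, so that each
                // putAll() call touches a single partition (and hence one primary node).
                Map<Integer, Map<Long, String>> batches = new HashMap<>();
                for (long key = 0; key < 100_000; key++) {
                    int part = ignite.affinity("myCache").partition(key);
                    batches.computeIfAbsent(part, p -> new HashMap<>()).put(key, "value-" + key);
                }

                // Load the per-partition batches from multiple threads; the key sets
                // of different batches never intersect.
                ExecutorService pool = Executors.newFixedThreadPool(8);
                for (Map<Long, String> batch : batches.values())
                    pool.submit(() -> cache.putAll(batch));

                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.MINUTES);
            }
        }
    }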

Regards,
-- 
Ilya Kasnacheev


Re: Streamer and data loss

Posted by narges saleh <sn...@gmail.com>.
Hello Ilya,

If I use the putAll() operation then I won't get the streamer's bulk
performance, will I? I have a huge amount of data to persist.

thanks.

Re: Streamer and data loss

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I think you should consider using the putAll() operation if resiliency is
important for you, since that operation will be salvaged if the initiator
node fails.
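
As an aside, a minimal sketch of that approach (cache name and batch
contents are made up); the call is synchronous, so a failed batch is still
in the sender's hands and can simply be retried:

    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class PutAllLoader {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                IgniteCache<Long, String> cache = ignite.getOrCreateCache("myCache");

                // Keys in a sorted map: putAll() batches with consistent key ordering
                // avoid deadlocks between concurrent loaders.
                Map<Long, String> batch = new TreeMap<>();
                for (long key = 0; key < 1_000; key++)
                    batch.put(key, "value-" + key);

                try {
                    cache.putAll(batch);    // synchronous: returns once entries are stored
                }
                catch (RuntimeException e) {
                    cache.putAll(batch);    // naive retry, purely for illustration
                }
            }
        }
    }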

Regards,
-- 
Ilya Kasnacheev


Re: Streamer and data loss

Posted by narges saleh <sn...@gmail.com>.
Thanks Saikat.

I am not sure if sequential keys/timestamps and Kafka-like offsets would
help if there are many data source clients and many streamer nodes in play;
depending on the checkpoint, we might still end up with duplicates (unless
you're saying each client sequences its payload before sending it to the
streamer; even then, duplicates are possible in the cache). The only sure
way, it seems to me, is for the client that catches the exception to check
the cache and resend only the diff, which makes things very complex. The
other approach, if I am right, is to enable overwrite, so the streamer
would dedup the data in the cache. The latter is costly too. I think the
ideal approach would have been some type of streamer resiliency, where
another streamer node could pick up the buffer from a crashed streamer and
continue the work.


Re: Streamer and data loss

Posted by Saikat Maitra <sa...@gmail.com>.
Hi,

To minimise data loss during a streamer node failure I think we can use the
following steps:

1. Use the autoFlushFrequency param to set the desired flush frequency;
depending on the desired consistency level and performance, you can choose
how frequently you would like the data to be flushed to the Ignite nodes.

2. Develop an automated checkpointing process to capture and store the
source data offset. It can be something like a Kafka message offset, cache
keys (if keys are sequential) or the timestamp of the last flush; based on
that, the Ignite client can restart the data streaming process from the
last checkpoint if there is a node failure (a rough sketch of both steps
follows below).
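
For illustration, a rough sketch of both steps combined (the cache name,
the 500 ms flush interval, the offset scheme and the Preferences-based
checkpoint store are all made up for the example):

    import java.util.prefs.Preferences;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class CheckpointedStreamerLoad {
        public static void main(String[] args) {
            // Illustrative checkpoint store; a real loader might keep a Kafka offset,
            // a sequential cache key or a timestamp in a database instead.
            Preferences checkpoint = Preferences.userRoot().node("loader");

            try (Ignite ignite = Ignition.start()) {
                ignite.getOrCreateCache("myCache");

                try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                    streamer.autoFlushFrequency(500);              // 1. flush buffers automatically every 500 ms

                    long start = checkpoint.getLong("offset", 0L); // 2. resume from the last checkpoint
                    for (long offset = start; offset < 1_000_000L; offset++) {
                        streamer.addData(offset, "record-" + offset);

                        if (offset % 10_000 == 0) {
                            streamer.flush();                      // wait until this batch reached the caches
                            checkpoint.putLong("offset", offset);  // checkpoint only what was flushed
                        }
                    }
                }                                                  // close() flushes the remaining tail

                checkpoint.putLong("offset", 1_000_000L);
            }
        }
    }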

HTH

Regards,
Saikat

Re: Streamer and data loss

Posted by narges saleh <sn...@gmail.com>.
Thanks Saikat for the feedback.

But if I set the overwrite option to true to avoid duplicates, in case I
have to resend the entire payload after a streamer node failure, then I
won't get optimal performance, right?
What's the best practice for dealing with data streamer node failures? Are
there examples?

Re: Streamer and data loss

Posted by Saikat Maitra <sa...@gmail.com>.
Hi,

AFAIK, the DataStreamer checks for the presence of the key, and if it is
already present in the cache then it does not overwrite the value when
allowOverwrite is set to false.
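
A small sketch of that behaviour (cache name and keys are made up):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class AllowOverwriteDemo {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
                cache.put(1, "old");

                // Default allowOverwrite=false: existing keys are left untouched.
                try (IgniteDataStreamer<Integer, String> s = ignite.dataStreamer("myCache")) {
                    s.addData(1, "new");
                }
                System.out.println(cache.get(1));   // prints "old"

                // allowOverwrite=true: re-sent keys replace the existing values, which is
                // slower but makes resending a whole batch after a failure idempotent.
                try (IgniteDataStreamer<Integer, String> s = ignite.dataStreamer("myCache")) {
                    s.allowOverwrite(true);
                    s.addData(1, "new");
                }
                System.out.println(cache.get(1));   // prints "new"
            }
        }
    }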

Regards,
Saikat

Re: Streamer and data loss

Posted by narges saleh <sn...@gmail.com>.
Thanks Andrei.

If the external data source client is sending batches of 2-3 MB, say over a
TCP socket connection, to a bunch of socket streamers (deployed as Ignite
services on each Ignite node), and say one of the streamer nodes dies, does
the data source client that catches the exception have to check the cache
to see how much of the 2-3 MB batch has been flushed to the cache and
resend the rest? Would setting the streamer's overwrite option to true
work, if the data source client resends the entire batch?
A question regarding the streamer with the overwrite option set to true:
how does the streamer compare the data in hand with the data in the cache,
if each record is assigned a UUID when it is inserted into the cache?


Re: Streamer and data loss

Posted by Andrei Aleksandrov <ae...@gmail.com>.
Hi,

Data that has not been flushed from a data streamer will be lost. A data
streamer works through a particular Ignite node, and if that node fails the
streamer can't somehow continue working through another one. So your
application should take care of tracking that all the data was loaded (wait
for completion of loading, catch the exceptions, check the cache sizes,
etc.) and use another client to load the data again if the previous one
failed.
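
For example, one way to apply that advice is to keep each batch on the
sending side until the streamer has confirmed the flush, and to replay the
batch (from the same or another client) if anything throws. A rough sketch,
with made-up cache name and batch contents:

    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class TrackedLoad {
        /** Streams one batch and returns only after it has been flushed to the caches. */
        static void loadBatch(Ignite ignite, Map<Long, String> batch) {
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                for (Map.Entry<Long, String> e : batch.entrySet())
                    streamer.addData(e.getKey(), e.getValue());
                streamer.flush();   // throws if the data could not be written
            }
        }

        public static void main(String[] args) {
            Map<Long, String> batch = new TreeMap<>();
            for (long key = 0; key < 10_000; key++)
                batch.put(key, "value-" + key);

            try (Ignite ignite = Ignition.start()) {
                ignite.getOrCreateCache("myCache");
                try {
                    loadBatch(ignite, batch);
                }
                catch (RuntimeException e) {
                    // The batch is still held by the sender, so after catching the failure
                    // it can simply be replayed here or handed to a fresh client.
                    loadBatch(ignite, batch);
                }
            }
        }
    }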

BR,
Andrei
