
Posted to dev@ignite.apache.org by Alexey Goncharuk <al...@gmail.com> on 2019/12/23 14:02:58 UTC

Discovery-based services deployment guarantees question

Igniters,

I have a question that came out of debugging one of my recent tests.

The test is related to Ignite services. I noticed that sometimes a proxy
invocation of a newly deployed service fails because the service cannot be
found. I managed to reduce the test to a simple "start two nodes, deploy a
service, create a proxy, invoke the proxy" scenario. The proxy invocation
fails in about 80% of runs.

As far as I remember, the new discovery-based service deployment was
supposed to be synchronous, so not only non-proxy service instances should
work, but the proxies as well. Was my understanding correct? Should I file
a bug for the observed behavior?

--AG
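
The failure above can be pictured as a race between the deployment result reaching a node and a proxy lookup on that node. The following is only a minimal sketch in plain Java, assuming that this race is the cause (as the replies in this thread suggest); it does not use the Ignite API, and the class and method names (ProxyLookupRaceSketch, onDeployed, lookupNow, lookup) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of one node's view of deployed services; not Ignite code. */
class ProxyLookupRaceSketch {
    /** Services this (hypothetical) node already knows to be deployed. */
    private final Map<String, Object> deployed = new HashMap<>();

    /** Simulates the deployment result arriving at this node. */
    synchronized void onDeployed(String name, Object svc) {
        deployed.put(name, svc);
        notifyAll(); // wake up lookups that are waiting for this service
    }

    /** Immediate lookup: fails if the deployment result has not arrived yet. */
    synchronized Object lookupNow(String name) {
        Object svc = deployed.get(name);
        if (svc == null)
            throw new IllegalStateException("Service not found: " + name);
        return svc;
    }

    /** Lookup that waits up to timeoutMs for the deployment result to arrive. */
    synchronized Object lookup(String name, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!deployed.containsKey(name)) {
            long leftMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (leftMs <= 0)
                throw new IllegalStateException("Service not found within timeout: " + name);
            try {
                wait(leftMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for: " + name, e);
            }
        }
        return deployed.get(name);
    }

    public static void main(String[] args) {
        ProxyLookupRaceSketch nodeB = new ProxyLookupRaceSketch();

        // The lookup happens before the deployment result is known on this node.
        try {
            nodeB.lookupNow("svc");
        } catch (IllegalStateException e) {
            System.out.println("Immediate lookup failed: " + e.getMessage());
        }

        // The deployment result reaches the node a little later.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { return; }
            nodeB.onDeployed("svc", "service-instance");
        }).start();

        System.out.println("Lookup with timeout returned: " + nodeB.lookup("svc", 5_000));
    }
}
```

In this model the immediate lookup is the one that can lose the race, while the bounded wait tolerates a late deployment result at the cost of blocking the caller for up to the timeout.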

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

Mikhail, I merged your changes.
Thanks for your contribution!

On Tue, May 12, 2020 at 8:01 PM Vyacheslav Daradur <da...@gmail.com>
wrote:

> Hi Mikhail, the proposed changes make sense to me.
> I left some comments on the PR.
> Thank you!
>
> On Wed, May 6, 2020 at 2:28 PM Mikhail Petrov <pm...@gmail.com>
> wrote:
>
>> Hello, Igniters.
>>
>> I am working on IGNITE-12894 [1]. Its root cause seems to be similar to
>> the problem described in this thread.
>>
>> To solve both problems, I propose changing the behavior of
>> IgniteServiceProcessor#serviceTopology when the timeout argument is 0.
>> At the moment, IgniteServiceProcessor#serviceTopology returns the
>> topology immediately in this case, regardless of whether it has been
>> initialized. I propose waiting for the service topology to be initialized
>> if the requested service is already registered on the local node but the
>> full message has not yet been received from the coordinator.
>>
>> So the final behavior of IgniteServices#serviceProxy() will be:
>> 1. If a timeout is specified, it waits for the topology for up to the
>> specified timeout, even if the requested service has not been registered
>> yet. This matches the current implementation.
>>
>> 2. If no timeout is specified, it fails immediately when the service has
>> not been registered; otherwise it waits for the topology initialization
>> (the full message from the coordinator) if needed.
>>
>> Here is the PR implementing the described proposal: [2].
>>
>> WDYT?
>>
>> [1] - https://issues.apache.org/jira/browse/IGNITE-12894
>> [2] - https://github.com/apache/ignite/pull/7771
>>
>> On 30.12.2019 13:03, Alexey Goncharuk wrote:
>> > Agree, sounds like a plan, thanks for taking over!
>> >
>> > Mon, Dec 30, 2019 at 13:00, Vyacheslav Daradur <da...@gmail.com>:
>> >
>> >> Alexey,
>> >>
>> >> I would not make it default in the current implementation.
>> >>
>> >> Waiting for proxies on non-deployment-initiator nodes should be
>> >> improved; additional checks are required:
>> >> 1) We should not wait if the requested service has not been submitted
>> >> for deployment (when there is no info about such a service).
>> >> 2) If service deployment failed, getting the proxy should fail or be
>> >> interrupted as well (do not wait for the whole timeout).
>> >>
>> >> Let's schedule this improvement for the next release; I'll try to find
>> >> time to implement it.
>> >>
>> >> What do you think?
>> >>
>> >> On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk
>> >> <al...@gmail.com> wrote:
>> >>> Vyacheslav, thanks for the explanation, makes sense to me.
>> >>>
>> >>> I was thinking, though: should we make the behavior with the timeout
>> >>> the default for all proxies?
>> >>>
>> >>> Just my opinion - I think it would be hard for a user to control which
>> >>> node deploys the service, especially if multiple nodes deploy it
>> >>> concurrently. Most likely users will end up always calling the second
>> >>> variant of the proxy method (with the timeout), so perhaps make it the
>> >>> default?
>> >>>
>> >>> Sun, Dec 29, 2019 at 21:05, Vyacheslav Daradur <daradurvs@gmail.com>:
>> >>>
>> >>>> Alexey,
>> >>>>
>> >>>> I've prepared a PR [1] to demonstrate our proxy invocation guarantees
>> >>>> and to avoid misunderstanding.
>> >>>>
>> >>>> Please let me know if you think that we should improve our guarantees
>> >>>> in some cases.
>> >>>>
>> >>>> [1] https://github.com/apache/ignite/pull/7213
>> >>>>
>> >>>> On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <daradurvs@gmail.com>
>> >>>> wrote:
>> >>>>>> even the local deployment looks broken: if a compute job
>> >>>>>> is sent to a remote node after the service deployment
>> >>>>> This is a different case, and it is covered by retries:
>> >>>>> * If you deploy a service from node A to node B and then take a proxy
>> >>>>> from node A (the deployment initiator), it should NOT fail even if
>> >>>>> node B has not yet received the message that deployment finished
>> >>>>> successfully, because proxy invocations are retried.
>> >>>>>
>> >>>>> It looks like it would be better to describe all these cases on the wiki.
>> >>>>>
>> >>>>>> Should we schedule this ticket for the further work on Services IEP?
>> >>>>> If it is a frequent use case, we definitely should implement it.
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
>> >>>>> <al...@gmail.com> wrote:
>> >>>>>> Ok, got it.
>> >>>>>>
>> >>>>>> I agree that this is consistent with the old behavior, but this is the
>> >>>>>> kind of error we wanted to get rid of when we started the IEP. From the
>> >>>>>> user's perspective, even the local deployment looks broken: if a compute
>> >>>>>> job is sent to a remote node after the service deployment, the job
>> >>>>>> execution may fail due to this error.
>> >>>>>>
>> >>>>>> Should we schedule this ticket for the further work on Services IEP?
>> >>>>>>
>> >>>>>> Tue, Dec 24, 2019 at 18:49, Vyacheslav Daradur <daradurvs@gmail.com>:
>> >>>>>>> I'm not sure that "user fallback" is the right term; this is not new
>> >>>>>>> behaviour compared with the legacy implementation.
>> >>>>>>>
>> >>>>>>> Our synchronous deployment guarantees that the deployment initiator
>> >>>>>>> can start working with the service immediately after deployment has
>> >>>>>>> finished successfully.
>> >>>>>>> For a node that is not the deployment initiator we can't provide such
>> >>>>>>> guarantees now, because the deployment result is unknown there and
>> >>>>>>> the deployment may fail.
>> >>>>>>>
>> >>>>>>> In this case, a reasonable timeout might be an acceptable solution.
>> >>>>>>> We can improve the guarantees in future releases, but there is an open
>> >>>>>>> question:
>> >>>>>>> - how long should getting a proxy wait? Deployment of a "heavy"
>> >>>>>>> service may take a while.
>> >>>>>>>
>> >>>>>>> On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
>> >>>>>>> <al...@gmail.com> wrote:
>> >>>>>>>> What should be the user fallback in this case? Retry infinitely? Is
>> >>>>>>>> there a way to wait for the proper deployment?
>> >>>>>>>>
>> >>>>>>>> Tue, Dec 24, 2019 at 12:41, Vyacheslav Daradur <daradurvs@gmail.com>:
>> >>>>>>>>> I’ll take a look at the end of the week.
>> >>>>>>>>>
>> >>>>>>>>> There is one more use case:
>> >>>>>>>>> * if you initiate deployment from node A but get a proxy on node B
>> >>>>>>>>> (which isn’t the deployment initiator) to call the service on node A,
>> >>>>>>>>> it may fail with "service not found". This is expected behaviour,
>> >>>>>>>>> because we didn't provide such guarantees.
>> >>>>>>>>>
>> >>>>>>>>> The API for getting a proxy with a timeout should be used in this case:
>> >>>>>>>>> T serviceProxy(String name, Class<? super T> svcItf, boolean sticky,
>> >>>>>>>>> long timeout)
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Tue, Dec 24, 2019 at 12:11, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
>> >>>>>>>>>> Well, this is exactly the case. The service is deployed from node A,
>> >>>>>>>>>> the proxy is created on node B, and a "service not found" exception
>> >>>>>>>>>> gets thrown to the user anyway. Perhaps the retry happens too fast?
>> >>>>>>>>>>
>> >>>>>>>>>> Created a ticket [1].
>> >>>>>>>>>>
>> >>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12490
>> >>>>>>>>>>
>> >>>>>>>>>> Mon, Dec 23, 2019 at 22:08, Vyacheslav Daradur <daradurvs@gmail.com>:
>> >>>>>>>>>>> Hi, Alexey
>> >>>>>>>>>>>
>> >>>>>>>>>>> Please attach a reproducer to the ticket.
>> >>>>>>>>>>>
>> >>>>>>>>>>> As far as I remember, we have the following behaviour for proxies.
>> >>>>>>>>>>> Let's assume you have deployed a service from node A. Then:
>> >>>>>>>>>>> * if you invoke the service locally from node A, the service is
>> >>>>>>>>>>> guaranteed to be deployed and ready to work
>> >>>>>>>>>>> * if you take a proxy from node A to remote node B right after
>> >>>>>>>>>>> deployment, there might be a race between disco-spi (the message
>> >>>>>>>>>>> that releases the deployed service) and comm-spi (the remote call
>> >>>>>>>>>>> works via Compute over comm-spi), but it shouldn't affect end users
>> >>>>>>>>>>> because the failed request will be retried in this case
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
>> >>>>>>>>>>> <al...@gmail.com> wrote:
>> >>>>>>>>>>>> Nikolay,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Yes, I've rechecked: the new service processor is being used. I'll
>> >>>>>>>>>>>> file a bug shortly.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Mon, Dec 23, 2019 at 17:33, Николай Ижиков <nizhikov@apache.org>:
>> >>>>>>>>>>>>> Alexey, are you sure you are testing the new service framework?
>> >>>>>>>>>>>>> If yes, you definitely should file a bug.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Dec 23, 2019, at 17:02, Alexey Goncharuk <alexey.goncharuk@gmail.com>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>> Igniters,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I have a question that came out of debugging one of my recent tests.
>> >>>>>>>>>>>>>> The test is related to Ignite services. I noticed that sometimes a proxy
>> >>>>>>>>>>>>>> invocation of a newly deployed service fails because the service cannot be
>> >>>>>>>>>>>>>> found. I managed to reduce the test to a simple "start two nodes, deploy a
>> >>>>>>>>>>>>>> service, create a proxy, invoke the proxy" scenario. The proxy invocation
>> >>>>>>>>>>>>>> fails in about 80% of runs.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> As far as I remember, the new discovery-based service deployment was
>> >>>>>>>>>>>>>> supposed to be synchronous, so not only non-proxy service instances should
>> >>>>>>>>>>>>>> work, but the proxies as well. Was my understanding correct? Should I file
>> >>>>>>>>>>>>>> a bug for the observed behavior?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> --AG


-- 
Best Regards,
Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

Hi Mikhail, the proposed changes make sense to me.
I left some comments on the PR.
Thank you!

-- 
Best Regards,
Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Mikhail Petrov <pm...@gmail.com>.

Hello, Igniters.

I am working on IGNITE-12894 [1]. Its root cause seems to be similar to
the problem described in this thread.

To solve both problems, I propose changing the behavior of
IgniteServiceProcessor#serviceTopology when the timeout argument is 0.
At the moment, IgniteServiceProcessor#serviceTopology returns the
topology immediately in this case, regardless of whether it has been
initialized. I propose waiting for the service topology to be initialized
if the requested service is already registered on the local node but the
full message has not yet been received from the coordinator.

So the final behavior of IgniteServices#serviceProxy() will be:
1. If a timeout is specified, it waits for the topology for up to the
specified timeout, even if the requested service has not been registered
yet. This matches the current implementation.

2. If no timeout is specified, it fails immediately when the service has
not been registered; otherwise it waits for the topology initialization
(the full message from the coordinator) if needed.

Here is the PR implementing the described proposal: [2].

WDYT?

[1] - https://issues.apache.org/jira/browse/IGNITE-12894
[2] - https://github.com/apache/ignite/pull/7771
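
The two rules proposed above can be summarised as a small decision function. The following is only a sketch in plain Java, not the code from the PR: the class, enum, and method names (ServiceTopologyWaitSketch, Action, decide) are hypothetical, and returning immediately when the topology is already initialized is an assumption about what "if needed" means.

```java
/** Plain-Java model of the proposed lookup rules; not Ignite code. */
class ServiceTopologyWaitSketch {
    /** What a topology lookup would do under the proposal (hypothetical names). */
    enum Action { RETURN_TOPOLOGY, WAIT_UP_TO_TIMEOUT, WAIT_FOR_FULL_MESSAGE, FAIL_IMMEDIATELY }

    /**
     * @param timeout             0 means "no timeout specified", as in the proposal
     * @param registeredLocally   the service is already registered on the local node
     * @param topologyInitialized the full message from the coordinator has been received
     */
    static Action decide(long timeout, boolean registeredLocally, boolean topologyInitialized) {
        // Assumption: nothing to wait for once the topology is known.
        if (topologyInitialized)
            return Action.RETURN_TOPOLOGY;

        // Rule 1: with a timeout, wait even if the service is not registered yet.
        if (timeout > 0)
            return Action.WAIT_UP_TO_TIMEOUT;

        // Rule 2: without a timeout, wait only for services that are already registered.
        return registeredLocally ? Action.WAIT_FOR_FULL_MESSAGE : Action.FAIL_IMMEDIATELY;
    }

    public static void main(String[] args) {
        long[] timeouts = {0L, 5_000L};
        boolean[] flags = {false, true};

        // Print the full decision table for every combination of inputs.
        for (long timeout : timeouts)
            for (boolean registered : flags)
                for (boolean initialized : flags)
                    System.out.printf("timeout=%d registered=%b initialized=%b -> %s%n",
                        timeout, registered, initialized, decide(timeout, registered, initialized));
    }
}
```

The case that changes relative to the behavior described above is timeout 0 with a registered service whose topology is not yet initialized: it waits for the full message instead of returning at once.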


Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

Agree, sounds like a plan, thanks for taking over!

On Mon, Dec 30, 2019 at 1:00 PM Vyacheslav Daradur <da...@gmail.com> wrote:

> Alexey,
>
> I would not make it default in the current implementation.
>
> Waiting of proxies on non-deployment-initiator nodes should be
> improved - additional checks are required:
> 1) We should not wait if requested service has not been submitted to
> deploy (when there is no info about such service)
> 2) If service deployment failed - getting proxy should be failed or
> interrupted as well (do not wait for all available timeout)
>
> Let's schedule this improvement to next release, I'll try to find a
> time to implement it.
>
> What do you think?
>
> On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk
> <al...@gmail.com> wrote:
> >
> > Vyacheslav, thanks for the explanation, makes sense to me.
> >
> > I was thinking though, should we make the behavior with the timeout
> default
> > for all proxies?
> >
> > Just my opinion - I think for a user it would be hard to control which
> node
> > deploys the service, especially if multiple nodes deploy it concurrently.
> > Most likely users will end up always calling the second option of the
> proxy
> > (with the timeout), so, perhaps, make it default?
> >
> > On Sun, Dec 29, 2019 at 9:05 PM Vyacheslav Daradur <da...@gmail.com> wrote:
> >
> > > Alexey,
> > >
> > > I've prepared pr [1] to show our proxy invocation guarantees and to
> > > avoid misunderstanding.
> > >
> > > Please, let me know if you think that we should improve our guaranties
> > > in some cases.
> > >
> > > [1] https://github.com/apache/ignite/pull/7213
> > >
> > > On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <
> daradurvs@gmail.com>
> > > wrote:
> > > >
> > > > > even the local deployment looks broken: if a compute job
> > > > > is sent to a remote node after the service deployment
> > > >
> > > > This is a different case and covered by retries:
> > > > * If you deploy a service from node A to node B, then take a proxy
> > > > from node A (deployment initiator) it should NOT fail even if node B
> > > > has not received yet a message that deployment finished successfully,
> > > > because of proxy invocation retries.
> > > >
> > > > Look like It's better to describe all these cases on the wiki.
> > > >
> > > > > Should we schedule this ticket for the further work on Services
> IEP?
> > > >
> > > > If it is a frequent use-case we definitely should implement it.
> > > >
> > > >
> > > > On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> > > > <al...@gmail.com> wrote:
> > > > >
> > > > > Ok, got it.
> > > > >
> > > > > I agree that this is consistent with the old behavior, but this is
> the
> > > kind
> > > > > of errors we wanted to get rid of when we started the IEP. From the
> > > > > user perspective, even the local deployment looks broken: if a
> compute
> > > job
> > > > > is sent to a remote node after the service deployment, the job
> > > execution
> > > > > may fail due to this error.
> > > > >
> > > > > Should we schedule this ticket for the further work on Services
> IEP?
> > > > >
> > > > > > On Tue, Dec 24, 2019 at 6:49 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > >
> > > > > > Not sure that "user fallback" is the right definition, it is not
> new
> > > > > > behaviour in comparison with legacy implementation.
> > > > > >
> > > > > > Our synchronous deployment provides guaranties for a deployment
> > > > > > initiator to be able to start work with service immediately after
> > > > > > deployment finished successfully.
> > > > > > For not the deployment initiator we can't provide such guarantees
> > > now,
> > > > > > because of unknown deployment result and possibly fail.
> > > > > >
> > > > > > In this case, a reasonable timeout might be an acceptable
> solution.
> > > > > >
> > > > > > We can improve guaranties in future releases, but there is an
> open
> > > > > > question:
> > > > > > - how long taking of proxy should wait? - deployment of "heavy"
> > > > > > service may take a while
> > > > > >
> > > > > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > > > > <al...@gmail.com> wrote:
> > > > > > >
> > > > > > > What should be the user fallback in this case? Retry
> infinitely? Is
> > > > > > there a
> > > > > > > way to wait for the proper deployment?
> > > > > > >
> > > > > > > > On Tue, Dec 24, 2019 at 12:41 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > >
> > > > > > > > I’ll take a look at the end of the week.
> > > > > > > >
> > > > > > > > There is one more use-case:
> > > > > > > > * if you initiate deployment from node A, but getting proxy
> on
> > > node B
> > > > > > > > (which isn’t deployment initiator) to call service on node A
> -
> > > it may
> > > > > > fail
> > > > > > > > with "service not found", this is expected behaviour because
> we
> > > didn't
> > > > > > > > provide such guarantees.
> > > > > > > >
> > > > > > > > API of getting proxy with timeout should be used in this
> case:
> > > > > > > > T serviceProxy(String name, Class<? super T> svcItf, boolean
> > > sticky,
> > > > > > long
> > > > > > > > timeout)
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Tue, Dec 24, 2019 at 12:11 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Well, this is exactly the case. The service is deployed
> from
> > > node A,
> > > > > > the
> > > > > > > > > proxy is created on node B, and "service not found"
> exception
> > > gets
> > > > > > thrown
> > > > > > > > > to a user anyway. Perhaps, the retry happens too fast?
> > > > > > > > >
> > > > > > > > > Created a ticket [1].
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 10:08 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Alexey
> > > > > > > > > >
> > > > > > > > > > Please attach a reproducer to the ticket.
> > > > > > > > > >
> > > > > > > > > > As far as I remember we have the following behaviour for
> the
> > > > > > proxies:
> > > > > > > > > >
> > > > > > > > > > Let's assume you have deployed service from node A, then:
> > > > > > > > > > * if you invoke service locally from node A - it is
> > > guaranteed to
> > > > > > > > > > service to be deployed and ready to work
> > > > > > > > > > * if you take a proxy from node A to remote node B right
> > > after
> > > > > > deploy
> > > > > > > > > > - there is might be a race between disco-spi (a message
> which
> > > > > > releases
> > > > > > > > > > deployed service)  and comm-spi (remote call works via
> > > Compute over
> > > > > > > > > > comm-spi), but it shouldn't affect end-users because the
> > > failed
> > > > > > > > > > request will be retried in this case
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > > > > <al...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Nikolay,
> > > > > > > > > > >
> > > > > > > > > > > Yes, I've rechecked, the new service processor is being
> > > used.
> > > > > > I'll
> > > > > > > > > file a
> > > > > > > > > > > bug shortly.
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 23, 2019 at 5:33 PM Николай Ижиков <nizhikov@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Alexey, are you sure you are testing the new service framework?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If yes - you definitely should file a bug.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Dec 23, 2019, at 5:02 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Igniters,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have a question based on one of my recent tests
> > > debugging.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The test is related to Ignite services. I noticed
> that
> > > > > > sometimes
> > > > > > > > a
> > > > > > > > > > proxy
> > > > > > > > > > > > > invocation of a newly deployed service fails
> because
> > > the
> > > > > > service
> > > > > > > > > > cannot
> > > > > > > > > > > > be
> > > > > > > > > > > > > found. I managed to reduce the test to a simple
> "start
> > > two
> > > > > > nodes,
> > > > > > > > > > deploy
> > > > > > > > > > > > a
> > > > > > > > > > > > > service, create a proxy, invoke the proxy"
> scenario.
> > > The
> > > > > > proxy
> > > > > > > > > > invocation
> > > > > > > > > > > > > fails in about ~80% of runs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As far as I remember, the new discovery-based
> service
> > > > > > deployment
> > > > > > > > > was
> > > > > > > > > > > > > supposed to be synchronous, so not only non-proxy
> > > service
> > > > > > > > instances
> > > > > > > > > > > > should
> > > > > > > > > > > > > work, but the proxies as well. Was my understanding
> > > correct?
> > > > > > > > > Should I
> > > > > > > > > > > > file
> > > > > > > > > > > > > a bug for the observed behavior?
> > > > > > > > > > > > >
> > > > > > > > > > > > > --AG
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.
>

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

Alexey,

I would not make it the default in the current implementation.

Waiting for proxies on nodes that are not the deployment initiator should
be improved first - additional checks are required:
1) We should not wait if the requested service has not been submitted for
deployment (when there is no info about such a service)
2) If the service deployment failed, getting the proxy should fail or be
interrupted as well (do not wait for the whole timeout)
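
The two checks could be sketched roughly as follows. This is plain Java
with no Ignite classes; ProxyWaitSketch, State, update and awaitDeployed
are hypothetical names used only for illustration, and this is not the
actual IgniteServiceProcessor code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical local view of deployment state; not Ignite's real data structure. */
class ProxyWaitSketch {
    enum State { DEPLOYING, DEPLOYED, FAILED }

    private final Map<String, State> states = new HashMap<>();

    /** Records a state change (e.g. on a discovery message) and wakes up waiters. */
    synchronized void update(String name, State state) {
        states.put(name, state);
        notifyAll();
    }

    /**
     * @return true once the service is deployed, false if the timeout elapsed first.
     * @throws IllegalStateException if the service is unknown or its deployment failed.
     */
    synchronized boolean awaitDeployed(String name, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        while (true) {
            State state = states.get(name);

            // Check 1: nothing was submitted under this name, so waiting is pointless.
            if (state == null)
                throw new IllegalStateException("Service not found: " + name);

            // Check 2: deployment failed, so fail fast instead of using up the timeout.
            if (state == State.FAILED)
                throw new IllegalStateException("Service deployment failed: " + name);

            if (state == State.DEPLOYED)
                return true;

            long left = deadline - System.nanoTime();

            if (left <= 0)
                return false;

            try {
                // Releases the monitor while waiting; update() wakes us up via notifyAll().
                TimeUnit.NANOSECONDS.timedWait(this, left);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for: " + name, e);
            }
        }
    }
}
```

With this shape, only a service that is known and still deploying makes
the caller wait; the other cases return or fail immediately.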

Let's schedule this improvement for the next release; I'll try to find
time to implement it.

What do you think?

On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk
<al...@gmail.com> wrote:
>
> Vyacheslav, thanks for the explanation, makes sense to me.
>
> I was thinking though, should we make the behavior with the timeout default
> for all proxies?
>
> Just my opinion - I think for a user it would be hard to control which node
> deploys the service, especially if multiple nodes deploy it concurrently.
> Most likely users will end up always calling the second option of the proxy
> (with the timeout), so, perhaps, make it default?
>
> On Sun, Dec 29, 2019 at 9:05 PM Vyacheslav Daradur <da...@gmail.com> wrote:
>
> > Alexey,
> >
> > I've prepared pr [1] to show our proxy invocation guarantees and to
> > avoid misunderstanding.
> >
> > Please, let me know if you think that we should improve our guaranties
> > in some cases.
> >
> > [1] https://github.com/apache/ignite/pull/7213
> >
> > On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <da...@gmail.com>
> > wrote:
> > >
> > > > even the local deployment looks broken: if a compute job
> > > > is sent to a remote node after the service deployment
> > >
> > > This is a different case and covered by retries:
> > > * If you deploy a service from node A to node B, then take a proxy
> > > from node A (deployment initiator) it should NOT fail even if node B
> > > has not received yet a message that deployment finished successfully,
> > > because of proxy invocation retries.
> > >
> > > Look like It's better to describe all these cases on the wiki.
> > >
> > > > Should we schedule this ticket for the further work on Services IEP?
> > >
> > > If it is a frequent use-case we definitely should implement it.
> > >
> > >
> > > On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> > > <al...@gmail.com> wrote:
> > > >
> > > > Ok, got it.
> > > >
> > > > I agree that this is consistent with the old behavior, but this is the
> > kind
> > > > of errors we wanted to get rid of when we started the IEP. From the
> > > > user perspective, even the local deployment looks broken: if a compute
> > job
> > > > is sent to a remote node after the service deployment, the job
> > execution
> > > > may fail due to this error.
> > > >
> > > > Should we schedule this ticket for the further work on Services IEP?
> > > >
> > > > > On Tue, Dec 24, 2019 at 6:49 PM Vyacheslav Daradur <da...@gmail.com> wrote:
> > > >
> > > > > Not sure that "user fallback" is the right definition, it is not new
> > > > > behaviour in comparison with legacy implementation.
> > > > >
> > > > > Our synchronous deployment provides guaranties for a deployment
> > > > > initiator to be able to start work with service immediately after
> > > > > deployment finished successfully.
> > > > > For not the deployment initiator we can't provide such guarantees
> > now,
> > > > > because of unknown deployment result and possibly fail.
> > > > >
> > > > > In this case, a reasonable timeout might be an acceptable solution.
> > > > >
> > > > > We can improve guaranties in future releases, but there is an open
> > > > > question:
> > > > > - how long taking of proxy should wait? - deployment of "heavy"
> > > > > service may take a while
> > > > >
> > > > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > > > <al...@gmail.com> wrote:
> > > > > >
> > > > > > What should be the user fallback in this case? Retry infinitely? Is
> > > > > there a
> > > > > > way to wait for the proper deployment?
> > > > > >
> > > > > > > On Tue, Dec 24, 2019 at 12:41 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > >
> > > > > > > I’ll take a look at the end of the week.
> > > > > > >
> > > > > > > There is one more use-case:
> > > > > > > * if you initiate deployment from node A, but getting proxy on
> > node B
> > > > > > > (which isn’t deployment initiator) to call service on node A -
> > it may
> > > > > fail
> > > > > > > with "service not found", this is expected behaviour because we
> > didn't
> > > > > > > provide such guarantees.
> > > > > > >
> > > > > > > API of getting proxy with timeout should be used in this case:
> > > > > > > T serviceProxy(String name, Class<? super T> svcItf, boolean
> > sticky,
> > > > > long
> > > > > > > timeout)
> > > > > > >
> > > > > > >
> > > > > > > > On Tue, Dec 24, 2019 at 12:11 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > >
> > > > > > > > Well, this is exactly the case. The service is deployed from
> > node A,
> > > > > the
> > > > > > > > proxy is created on node B, and "service not found" exception
> > gets
> > > > > thrown
> > > > > > > > to a user anyway. Perhaps, the retry happens too fast?
> > > > > > > >
> > > > > > > > Created a ticket [1].
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > > > >
> > > > > > > > > On Mon, Dec 23, 2019 at 10:08 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Alexey
> > > > > > > > >
> > > > > > > > > Please attach a reproducer to the ticket.
> > > > > > > > >
> > > > > > > > > As far as I remember we have the following behaviour for the
> > > > > proxies:
> > > > > > > > >
> > > > > > > > > Let's assume you have deployed service from node A, then:
> > > > > > > > > * if you invoke service locally from node A - it is
> > guaranteed to
> > > > > > > > > service to be deployed and ready to work
> > > > > > > > > * if you take a proxy from node A to remote node B right
> > after
> > > > > deploy
> > > > > > > > > - there is might be a race between disco-spi (a message which
> > > > > releases
> > > > > > > > > deployed service)  and comm-spi (remote call works via
> > Compute over
> > > > > > > > > comm-spi), but it shouldn't affect end-users because the
> > failed
> > > > > > > > > request will be retried in this case
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > > > <al...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Nikolay,
> > > > > > > > > >
> > > > > > > > > > Yes, I've rechecked, the new service processor is being
> > used.
> > > > > I'll
> > > > > > > > file a
> > > > > > > > > > bug shortly.
> > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 23, 2019 at 5:33 PM Николай Ижиков <nizhikov@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Alexey, are you sure you are testing the new service framework?
> > > > > > > > > > > >
> > > > > > > > > > > > If yes - you definitely should file a bug.
> > > > > > > > > > > >
> > > > > > > > > > > > > On Dec 23, 2019, at 5:02 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > I have a question based on one of my recent tests
> > debugging.
> > > > > > > > > > > >
> > > > > > > > > > > > The test is related to Ignite services. I noticed that
> > > > > sometimes
> > > > > > > a
> > > > > > > > > proxy
> > > > > > > > > > > > invocation of a newly deployed service fails because
> > the
> > > > > service
> > > > > > > > > cannot
> > > > > > > > > > > be
> > > > > > > > > > > > found. I managed to reduce the test to a simple "start
> > two
> > > > > nodes,
> > > > > > > > > deploy
> > > > > > > > > > > a
> > > > > > > > > > > > service, create a proxy, invoke the proxy" scenario.
> > The
> > > > > proxy
> > > > > > > > > invocation
> > > > > > > > > > > > fails in about ~80% of runs.
> > > > > > > > > > > >
> > > > > > > > > > > > As far as I remember, the new discovery-based service
> > > > > deployment
> > > > > > > > was
> > > > > > > > > > > > supposed to be synchronous, so not only non-proxy
> > service
> > > > > > > instances
> > > > > > > > > > > should
> > > > > > > > > > > > work, but the proxies as well. Was my understanding
> > correct?
> > > > > > > > Should I
> > > > > > > > > > > file
> > > > > > > > > > > > a bug for the observed behavior?
> > > > > > > > > > > >
> > > > > > > > > > > > --AG
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav D.
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
> >



-- 
Best Regards, Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

Vyacheslav, thanks for the explanation, makes sense to me.

I was thinking though, should we make the timeout behavior the default
for all proxies?

Just my opinion - I think it would be hard for a user to control which node
deploys the service, especially if multiple nodes deploy it concurrently.
Most likely users will end up always calling the second variant of
serviceProxy (the one with the timeout), so perhaps we should make it the default?
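
A minimal sketch of what such a default could look like. ProxySource and
DFLT_PROXY_TIMEOUT_MS are hypothetical names that only mirror the shape of
the serviceProxy signature quoted in this thread; this is not the real
IgniteServices interface, and the 10 second value is an arbitrary
placeholder:

```java
/** Hypothetical interface mirroring the shape of the serviceProxy overloads. */
interface ProxySource {
    /** Placeholder default timeout in milliseconds; the value is an assumption. */
    long DFLT_PROXY_TIMEOUT_MS = 10_000L;

    /** Variant with an explicit timeout, as quoted in this thread. */
    <T> T serviceProxy(String name, Class<? super T> svcItf, boolean sticky, long timeout);

    /** Variant without a timeout: delegates to the timed variant with the default. */
    default <T> T serviceProxy(String name, Class<? super T> svcItf, boolean sticky) {
        return this.<T>serviceProxy(name, svcItf, sticky, DFLT_PROXY_TIMEOUT_MS);
    }
}
```

Callers that never pass a timeout would then still wait for the service
for a bounded time instead of failing right away.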

On Sun, Dec 29, 2019 at 9:05 PM Vyacheslav Daradur <da...@gmail.com> wrote:

> Alexey,
>
> I've prepared pr [1] to show our proxy invocation guarantees and to
> avoid misunderstanding.
>
> Please, let me know if you think that we should improve our guaranties
> in some cases.
>
> [1] https://github.com/apache/ignite/pull/7213
>
> On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <da...@gmail.com>
> wrote:
> >
> > > even the local deployment looks broken: if a compute job
> > > is sent to a remote node after the service deployment
> >
> > This is a different case and covered by retries:
> > * If you deploy a service from node A to node B, then take a proxy
> > from node A (deployment initiator) it should NOT fail even if node B
> > has not received yet a message that deployment finished successfully,
> > because of proxy invocation retries.
> >
> > Look like It's better to describe all these cases on the wiki.
> >
> > > Should we schedule this ticket for the further work on Services IEP?
> >
> > If it is a frequent use-case we definitely should implement it.
> >
> >
> > On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> > <al...@gmail.com> wrote:
> > >
> > > Ok, got it.
> > >
> > > I agree that this is consistent with the old behavior, but this is the
> kind
> > > of errors we wanted to get rid of when we started the IEP. From the
> > > user perspective, even the local deployment looks broken: if a compute
> job
> > > is sent to a remote node after the service deployment, the job
> execution
> > > may fail due to this error.
> > >
> > > Should we schedule this ticket for the further work on Services IEP?
> > >
> > > > On Tue, Dec 24, 2019 at 6:49 PM Vyacheslav Daradur <da...@gmail.com> wrote:
> > >
> > > > Not sure that "user fallback" is the right definition, it is not new
> > > > behaviour in comparison with legacy implementation.
> > > >
> > > > Our synchronous deployment provides guaranties for a deployment
> > > > initiator to be able to start work with service immediately after
> > > > deployment finished successfully.
> > > > For not the deployment initiator we can't provide such guarantees
> now,
> > > > because of unknown deployment result and possibly fail.
> > > >
> > > > In this case, a reasonable timeout might be an acceptable solution.
> > > >
> > > > We can improve guaranties in future releases, but there is an open
> > > > question:
> > > > - how long taking of proxy should wait? - deployment of "heavy"
> > > > service may take a while
> > > >
> > > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > > <al...@gmail.com> wrote:
> > > > >
> > > > > What should be the user fallback in this case? Retry infinitely? Is
> > > > there a
> > > > > way to wait for the proper deployment?
> > > > >
> > > > > > On Tue, Dec 24, 2019 at 12:41 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > >
> > > > > > I’ll take a look at the end of the week.
> > > > > >
> > > > > > There is one more use-case:
> > > > > > * if you initiate deployment from node A, but getting proxy on
> node B
> > > > > > (which isn’t deployment initiator) to call service on node A -
> it may
> > > > fail
> > > > > > with "service not found", this is expected behaviour because we
> didn't
> > > > > > provide such guarantees.
> > > > > >
> > > > > > API of getting proxy with timeout should be used in this case:
> > > > > > T serviceProxy(String name, Class<? super T> svcItf, boolean
> sticky,
> > > > long
> > > > > > timeout)
> > > > > >
> > > > > >
> > > > > > > On Tue, Dec 24, 2019 at 12:11 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > >
> > > > > > > Well, this is exactly the case. The service is deployed from
> node A,
> > > > the
> > > > > > > proxy is created on node B, and "service not found" exception
> gets
> > > > thrown
> > > > > > > to a user anyway. Perhaps, the retry happens too fast?
> > > > > > >
> > > > > > > Created a ticket [1].
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > > >
> > > > > > > > On Mon, Dec 23, 2019 at 10:08 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Alexey
> > > > > > > >
> > > > > > > > Please attach a reproducer to the ticket.
> > > > > > > >
> > > > > > > > As far as I remember we have the following behaviour for the
> > > > proxies:
> > > > > > > >
> > > > > > > > Let's assume you have deployed service from node A, then:
> > > > > > > > * if you invoke service locally from node A - it is
> guaranteed to
> > > > > > > > service to be deployed and ready to work
> > > > > > > > * if you take a proxy from node A to remote node B right
> after
> > > > deploy
> > > > > > > > - there is might be a race between disco-spi (a message which
> > > > releases
> > > > > > > > deployed service)  and comm-spi (remote call works via
> Compute over
> > > > > > > > comm-spi), but it shouldn't affect end-users because the
> failed
> > > > > > > > request will be retried in this case
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > > <al...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Nikolay,
> > > > > > > > >
> > > > > > > > > Yes, I've rechecked, the new service processor is being
> used.
> > > > I'll
> > > > > > > file a
> > > > > > > > > bug shortly.
> > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 5:33 PM Николай Ижиков <nizhikov@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Alexey, are you sure you are testing the new service framework?
> > > > > > > > > > >
> > > > > > > > > > > If yes - you definitely should file a bug.
> > > > > > > > > > >
> > > > > > > > > > > > On Dec 23, 2019, at 5:02 PM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I have a question based on one of my recent tests
> debugging.
> > > > > > > > > > >
> > > > > > > > > > > The test is related to Ignite services. I noticed that
> > > > sometimes
> > > > > > a
> > > > > > > > proxy
> > > > > > > > > > > invocation of a newly deployed service fails because
> the
> > > > service
> > > > > > > > cannot
> > > > > > > > > > be
> > > > > > > > > > > found. I managed to reduce the test to a simple "start
> two
> > > > nodes,
> > > > > > > > deploy
> > > > > > > > > > a
> > > > > > > > > > > service, create a proxy, invoke the proxy" scenario.
> The
> > > > proxy
> > > > > > > > invocation
> > > > > > > > > > > fails in about ~80% of runs.
> > > > > > > > > > >
> > > > > > > > > > > As far as I remember, the new discovery-based service
> > > > deployment
> > > > > > > was
> > > > > > > > > > > supposed to be synchronous, so not only non-proxy
> service
> > > > > > instances
> > > > > > > > > > should
> > > > > > > > > > > work, but the proxies as well. Was my understanding
> correct?
> > > > > > > Should I
> > > > > > > > > > file
> > > > > > > > > > > a bug for the observed behavior?
> > > > > > > > > > >
> > > > > > > > > > > --AG
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
>
>
>
> --
> Best Regards, Vyacheslav D.
>

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

Alexey,

I've prepared a PR [1] that demonstrates our proxy invocation guarantees,
to avoid misunderstanding.

Please let me know if you think that we should improve our guarantees
in some cases.

[1] https://github.com/apache/ignite/pull/7213

On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur <da...@gmail.com> wrote:
>
> > even the local deployment looks broken: if a compute job
> > is sent to a remote node after the service deployment
>
> This is a different case, and it is covered by retries:
> * If you deploy a service from node A to node B and then take a proxy
> from node A (the deployment initiator), it should NOT fail even if node B
> has not yet received the message that deployment finished successfully,
> because proxy invocations are retried.
>
> It looks like it would be better to describe all these cases on the wiki.
>
> > Should we schedule this ticket for the further work on Services IEP?
>
> If it is a frequent use-case we definitely should implement it.
>
>
> On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> <al...@gmail.com> wrote:
> >
> > Ok, got it.
> >
> > I agree that this is consistent with the old behavior, but this is the kind
> > of errors we wanted to get rid of when we started the IEP. From the
> > user perspective, even the local deployment looks broken: if a compute job
> > is sent to a remote node after the service deployment, the job execution
> > may fail due to this error.
> >
> > Should we schedule this ticket for the further work on Services IEP?
> >
> > > On Tue, Dec 24, 2019 at 6:49 PM Vyacheslav Daradur <da...@gmail.com> wrote:
> >
> > > Not sure that "user fallback" is the right definition, it is not new
> > > behaviour in comparison with legacy implementation.
> > >
> > > Our synchronous deployment provides guaranties for a deployment
> > > initiator to be able to start work with service immediately after
> > > deployment finished successfully.
> > > For not the deployment initiator we can't provide such guarantees now,
> > > because of unknown deployment result and possibly fail.
> > >
> > > In this case, a reasonable timeout might be an acceptable solution.
> > >
> > > We can improve guaranties in future releases, but there is an open
> > > question:
> > > - how long taking of proxy should wait? - deployment of "heavy"
> > > service may take a while
> > >
> > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > <al...@gmail.com> wrote:
> > > >
> > > > What should be the user fallback in this case? Retry infinitely? Is
> > > there a
> > > > way to wait for the proper deployment?
> > > >
> > > > > On Tue, Dec 24, 2019 at 12:41 PM Vyacheslav Daradur <da...@gmail.com> wrote:
> > > >
> > > > > I’ll take a look at the end of the week.
> > > > >
> > > > > There is one more use-case:
> > > > > * if you initiate deployment from node A, but getting proxy on node B
> > > > > (which isn’t deployment initiator) to call service on node A - it may
> > > fail
> > > > > with "service not found", this is expected behaviour because we didn't
> > > > > provide such guarantees.
> > > > >
> > > > > API of getting proxy with timeout should be used in this case:
> > > > > T serviceProxy(String name, Class<? super T> svcItf, boolean sticky,
> > > long
> > > > > timeout)
> > > > >
> > > > >
> > > > > > On Tue, Dec 24, 2019 at 12:11 PM Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > > > >
> > > > > > Well, this is exactly the case. The service is deployed from node A,
> > > the
> > > > > > proxy is created on node B, and "service not found" exception gets
> > > thrown
> > > > > > to a user anyway. Perhaps, the retry happens too fast?
> > > > > >
> > > > > > Created a ticket [1].
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > >
> > > > > > > On Mon, Dec 23, 2019 at 10:08 PM Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > >
> > > > > > > Hi, Alexey
> > > > > > >
> > > > > > > Please attach a reproducer to the ticket.
> > > > > > >
> > > > > > > As far as I remember we have the following behaviour for the
> > > proxies:
> > > > > > >
> > > > > > > Let's assume you have deployed service from node A, then:
> > > > > > > * if you invoke service locally from node A - it is guaranteed to
> > > > > > > service to be deployed and ready to work
> > > > > > > * if you take a proxy from node A to remote node B right after
> > > deploy
> > > > > > > - there is might be a race between disco-spi (a message which
> > > releases
> > > > > > > deployed service)  and comm-spi (remote call works via Compute over
> > > > > > > comm-spi), but it shouldn't affect end-users because the failed
> > > > > > > request will be retried in this case
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > <al...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Nikolay,
> > > > > > > >
> > > > > > > > Yes, I've rechecked, the new service processor is being used.
> > > I'll
> > > > > > file a
> > > > > > > > bug shortly.
> > > > > > > >
> > > > > > > > пн, 23 дек. 2019 г. в 17:33, Николай Ижиков <nizhikov@apache.org
> > > >:
> > > > > > > >
> > > > > > > > > Alexey, are you sure, you are testing new service framework?
> > > > > > > > >
> > > > > > > > > Is yes - you definitely should file a bug.
> > > > > > > > >
> > > > > > > > > > 23 дек. 2019 г., в 17:02, Alexey Goncharuk <
> > > > > > > alexey.goncharuk@gmail.com>
> > > > > > > > > написал(а):
> > > > > > > > > >
> > > > > > > > > > Igniters,
> > > > > > > > > >
> > > > > > > > > > I have a question based on one of my recent tests debugging.
> > > > > > > > > >
> > > > > > > > > > The test is related to Ignite services. I noticed that
> > > sometimes
> > > > > a
> > > > > > > proxy
> > > > > > > > > > invocation of a newly deployed service fails because the
> > > service
> > > > > > > cannot
> > > > > > > > > be
> > > > > > > > > > found. I managed to reduce the test to a simple "start two
> > > nodes,
> > > > > > > deploy
> > > > > > > > > a
> > > > > > > > > > service, create a proxy, invoke the proxy" scenario. The
> > > proxy
> > > > > > > invocation
> > > > > > > > > > fails in about ~80% of runs.
> > > > > > > > > >
> > > > > > > > > > As far as I remember, the new discovery-based service
> > > deployment
> > > > > > was
> > > > > > > > > > supposed to be synchronous, so not only non-proxy service
> > > > > instances
> > > > > > > > > should
> > > > > > > > > > work, but the proxies as well. Was my understanding correct?
> > > > > > Should I
> > > > > > > > > file
> > > > > > > > > > a bug for the observed behavior?
> > > > > > > > > >
> > > > > > > > > > --AG
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav D.
> > > > > > >
> > > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.



-- 
Best Regards, Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

> even the local deployment looks broken: if a compute job
> is sent to a remote node after the service deployment

This is a different case, and it is covered by retries:
* If you deploy a service from node A to node B and then take a proxy
from node A (the deployment initiator), the invocation should NOT fail even
if node B has not yet received the message that deployment finished
successfully, because proxy invocations are retried.

It looks like it would be better to describe all these cases on the wiki.

> Should we schedule this ticket for the further work on Services IEP?

If it is a frequent use case, we should definitely implement it.



-- 
Best Regards, Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

Ok, got it.

I agree that this is consistent with the old behavior, but this is the kind
of error we wanted to get rid of when we started the IEP. From the
user's perspective, even the local deployment looks broken: if a compute job
is sent to a remote node after the service deployment, the job execution
may fail due to this error.

Should we schedule this ticket for further work on the Services IEP?

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

I'm not sure that "user fallback" is the right term; this is not new
behaviour compared with the legacy implementation.

Our synchronous deployment guarantees that the deployment initiator can
start working with the service immediately after the deployment has
finished successfully.
For nodes other than the deployment initiator we can't provide such
guarantees now, because the deployment result is not yet known there and
the deployment may still fail.

In this case, a reasonable timeout might be an acceptable solution.

We can improve the guarantees in future releases, but there is an open question:
- how long should obtaining a proxy wait? Deployment of a "heavy"
service may take a while.
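
To make the trade-off concrete, here is a minimal sketch in plain Java of a
bounded wait on a non-initiator node. It does not use Ignite: the
BoundedWaitSketch class, the Outcome enum and the awaitDeployment helper are
hypothetical names invented for this illustration, and the deployment is
modelled as a CompletableFuture. The only point is that a timed wait has
three possible results, and the caller has to pick the timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of a bounded wait for a deployment. This is not Ignite code. */
final class BoundedWaitSketch {
    /** What a non-initiator node can learn within the timeout. */
    enum Outcome { READY, FAILED, TIMED_OUT }

    /** Waits for the deployment result, but no longer than timeoutMs. */
    static Outcome awaitDeployment(CompletableFuture<Void> deployment, long timeoutMs) {
        try {
            deployment.get(timeoutMs, TimeUnit.MILLISECONDS);

            return Outcome.READY;
        }
        catch (TimeoutException e) {
            // A "heavy" service may simply need more time than we allowed.
            return Outcome.TIMED_OUT;
        }
        catch (ExecutionException e) {
            // The deployment itself failed, so waiting longer would not help.
            return Outcome.FAILED;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            // The sketch reports an interrupted wait as a timeout for simplicity.
            return Outcome.TIMED_OUT;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> pending = new CompletableFuture<>();

        System.out.println(awaitDeployment(done, 100));
        System.out.println(awaitDeployment(pending, 100));
    }
}
```

A short timeout reports TIMED_OUT for slow deployments that would have
succeeded, while a long one delays the caller when the deployment never
happens, which is exactly the open question above.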



-- 
Best Regards, Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

What should be the user fallback in this case? Retry infinitely? Is there a
way to wait for the proper deployment?

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

I’ll take a look at the end of the week.

There is one more use case:
* if you initiate deployment from node A but obtain a proxy on node B
(which isn’t the deployment initiator) to call the service on node A, the
call may fail with "service not found". This is expected behaviour because
we didn't provide such guarantees.

The API for obtaining a proxy with a timeout should be used in this case:
<T> T serviceProxy(String name, Class<? super T> svcItf, boolean sticky, long timeout)
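
To illustrate what the timeout changes, here is a minimal sketch in plain
Java. It does not call Ignite: the TimedLookupSketch class, the
KNOWN_SERVICES map, the lookup helper and the 10 ms polling interval are all
assumptions made up for the illustration. It only models the difference
between a lookup that returns immediately and one that waits for the
deployment result to reach the local node.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a timed service lookup. This is not Ignite code. */
final class TimedLookupSketch {
    /** Stand-in for the services already known to the local node (assumption). */
    static final Map<String, Object> KNOWN_SERVICES = new ConcurrentHashMap<>();

    /**
     * Polls until the service is known or the timeout elapses.
     * A timeout of 0 checks once and returns immediately.
     */
    static Optional<Object> lookup(String name, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        while (true) {
            Object svc = KNOWN_SERVICES.get(name);

            if (svc != null)
                return Optional.of(svc);

            // Deadline reached: the caller would see "service not found".
            if (System.nanoTime() - deadline >= 0)
                return Optional.empty();

            try {
                Thread.sleep(10); // Arbitrary polling interval for the sketch.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate the deployment result reaching this node about 100 ms later.
        Thread deployer = new Thread(() -> {
            try {
                Thread.sleep(100);
                KNOWN_SERVICES.put("mySvc", new Object());
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        deployer.start();

        System.out.println("timeout 0:    " + lookup("mySvc", 0).isPresent());
        System.out.println("timeout 5000: " + lookup("mySvc", 5_000).isPresent());

        deployer.join();
    }
}
```

With a zero timeout the lookup on node B races with the deployment; with a
non-zero timeout it waits until the service becomes known or the deadline
passes.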


Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

Well, this is exactly the case. The service is deployed from node A, the
proxy is created on node B, and a "service not found" exception gets thrown
to the user anyway. Perhaps the retry happens too fast?

Created a ticket [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12490

Re: Discovery-based services deployment guarantees question

Posted by Vyacheslav Daradur <da...@gmail.com>.

Hi, Alexey

Please attach a reproducer to the ticket.

As far as I remember, we have the following behaviour for proxies:

Let's assume you have deployed a service from node A. Then:
* if you invoke the service locally from node A, the service is guaranteed
to be deployed and ready to work
* if you take a proxy from node A to remote node B right after deployment,
there might be a race between disco-spi (the message that releases the
deployed service) and comm-spi (the remote call works via Compute over
comm-spi), but it shouldn't affect end users because the failed
request will be retried in this case
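
The race and the retry can be modelled with a small plain-Java sketch. It
does not use Ignite: RetryRaceSketch, RemoteNode, onDiscoveryMessage and
callWithRetries are hypothetical names for this illustration, and the retry
count and pause are arbitrary assumptions rather than the values Ignite
uses.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical model of the discovery/communication race. This is not Ignite code. */
final class RetryRaceSketch {
    /** Thrown when the remote node does not know about the service yet. */
    static final class ServiceNotFoundException extends RuntimeException {
        ServiceNotFoundException(String msg) {
            super(msg);
        }
    }

    /** Node B: the service is usable only after the discovery message arrives. */
    static final class RemoteNode {
        private final AtomicBoolean deployed = new AtomicBoolean();

        void onDiscoveryMessage() {
            deployed.set(true);
        }

        String invoke() {
            if (!deployed.get())
                throw new ServiceNotFoundException("service not found");

            return "ok";
        }
    }

    /** Retries the call on "service not found", at most maxAttempts times. */
    static <T> T callWithRetries(Supplier<T> call, int maxAttempts, long pauseMs) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be positive");

        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            }
            catch (ServiceNotFoundException e) {
                if (attempt == maxAttempts)
                    throw e; // Retries exhausted: the error reaches the user.

                try {
                    Thread.sleep(pauseMs);
                }
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();

                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        RemoteNode nodeB = new RemoteNode();
        AtomicInteger attempts = new AtomicInteger();

        // The discovery message "arrives" just before the third attempt.
        String res = callWithRetries(() -> {
            if (attempts.incrementAndGet() == 3)
                nodeB.onDiscoveryMessage();

            return nodeB.invoke();
        }, 5, 10);

        System.out.println("result=" + res + " after " + attempts.get() + " attempts");
    }
}
```

If every attempt completes before the discovery message arrives, the last
"service not found" is rethrown to the caller, so the retries only hide the
race when they span a longer time than the discovery delay.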




-- 
Best Regards, Vyacheslav D.

Re: Discovery-based services deployment guarantees question

Posted by Alexey Goncharuk <al...@gmail.com>.

Nikolay,

Yes, I've rechecked, the new service processor is being used. I'll file a
bug shortly.

Re: Discovery-based services deployment guarantees question

Posted by Николай Ижиков <ni...@apache.org>.

Alexey, are you sure you are testing the new service framework?

If yes, you should definitely file a bug.
