Posted to dev@nifi.apache.org by Josh Elser <jo...@gmail.com> on 2020/02/18 18:20:53 UTC

[DISCUSS] On HBase client retries (NIFI-6197)

Hiya!

LarsF brought this up in the apache-hbase Slack and it caught my 
eye. Sending a note here since the PR is closed where this was being 
discussed before[1].

I understand Bryan's concerns that misconfiguration of an HBase 
processor with a high number of retries and back-off can create a 
situation in which the processing of a single FlowFile will take a very 
long time to hit the onFailure state.

However, as an HBase developer, I can confidently state that 
hbase.client.retries=1 will create scenarios in which you'll be pushing 
a FlowFile through a retry loop inside of NiFi for things which should 
be implicitly retried inside of the HBase client.

For example, if a Region is being moved between two RegionServers and an 
HBase processor is trying to read/write to that Region, the client will 
see an exception. This is a "retriable" exception in HBase parlance, 
which means the HBase client code will automatically re-process that 
request (looking up the new location of that Region first). In most 
cases, the subsequent RPC succeeds, the caller is none the wiser, and 
the whole retry takes only a few milliseconds.
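
For reference, the implicit retries and the backoff between them are 
driven by a handful of client-side settings. A rough sketch of the knobs 
involved (the key names are the usual HBase client ones; the values here 
are illustrative, not recommendations):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

Configuration conf = HBaseConfiguration.create();
// How many times the client re-attempts a "retriable" failure before giving up
conf.setInt("hbase.client.retries.number", 15);
// Base pause between attempts; the client backs off from this value
conf.setLong("hbase.client.pause", 100);
// Timeout for any single RPC attempt
conf.setInt("hbase.rpc.timeout", 60000);

try (Connection conn = ConnectionFactory.createConnection(conf)) {
    // gets/puts/scans issued through this connection will ride over
    // transient conditions (like the Region move above) within that budget
}
```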

My first idea was also what Lars had suggested -- can we come up with a 
sanity check to validate "correct" configuration for the processor 
before we throw the waterfall of data at it? I can respect if processors 
don't have a "good" hook to do such a check.

What _would_ be the ideal semantics from NiFi's perspective? We have 
the ability to implicitly retry operations and also control the retry 
backoff values. Is there something more we could do from the HBase side, 
given what y'all have seen from the battlefield?

Thanks!

- Josh

[1] https://github.com/apache/nifi/pull/3425

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Lars Francke <la...@gmail.com>.
I'm not sure either.
I agree that 35 is too high; the 2.x default of 15 seems reasonable to me.

We might be able to include a connection check API in the future, but that
won't help us now, so we need to find some way forward.

I believe we have different "services" in NiFi for HBase 1 & 2.
How about we "revert" the NiFi HBase 2 service to use the HBase default and
override the default for 1 to default to 15 as well?
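
If we go that route, the change on the NiFi side should amount to little
more than the default value on the existing retries property. A rough
sketch of what that could look like (the property name and constant here
are made up for illustration; the real descriptor lives in the HBase
client service classes):

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Hypothetical descriptor, for illustration only
static final PropertyDescriptor HBASE_CLIENT_RETRIES = new PropertyDescriptor.Builder()
        .name("HBase Client Retries")
        .description("Number of times the HBase client will retry a failing operation "
                + "before it is handed back to NiFi as a failure")
        .required(true)
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .defaultValue("15") // match the HBase 2.x client default instead of 1
        .build();
```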

On Tue, Feb 25, 2020 at 9:44 PM Josh Elser <el...@apache.org> wrote:

> So, do we have a real issue here? Or are we just overlapping two
> discussions about the same code-change with different context?
>
> My impression from Bryan is, now that he has the bigger context of why
> implicit retries are important for HBase, that he is not against setting
> this value higher (although, he also originally said this [1]), but just
> not the 1.x default of 35 retries (which *I* will say is too high). Am I
> reading too much into what you've said, Bryan?
>
> I understand that folks are busy and this is probably not the highest
> priority. I'm just trying to figure out if we're actually still in
> disagreement. My gut says that we're not.
>
> Re-stating goals: I think everyone is trying to make sure that the user
> of NiFi with HBase in the mix has a great user experience OOTB. That
> requires finding a balance between "fail fast when something is
> outwardly messed up" and "don't fail up when HBase can implicitly ride
> over a transient/automatically-recoverable state".
>
> [1] https://github.com/apache/nifi/pull/3425#issuecomment-487962996
>
> On 2/19/20 11:14 AM, Lars Francke wrote:
> > That's what I suggested[1] and Bryan rejected[2]
> >
> > [1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
> > [2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>
> >
> >
> > On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <el...@apache.org> wrote:
> >
> >> Thanks for sharing the code, Bryan! I lazily did not go digging.
> >>
> >> Looking into #onEnabled, we could change this to create its own
> >> Connection with a very low 1 or 2 retries with a very short retry
> >> duration. Throw that one away after we did our sanity check, set a high
> >> retry amount, and then create a new Connection for all of the
> >> puts/gets/scans/whatever the service will do.
> >>
> >> I think NiFi, in general, can err on the side of "lower" client retries
> >> (both in number and duration of backoff) than a normal client since it
> >> can implicitly buffer+retry flowfiles that fail.
> >>
> >> WDYT, Lars?
> >>
> >> On 2/19/20 9:24 AM, Bryan Bende wrote:
> >>> We do already expose the ability to configure the retries in the
> >>> controller service [1], it was just a debate about what the default
> >>> value should be. Currently it is set to 1 because we believed it was
> >>> better to fail fast during the initial configuration of the service. A
> >>> user can easily set it back to 15, or whatever HBase client normally
> >>> does. I even suggested a compromise of making the default value 7,
> >>> which was half way between the two extremes, but that was considered
> >>> unacceptable, even though based on what you said in your original
> >>> message, most cases work on the first retry, so 7 would have covered
> >>> those.
> >>>
> >>> We can also expose a property for the RPC timeout, but I think the
> >>> retries is more the issue here.
> >>>
> >>> We can also definitely improve the docs, but I would still lean
> >>> towards the default retires being something lower, and then the docs
> >>> can explain that after ensuring the service can be successfully
> >>> started that this value can be increased to 15.
> >>>
> >>> [1]
> >>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
> >>>
> >>> On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <el...@apache.org> wrote:
> >>>>
> >>>> We could certainly implement some kind of "sanity check" via HBase
> code,
> >>>> but I think the thing that is missing is the logical way to validate
> >>>> this in NiFi itself.
> >>>>
> >>>> Something like...
> >>>>
> >>>> ```
> >>>> Configuration conf = HBaseConfiguration.create();
> >>>> conf.setInt("hbase.rpc.timeout", 5000);
> >>>> try (Connection conn = ConnectionFactory.create(conf)) {
> >>>>      // do sanity check
> >>>> }
> >>>> Configuration conf2 = new Configuration(conf);
> >>>> conf2.setInt("hbase.rpc.timeout", 25000);
> >>>> try (Connection conn = ConnectionFactory.create(conf2)) {
> >>>>      // do real hbase-y stuff
> >>>> }
> >>>> ```
> >>>>
> >>>> Maybe instead of requiring an implicit way to do this (requiring NiFi
> >>>> code changes), we could solve the problem at the "human level": create
> >>>> docs that walk people through how to push a dummy record through the
> >>>> service/processor with the low configuration of timeouts and retries?
> >>>> Make the "sanity check" a human operation and just expose the ability
> to
> >>>> set timeout/retries via the controller service?
> >>>>
> >>>> On 2/18/20 4:36 PM, Lars Francke wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Josh, thanks for bringing it up here again.
> >>>>> I'm happy to revive the PR with whatever the outcome of this thread
> is.
> >>>>> It came up today because another client complained about how
> "unstable"
> >>>>> HBase is on NiFi.
> >>>>>
> >>>>> @Josh: As the whole issue is only the initial connect can we have a
> >>>>> different timeout setting there? I have to admit I don't know.
> >>>>>
> >>>>> Cheers,
> >>>>> Lars
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <
> >> pierre.villard.fr@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Good point, I don't think we can do that on a controller service.
> >>>>>>
> >>>>>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a
> >> écrit :
> >>>>>>
> >>>>>>> That could make it a little better, but I can't remember, can we
> >>>>>>> terminate on a controller service?
> >>>>>>>
> >>>>>>> The issue here would be on first time enabling the the HBase client
> >>>>>>> service, so before even getting to a processor.
> >>>>>>>
> >>>>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> >>>>>>> <pi...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Bryan,
> >>>>>>>>
> >>>>>>>> I didn't follow the whole discussion so I apologize if I'm saying
> >>>>>>> something
> >>>>>>>> stupid here. Now that we have the possibility to terminate threads
> >> in a
> >>>>>>>> processor, would that solve the issue?
> >>>>>>>>
> >>>>>>>> Pierre
> >>>>>>>>
> >>>>>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a
> >> écrit
> >>>>>> :
> >>>>>>>>
> >>>>>>>>> Hi Josh,
> >>>>>>>>>
> >>>>>>>>> The problem isn't so much about the retries within the flow, its
> >> more
> >>>>>>>>> about setting up the service for the first time.
> >>>>>>>>>
> >>>>>>>>> A common scenario for users was the following:
> >>>>>>>>>
> >>>>>>>>> - Create a new HBase client service
> >>>>>>>>> - Enter some config that wasn't quite correct, possibly hostnames
> >>>>>> that
> >>>>>>>>> weren't reachable from nifi as one example
> >>>>>>>>> - Enable service and enter retry loop
> >>>>>>>>> - Attempt to disable service to fix config, but have to wait 5+
> >> mins
> >>>>>>>>> for the retries to finish
> >>>>>>>>>
> >>>>>>>>> Maybe a lazy initialization of the connection on our side would
> >> help
> >>>>>>>>> here, although it would just be moving the problem until later
> >> (i.e.
> >>>>>>>>> service immediately enables because nothing is happening, then
> they
> >>>>>>>>> find out about config problems later when a flow file hits an
> hbase
> >>>>>>>>> processor).
> >>>>>>>>>
> >>>>>>>>> I guess the ideal scenario would be to have different logic for
> >>>>>>>>> initializing the connection vs. using it, so that there wouldn't
> be
> >>>>>>>>> retries during initialization.
> >>>>>>>>>
> >>>>>>>>> -Bryan
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <josh.elser@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hiya!
> >>>>>>>>>>
> >>>>>>>>>> LarsF brought this up in the apache-hbase slack account and it
> >>>>>>> caught my
> >>>>>>>>>> eye. Sending a note here since the PR is closed where this was
> >>>>>> being
> >>>>>>>>>> discussed before[1].
> >>>>>>>>>>
> >>>>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
> >>>>>>>>>> processor with a high number of retries and back-off can create
> a
> >>>>>>>>>> situation in which the processing of a single FlowFile will
> take a
> >>>>>>> very
> >>>>>>>>>> long time to hit the onFailure state.
> >>>>>>>>>>
> >>>>>>>>>> However, as an HBase developer, I can confidently state that
> >>>>>>>>>> hbase.client.retries=1 will create scenarios in which you'll be
> >>>>>>> pushing
> >>>>>>>>>> a FlowFile through a retry loop inside of NiFi for things which
> >>>>>>> should
> >>>>>>>>>> be implicitly retried inside of the HBase client.
> >>>>>>>>>>
> >>>>>>>>>> For example, if a Region is being moved between two
> RegionServers
> >>>>>>> and an
> >>>>>>>>>> HBase processor is trying to read/write to that Region, the
> client
> >>>>>>> will
> >>>>>>>>>> see an exception. This is a "retriable" exception in
> >> HBase-parlance
> >>>>>>>>>> which means that HBase client code would automatically
> re-process
> >>>>>>> that
> >>>>>>>>>> request (looking for the new location of that Region first). In
> >>>>>> most
> >>>>>>>>>> cases, the subsequent RPC would succeed and the caller is
> >>>>>>> non-the-wiser
> >>>>>>>>>> and the whole retry logic took 1's of milliseconds.
> >>>>>>>>>>
> >>>>>>>>>> My first idea was also what Lars' had suggested -- can we come
> up
> >>>>>>> with a
> >>>>>>>>>> sanity check to validate "correct" configuration for the
> processor
> >>>>>>>>>> before we throw the waterfall of data at it? I can respect if
> >>>>>>> processors
> >>>>>>>>>> don't have a "good" hook to do such a check.
> >>>>>>>>>>
> >>>>>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
> >>>>>> have
> >>>>>>>>>> the ability to implicitly retry operations and also control the
> >>>>>> retry
> >>>>>>>>>> backoff values. Is there something more we could do from the
> HBase
> >>>>>>> side,
> >>>>>>>>>> given what y'all have seen from the battlefield?
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> - Josh
> >>>>>>>>>>
> >>>>>>>>>> [1] https://github.com/apache/nifi/pull/3425
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>
> >
>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Josh Elser <el...@apache.org>.
So, do we have a real issue here? Or are we just overlapping two 
discussions about the same code-change with different context?

My impression from Bryan is, now that he has the bigger context of why 
implicit retries are important for HBase, that he is not against setting 
this value higher (although he also originally said this [1]), but just 
not the 1.x default of 35 retries (which *I* will say is too high). Am I 
reading too much into what you've said, Bryan?

I understand that folks are busy and this is probably not the highest 
priority. I'm just trying to figure out if we're actually still in 
disagreement. My gut says that we're not.

Re-stating goals: I think everyone is trying to make sure that the user 
of NiFi with HBase in the mix has a great user experience OOTB. That 
requires finding a balance between "fail fast when something is 
outwardly messed up" and "don't fail up when HBase can implicitly ride 
over a transient/automatically-recoverable state".

[1] https://github.com/apache/nifi/pull/3425#issuecomment-487962996

On 2/19/20 11:14 AM, Lars Francke wrote:
> That's what I suggested[1] and Bryan rejected[2]
> 
> [1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
> [2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>
> 
> 
> On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <el...@apache.org> wrote:
> 
>> Thanks for sharing the code, Bryan! I lazily did not go digging.
>>
>> Looking into #onEnabled, we could change this to create its own
>> Connection with a very low 1 or 2 retries with a very short retry
>> duration. Throw that one away after we did our sanity check, set a high
>> retry amount, and then create a new Connection for all of the
>> puts/gets/scans/whatever the service will do.
>>
>> I think NiFi, in general, can err on the side of "lower" client retries
>> (both in number and duration of backoff) than a normal client since it
>> can implicitly buffer+retry flowfiles that fail.
>>
>> WDYT, Lars?
>>
>> On 2/19/20 9:24 AM, Bryan Bende wrote:
>>> We do already expose the ability to configure the retries in the
>>> controller service [1], it was just a debate about what the default
>>> value should be. Currently it is set to 1 because we believed it was
>>> better to fail fast during the initial configuration of the service. A
>>> user can easily set it back to 15, or whatever HBase client normally
>>> does. I even suggested a compromise of making the default value 7,
>>> which was half way between the two extremes, but that was considered
>>> unacceptable, even though based on what you said in your original
>>> message, most cases work on the first retry, so 7 would have covered
>>> those.
>>>
>>> We can also expose a property for the RPC timeout, but I think the
>>> retries is more the issue here.
>>>
>>> We can also definitely improve the docs, but I would still lean
>>> towards the default retires being something lower, and then the docs
>>> can explain that after ensuring the service can be successfully
>>> started that this value can be increased to 15.
>>>
>>> [1]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
>>>
>>> On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <el...@apache.org> wrote:
>>>>
>>>> We could certainly implement some kind of "sanity check" via HBase code,
>>>> but I think the thing that is missing is the logical way to validate
>>>> this in NiFi itself.
>>>>
>>>> Something like...
>>>>
>>>> ```
>>>> Configuration conf = HBaseConfiguration.create();
>>>> conf.setInt("hbase.rpc.timeout", 5000);
>>>> try (Connection conn = ConnectionFactory.create(conf)) {
>>>>      // do sanity check
>>>> }
>>>> Configuration conf2 = new Configuration(conf);
>>>> conf2.setInt("hbase.rpc.timeout", 25000);
>>>> try (Connection conn = ConnectionFactory.create(conf2)) {
>>>>      // do real hbase-y stuff
>>>> }
>>>> ```
>>>>
>>>> Maybe instead of requiring an implicit way to do this (requiring NiFi
>>>> code changes), we could solve the problem at the "human level": create
>>>> docs that walk people through how to push a dummy record through the
>>>> service/processor with the low configuration of timeouts and retries?
>>>> Make the "sanity check" a human operation and just expose the ability to
>>>> set timeout/retries via the controller service?
>>>>
>>>> On 2/18/20 4:36 PM, Lars Francke wrote:
>>>>> Hi,
>>>>>
>>>>> Josh, thanks for bringing it up here again.
>>>>> I'm happy to revive the PR with whatever the outcome of this thread is.
>>>>> It came up today because another client complained about how "unstable"
>>>>> HBase is on NiFi.
>>>>>
>>>>> @Josh: As the whole issue is only the initial connect can we have a
>>>>> different timeout setting there? I have to admit I don't know.
>>>>>
>>>>> Cheers,
>>>>> Lars
>>>>>
>>>>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <
>> pierre.villard.fr@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Good point, I don't think we can do that on a controller service.
>>>>>>
>>>>>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a
>> écrit :
>>>>>>
>>>>>>> That could make it a little better, but I can't remember, can we
>>>>>>> terminate on a controller service?
>>>>>>>
>>>>>>> The issue here would be on first time enabling the the HBase client
>>>>>>> service, so before even getting to a processor.
>>>>>>>
>>>>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
>>>>>>> <pi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Bryan,
>>>>>>>>
>>>>>>>> I didn't follow the whole discussion so I apologize if I'm saying
>>>>>>> something
>>>>>>>> stupid here. Now that we have the possibility to terminate threads
>> in a
>>>>>>>> processor, would that solve the issue?
>>>>>>>>
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a
>> écrit
>>>>>> :
>>>>>>>>
>>>>>>>>> Hi Josh,
>>>>>>>>>
>>>>>>>>> The problem isn't so much about the retries within the flow, its
>> more
>>>>>>>>> about setting up the service for the first time.
>>>>>>>>>
>>>>>>>>> A common scenario for users was the following:
>>>>>>>>>
>>>>>>>>> - Create a new HBase client service
>>>>>>>>> - Enter some config that wasn't quite correct, possibly hostnames
>>>>>> that
>>>>>>>>> weren't reachable from nifi as one example
>>>>>>>>> - Enable service and enter retry loop
>>>>>>>>> - Attempt to disable service to fix config, but have to wait 5+
>> mins
>>>>>>>>> for the retries to finish
>>>>>>>>>
>>>>>>>>> Maybe a lazy initialization of the connection on our side would
>> help
>>>>>>>>> here, although it would just be moving the problem until later
>> (i.e.
>>>>>>>>> service immediately enables because nothing is happening, then they
>>>>>>>>> find out about config problems later when a flow file hits an hbase
>>>>>>>>> processor).
>>>>>>>>>
>>>>>>>>> I guess the ideal scenario would be to have different logic for
>>>>>>>>> initializing the connection vs. using it, so that there wouldn't be
>>>>>>>>> retries during initialization.
>>>>>>>>>
>>>>>>>>> -Bryan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hiya!
>>>>>>>>>>
>>>>>>>>>> LarsF brought this up in the apache-hbase slack account and it
>>>>>>> caught my
>>>>>>>>>> eye. Sending a note here since the PR is closed where this was
>>>>>> being
>>>>>>>>>> discussed before[1].
>>>>>>>>>>
>>>>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
>>>>>>>>>> processor with a high number of retries and back-off can create a
>>>>>>>>>> situation in which the processing of a single FlowFile will take a
>>>>>>> very
>>>>>>>>>> long time to hit the onFailure state.
>>>>>>>>>>
>>>>>>>>>> However, as an HBase developer, I can confidently state that
>>>>>>>>>> hbase.client.retries=1 will create scenarios in which you'll be
>>>>>>> pushing
>>>>>>>>>> a FlowFile through a retry loop inside of NiFi for things which
>>>>>>> should
>>>>>>>>>> be implicitly retried inside of the HBase client.
>>>>>>>>>>
>>>>>>>>>> For example, if a Region is being moved between two RegionServers
>>>>>>> and an
>>>>>>>>>> HBase processor is trying to read/write to that Region, the client
>>>>>>> will
>>>>>>>>>> see an exception. This is a "retriable" exception in
>> HBase-parlance
>>>>>>>>>> which means that HBase client code would automatically re-process
>>>>>>> that
>>>>>>>>>> request (looking for the new location of that Region first). In
>>>>>> most
>>>>>>>>>> cases, the subsequent RPC would succeed and the caller is
>>>>>>> non-the-wiser
>>>>>>>>>> and the whole retry logic took 1's of milliseconds.
>>>>>>>>>>
>>>>>>>>>> My first idea was also what Lars' had suggested -- can we come up
>>>>>>> with a
>>>>>>>>>> sanity check to validate "correct" configuration for the processor
>>>>>>>>>> before we throw the waterfall of data at it? I can respect if
>>>>>>> processors
>>>>>>>>>> don't have a "good" hook to do such a check.
>>>>>>>>>>
>>>>>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
>>>>>> have
>>>>>>>>>> the ability to implicitly retry operations and also control the
>>>>>> retry
>>>>>>>>>> backoff values. Is there something more we could do from the HBase
>>>>>>> side,
>>>>>>>>>> given what y'all have seen from the battlefield?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> - Josh
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/nifi/pull/3425
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>
> 

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Lars Francke <la...@gmail.com>.
That's what I suggested [1] and Bryan rejected [2].

[1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
[2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>


On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <el...@apache.org> wrote:

> Thanks for sharing the code, Bryan! I lazily did not go digging.
>
> Looking into #onEnabled, we could change this to create its own
> Connection with a very low 1 or 2 retries with a very short retry
> duration. Throw that one away after we did our sanity check, set a high
> retry amount, and then create a new Connection for all of the
> puts/gets/scans/whatever the service will do.
>
> I think NiFi, in general, can err on the side of "lower" client retries
> (both in number and duration of backoff) than a normal client since it
> can implicitly buffer+retry flowfiles that fail.
>
> WDYT, Lars?
>
> On 2/19/20 9:24 AM, Bryan Bende wrote:
> > We do already expose the ability to configure the retries in the
> > controller service [1], it was just a debate about what the default
> > value should be. Currently it is set to 1 because we believed it was
> > better to fail fast during the initial configuration of the service. A
> > user can easily set it back to 15, or whatever HBase client normally
> > does. I even suggested a compromise of making the default value 7,
> > which was half way between the two extremes, but that was considered
> > unacceptable, even though based on what you said in your original
> > message, most cases work on the first retry, so 7 would have covered
> > those.
> >
> > We can also expose a property for the RPC timeout, but I think the
> > retries is more the issue here.
> >
> > We can also definitely improve the docs, but I would still lean
> > towards the default retires being something lower, and then the docs
> > can explain that after ensuring the service can be successfully
> > started that this value can be increased to 15.
> >
> > [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
> >
> > On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <el...@apache.org> wrote:
> >>
> >> We could certainly implement some kind of "sanity check" via HBase code,
> >> but I think the thing that is missing is the logical way to validate
> >> this in NiFi itself.
> >>
> >> Something like...
> >>
> >> ```
> >> Configuration conf = HBaseConfiguration.create();
> >> conf.setInt("hbase.rpc.timeout", 5000);
> >> try (Connection conn = ConnectionFactory.create(conf)) {
> >>     // do sanity check
> >> }
> >> Configuration conf2 = new Configuration(conf);
> >> conf2.setInt("hbase.rpc.timeout", 25000);
> >> try (Connection conn = ConnectionFactory.create(conf2)) {
> >>     // do real hbase-y stuff
> >> }
> >> ```
> >>
> >> Maybe instead of requiring an implicit way to do this (requiring NiFi
> >> code changes), we could solve the problem at the "human level": create
> >> docs that walk people through how to push a dummy record through the
> >> service/processor with the low configuration of timeouts and retries?
> >> Make the "sanity check" a human operation and just expose the ability to
> >> set timeout/retries via the controller service?
> >>
> >> On 2/18/20 4:36 PM, Lars Francke wrote:
> >>> Hi,
> >>>
> >>> Josh, thanks for bringing it up here again.
> >>> I'm happy to revive the PR with whatever the outcome of this thread is.
> >>> It came up today because another client complained about how "unstable"
> >>> HBase is on NiFi.
> >>>
> >>> @Josh: As the whole issue is only the initial connect can we have a
> >>> different timeout setting there? I have to admit I don't know.
> >>>
> >>> Cheers,
> >>> Lars
> >>>
> >>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <
> pierre.villard.fr@gmail.com>
> >>> wrote:
> >>>
> >>>> Good point, I don't think we can do that on a controller service.
> >>>>
> >>>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a
> écrit :
> >>>>
> >>>>> That could make it a little better, but I can't remember, can we
> >>>>> terminate on a controller service?
> >>>>>
> >>>>> The issue here would be on first time enabling the the HBase client
> >>>>> service, so before even getting to a processor.
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> >>>>> <pi...@gmail.com> wrote:
> >>>>>>
> >>>>>> Bryan,
> >>>>>>
> >>>>>> I didn't follow the whole discussion so I apologize if I'm saying
> >>>>> something
> >>>>>> stupid here. Now that we have the possibility to terminate threads
> in a
> >>>>>> processor, would that solve the issue?
> >>>>>>
> >>>>>> Pierre
> >>>>>>
> >>>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a
> écrit
> >>>> :
> >>>>>>
> >>>>>>> Hi Josh,
> >>>>>>>
> >>>>>>> The problem isn't so much about the retries within the flow, its
> more
> >>>>>>> about setting up the service for the first time.
> >>>>>>>
> >>>>>>> A common scenario for users was the following:
> >>>>>>>
> >>>>>>> - Create a new HBase client service
> >>>>>>> - Enter some config that wasn't quite correct, possibly hostnames
> >>>> that
> >>>>>>> weren't reachable from nifi as one example
> >>>>>>> - Enable service and enter retry loop
> >>>>>>> - Attempt to disable service to fix config, but have to wait 5+
> mins
> >>>>>>> for the retries to finish
> >>>>>>>
> >>>>>>> Maybe a lazy initialization of the connection on our side would
> help
> >>>>>>> here, although it would just be moving the problem until later
> (i.e.
> >>>>>>> service immediately enables because nothing is happening, then they
> >>>>>>> find out about config problems later when a flow file hits an hbase
> >>>>>>> processor).
> >>>>>>>
> >>>>>>> I guess the ideal scenario would be to have different logic for
> >>>>>>> initializing the connection vs. using it, so that there wouldn't be
> >>>>>>> retries during initialization.
> >>>>>>>
> >>>>>>> -Bryan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hiya!
> >>>>>>>>
> >>>>>>>> LarsF brought this up in the apache-hbase slack account and it
> >>>>> caught my
> >>>>>>>> eye. Sending a note here since the PR is closed where this was
> >>>> being
> >>>>>>>> discussed before[1].
> >>>>>>>>
> >>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
> >>>>>>>> processor with a high number of retries and back-off can create a
> >>>>>>>> situation in which the processing of a single FlowFile will take a
> >>>>> very
> >>>>>>>> long time to hit the onFailure state.
> >>>>>>>>
> >>>>>>>> However, as an HBase developer, I can confidently state that
> >>>>>>>> hbase.client.retries=1 will create scenarios in which you'll be
> >>>>> pushing
> >>>>>>>> a FlowFile through a retry loop inside of NiFi for things which
> >>>>> should
> >>>>>>>> be implicitly retried inside of the HBase client.
> >>>>>>>>
> >>>>>>>> For example, if a Region is being moved between two RegionServers
> >>>>> and an
> >>>>>>>> HBase processor is trying to read/write to that Region, the client
> >>>>> will
> >>>>>>>> see an exception. This is a "retriable" exception in
> HBase-parlance
> >>>>>>>> which means that HBase client code would automatically re-process
> >>>>> that
> >>>>>>>> request (looking for the new location of that Region first). In
> >>>> most
> >>>>>>>> cases, the subsequent RPC would succeed and the caller is
> >>>>> non-the-wiser
> >>>>>>>> and the whole retry logic took 1's of milliseconds.
> >>>>>>>>
> >>>>>>>> My first idea was also what Lars' had suggested -- can we come up
> >>>>> with a
> >>>>>>>> sanity check to validate "correct" configuration for the processor
> >>>>>>>> before we throw the waterfall of data at it? I can respect if
> >>>>> processors
> >>>>>>>> don't have a "good" hook to do such a check.
> >>>>>>>>
> >>>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
> >>>> have
> >>>>>>>> the ability to implicitly retry operations and also control the
> >>>> retry
> >>>>>>>> backoff values. Is there something more we could do from the HBase
> >>>>> side,
> >>>>>>>> given what y'all have seen from the battlefield?
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> - Josh
> >>>>>>>>
> >>>>>>>> [1] https://github.com/apache/nifi/pull/3425
> >>>>>>>
> >>>>>
> >>>>
> >>>
>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Josh Elser <el...@apache.org>.
Thanks for sharing the code, Bryan! I lazily did not go digging.

Looking into #onEnabled, we could change this to first create its own 
Connection with a very low retry count (1 or 2) and a very short retry 
pause. Throw that one away once the sanity check passes, then create a 
new Connection with the higher retry budget for all of the 
puts/gets/scans/whatever the service will do.
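
Roughly something like this (a sketch only; the listTableNames() probe
and the specific values are placeholders, not a worked patch):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// 1. Throwaway connection with an aggressive config, used only to validate the service config
Configuration probeConf = HBaseConfiguration.create();
probeConf.setInt("hbase.client.retries.number", 2);
probeConf.setLong("hbase.client.pause", 100);
probeConf.setInt("hbase.rpc.timeout", 5000);
try (Connection probe = ConnectionFactory.createConnection(probeConf);
     Admin admin = probe.getAdmin()) {
    admin.listTableNames(); // any cheap RPC will do; fails within seconds if the config is bad
}

// 2. The connection the service actually keeps, with the normal (higher) retry budget
Configuration realConf = new Configuration(probeConf);
realConf.setInt("hbase.client.retries.number", 15);
realConf.setInt("hbase.rpc.timeout", 60000);
Connection connection = ConnectionFactory.createConnection(realConf); // used for the puts/gets/scans
```

That way a bad hostname or an unreachable cluster surfaces in seconds at
enable time, rather than after minutes of retrying.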

I think NiFi, in general, can err on the side of "lower" client retries 
(both in number and duration of backoff) than a normal client since it 
can implicitly buffer+retry flowfiles that fail.

WDYT, Lars?

On 2/19/20 9:24 AM, Bryan Bende wrote:
> We do already expose the ability to configure the retries in the
> controller service [1], it was just a debate about what the default
> value should be. Currently it is set to 1 because we believed it was
> better to fail fast during the initial configuration of the service. A
> user can easily set it back to 15, or whatever HBase client normally
> does. I even suggested a compromise of making the default value 7,
> which was half way between the two extremes, but that was considered
> unacceptable, even though based on what you said in your original
> message, most cases work on the first retry, so 7 would have covered
> those.
> 
> We can also expose a property for the RPC timeout, but I think the
> retries is more the issue here.
> 
> We can also definitely improve the docs, but I would still lean
> towards the default retires being something lower, and then the docs
> can explain that after ensuring the service can be successfully
> started that this value can be increased to 15.
> 
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
> 
> On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <el...@apache.org> wrote:
>>
>> We could certainly implement some kind of "sanity check" via HBase code,
>> but I think the thing that is missing is the logical way to validate
>> this in NiFi itself.
>>
>> Something like...
>>
>> ```
>> Configuration conf = HBaseConfiguration.create();
>> conf.setInt("hbase.rpc.timeout", 5000);
>> try (Connection conn = ConnectionFactory.create(conf)) {
>>     // do sanity check
>> }
>> Configuration conf2 = new Configuration(conf);
>> conf2.setInt("hbase.rpc.timeout", 25000);
>> try (Connection conn = ConnectionFactory.create(conf2)) {
>>     // do real hbase-y stuff
>> }
>> ```
>>
>> Maybe instead of requiring an implicit way to do this (requiring NiFi
>> code changes), we could solve the problem at the "human level": create
>> docs that walk people through how to push a dummy record through the
>> service/processor with the low configuration of timeouts and retries?
>> Make the "sanity check" a human operation and just expose the ability to
>> set timeout/retries via the controller service?
>>
>> On 2/18/20 4:36 PM, Lars Francke wrote:
>>> Hi,
>>>
>>> Josh, thanks for bringing it up here again.
>>> I'm happy to revive the PR with whatever the outcome of this thread is.
>>> It came up today because another client complained about how "unstable"
>>> HBase is on NiFi.
>>>
>>> @Josh: As the whole issue is only the initial connect can we have a
>>> different timeout setting there? I have to admit I don't know.
>>>
>>> Cheers,
>>> Lars
>>>
>>> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pi...@gmail.com>
>>> wrote:
>>>
>>>> Good point, I don't think we can do that on a controller service.
>>>>
>>>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a écrit :
>>>>
>>>>> That could make it a little better, but I can't remember, can we
>>>>> terminate on a controller service?
>>>>>
>>>>> The issue here would be on first time enabling the the HBase client
>>>>> service, so before even getting to a processor.
>>>>>
>>>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
>>>>> <pi...@gmail.com> wrote:
>>>>>>
>>>>>> Bryan,
>>>>>>
>>>>>> I didn't follow the whole discussion so I apologize if I'm saying
>>>>> something
>>>>>> stupid here. Now that we have the possibility to terminate threads in a
>>>>>> processor, would that solve the issue?
>>>>>>
>>>>>> Pierre
>>>>>>
>>>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a écrit
>>>> :
>>>>>>
>>>>>>> Hi Josh,
>>>>>>>
>>>>>>> The problem isn't so much about the retries within the flow, its more
>>>>>>> about setting up the service for the first time.
>>>>>>>
>>>>>>> A common scenario for users was the following:
>>>>>>>
>>>>>>> - Create a new HBase client service
>>>>>>> - Enter some config that wasn't quite correct, possibly hostnames
>>>> that
>>>>>>> weren't reachable from nifi as one example
>>>>>>> - Enable service and enter retry loop
>>>>>>> - Attempt to disable service to fix config, but have to wait 5+ mins
>>>>>>> for the retries to finish
>>>>>>>
>>>>>>> Maybe a lazy initialization of the connection on our side would help
>>>>>>> here, although it would just be moving the problem until later (i.e.
>>>>>>> service immediately enables because nothing is happening, then they
>>>>>>> find out about config problems later when a flow file hits an hbase
>>>>>>> processor).
>>>>>>>
>>>>>>> I guess the ideal scenario would be to have different logic for
>>>>>>> initializing the connection vs. using it, so that there wouldn't be
>>>>>>> retries during initialization.
>>>>>>>
>>>>>>> -Bryan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
>>>>> wrote:
>>>>>>>>
>>>>>>>> Hiya!
>>>>>>>>
>>>>>>>> LarsF brought this up in the apache-hbase slack account and it
>>>>> caught my
>>>>>>>> eye. Sending a note here since the PR is closed where this was
>>>> being
>>>>>>>> discussed before[1].
>>>>>>>>
>>>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
>>>>>>>> processor with a high number of retries and back-off can create a
>>>>>>>> situation in which the processing of a single FlowFile will take a
>>>>> very
>>>>>>>> long time to hit the onFailure state.
>>>>>>>>
>>>>>>>> However, as an HBase developer, I can confidently state that
>>>>>>>> hbase.client.retries=1 will create scenarios in which you'll be
>>>>> pushing
>>>>>>>> a FlowFile through a retry loop inside of NiFi for things which
>>>>> should
>>>>>>>> be implicitly retried inside of the HBase client.
>>>>>>>>
>>>>>>>> For example, if a Region is being moved between two RegionServers
>>>>> and an
>>>>>>>> HBase processor is trying to read/write to that Region, the client
>>>>> will
>>>>>>>> see an exception. This is a "retriable" exception in HBase-parlance
>>>>>>>> which means that HBase client code would automatically re-process
>>>>> that
>>>>>>>> request (looking for the new location of that Region first). In
>>>> most
>>>>>>>> cases, the subsequent RPC would succeed and the caller is
>>>>> non-the-wiser
>>>>>>>> and the whole retry logic took 1's of milliseconds.
>>>>>>>>
>>>>>>>> My first idea was also what Lars' had suggested -- can we come up
>>>>> with a
>>>>>>>> sanity check to validate "correct" configuration for the processor
>>>>>>>> before we throw the waterfall of data at it? I can respect if
>>>>> processors
>>>>>>>> don't have a "good" hook to do such a check.
>>>>>>>>
>>>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
>>>> have
>>>>>>>> the ability to implicitly retry operations and also control the
>>>> retry
>>>>>>>> backoff values. Is there something more we could do from the HBase
>>>>> side,
>>>>>>>> given what y'all have seen from the battlefield?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> - Josh
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/nifi/pull/3425
>>>>>>>
>>>>>
>>>>
>>>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Bryan Bende <bb...@gmail.com>.
We do already expose the ability to configure the retries in the
controller service [1], it was just a debate about what the default
value should be. Currently it is set to 1 because we believed it was
better to fail fast during the initial configuration of the service. A
user can easily set it back to 15, or whatever HBase client normally
does. I even suggested a compromise of making the default value 7,
which was halfway between the two extremes, but that was considered
unacceptable, even though based on what you said in your original
message, most cases work on the first retry, so 7 would have covered
those.

We can also expose a property for the RPC timeout, but I think the
retries are more the issue here.

We can also definitely improve the docs, but I would still lean
towards the default retries being something lower, and then the docs
can explain that, after ensuring the service can be successfully
started, this value can be increased to 15.

[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140
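
For anyone following along, the practical effect of raising that property
is just a bigger retry budget on the client Configuration the service
builds, roughly like this (variable names assumed, not the actual service
code):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// 'retries' is whatever the controller service property from [1] is set to;
// changing it from 1 back to 15 only changes this value
int retries = 15;
Configuration hbaseConf = HBaseConfiguration.create(); // in the service, built from the configured hbase-site.xml
hbaseConf.setInt("hbase.client.retries.number", retries);
```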

On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <el...@apache.org> wrote:
>
> We could certainly implement some kind of "sanity check" via HBase code,
> but I think the thing that is missing is the logical way to validate
> this in NiFi itself.
>
> Something like...
>
> ```
> Configuration conf = HBaseConfiguration.create();
> conf.setInt("hbase.rpc.timeout", 5000);
> try (Connection conn = ConnectionFactory.create(conf)) {
>    // do sanity check
> }
> Configuration conf2 = new Configuration(conf);
> conf2.setInt("hbase.rpc.timeout", 25000);
> try (Connection conn = ConnectionFactory.create(conf2)) {
>    // do real hbase-y stuff
> }
> ```
>
> Maybe instead of requiring an implicit way to do this (requiring NiFi
> code changes), we could solve the problem at the "human level": create
> docs that walk people through how to push a dummy record through the
> service/processor with the low configuration of timeouts and retries?
> Make the "sanity check" a human operation and just expose the ability to
> set timeout/retries via the controller service?
>
> On 2/18/20 4:36 PM, Lars Francke wrote:
> > Hi,
> >
> > Josh, thanks for bringing it up here again.
> > I'm happy to revive the PR with whatever the outcome of this thread is.
> > It came up today because another client complained about how "unstable"
> > HBase is on NiFi.
> >
> > @Josh: As the whole issue is only the initial connect can we have a
> > different timeout setting there? I have to admit I don't know.
> >
> > Cheers,
> > Lars
> >
> > On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pi...@gmail.com>
> > wrote:
> >
> >> Good point, I don't think we can do that on a controller service.
> >>
> >> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a écrit :
> >>
> >>> That could make it a little better, but I can't remember, can we
> >>> terminate on a controller service?
> >>>
> >>> The issue here would be on first time enabling the the HBase client
> >>> service, so before even getting to a processor.
> >>>
> >>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> >>> <pi...@gmail.com> wrote:
> >>>>
> >>>> Bryan,
> >>>>
> >>>> I didn't follow the whole discussion so I apologize if I'm saying
> >>> something
> >>>> stupid here. Now that we have the possibility to terminate threads in a
> >>>> processor, would that solve the issue?
> >>>>
> >>>> Pierre
> >>>>
> >>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a écrit
> >> :
> >>>>
> >>>>> Hi Josh,
> >>>>>
> >>>>> The problem isn't so much about the retries within the flow, its more
> >>>>> about setting up the service for the first time.
> >>>>>
> >>>>> A common scenario for users was the following:
> >>>>>
> >>>>> - Create a new HBase client service
> >>>>> - Enter some config that wasn't quite correct, possibly hostnames
> >> that
> >>>>> weren't reachable from nifi as one example
> >>>>> - Enable service and enter retry loop
> >>>>> - Attempt to disable service to fix config, but have to wait 5+ mins
> >>>>> for the retries to finish
> >>>>>
> >>>>> Maybe a lazy initialization of the connection on our side would help
> >>>>> here, although it would just be moving the problem until later (i.e.
> >>>>> service immediately enables because nothing is happening, then they
> >>>>> find out about config problems later when a flow file hits an hbase
> >>>>> processor).
> >>>>>
> >>>>> I guess the ideal scenario would be to have different logic for
> >>>>> initializing the connection vs. using it, so that there wouldn't be
> >>>>> retries during initialization.
> >>>>>
> >>>>> -Bryan
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>> Hiya!
> >>>>>>
> >>>>>> LarsF brought this up in the apache-hbase slack account and it
> >>> caught my
> >>>>>> eye. Sending a note here since the PR is closed where this was
> >> being
> >>>>>> discussed before[1].
> >>>>>>
> >>>>>> I understand Bryan's concerns that misconfiguration of an HBase
> >>>>>> processor with a high number of retries and back-off can create a
> >>>>>> situation in which the processing of a single FlowFile will take a
> >>> very
> >>>>>> long time to hit the onFailure state.
> >>>>>>
> >>>>>> However, as an HBase developer, I can confidently state that
> >>>>>> hbase.client.retries=1 will create scenarios in which you'll be
> >>> pushing
> >>>>>> a FlowFile through a retry loop inside of NiFi for things which
> >>> should
> >>>>>> be implicitly retried inside of the HBase client.
> >>>>>>
> >>>>>> For example, if a Region is being moved between two RegionServers
> >>> and an
> >>>>>> HBase processor is trying to read/write to that Region, the client
> >>> will
> >>>>>> see an exception. This is a "retriable" exception in HBase-parlance
> >>>>>> which means that HBase client code would automatically re-process
> >>> that
> >>>>>> request (looking for the new location of that Region first). In
> >> most
> >>>>>> cases, the subsequent RPC would succeed and the caller is
> >>> non-the-wiser
> >>>>>> and the whole retry logic took 1's of milliseconds.
> >>>>>>
> >>>>>> My first idea was also what Lars' had suggested -- can we come up
> >>> with a
> >>>>>> sanity check to validate "correct" configuration for the processor
> >>>>>> before we throw the waterfall of data at it? I can respect if
> >>> processors
> >>>>>> don't have a "good" hook to do such a check.
> >>>>>>
> >>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
> >> have
> >>>>>> the ability to implicitly retry operations and also control the
> >> retry
> >>>>>> backoff values. Is there something more we could do from the HBase
> >>> side,
> >>>>>> given what y'all have seen from the battlefield?
> >>>>>>
> >>>>>> Thanks!
> >>>>>>
> >>>>>> - Josh
> >>>>>>
> >>>>>> [1] https://github.com/apache/nifi/pull/3425
> >>>>>
> >>>
> >>
> >

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Josh Elser <el...@apache.org>.
We could certainly implement some kind of "sanity check" via HBase code, 
but I think the thing that is missing is a logical way to validate 
this in NiFi itself.

Something like...

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Tight timeout just for the sanity check
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 5000);
try (Connection conn = ConnectionFactory.createConnection(conf)) {
   // do sanity check
}

// Normal timeout for the real workload
Configuration conf2 = new Configuration(conf);
conf2.setInt("hbase.rpc.timeout", 25000);
try (Connection conn = ConnectionFactory.createConnection(conf2)) {
   // do real hbase-y stuff
}
```

Maybe instead of requiring an implicit way to do this (requiring NiFi 
code changes), we could solve the problem at the "human level": create 
docs that walk people through how to push a dummy record through the 
service/processor with low timeout and retry settings? 
Make the "sanity check" a human operation and just expose the ability to 
set timeout/retries via the controller service?

On 2/18/20 4:36 PM, Lars Francke wrote:
> Hi,
> 
> Josh, thanks for bringing it up here again.
> I'm happy to revive the PR with whatever the outcome of this thread is.
> It came up today because another client complained about how "unstable"
> HBase is on NiFi.
> 
> @Josh: As the whole issue is only the initial connect can we have a
> different timeout setting there? I have to admit I don't know.
> 
> Cheers,
> Lars
> 
> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pi...@gmail.com>
> wrote:
> 
>> Good point, I don't think we can do that on a controller service.
>>
>> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a écrit :
>>
>>> That could make it a little better, but I can't remember, can we
>>> terminate on a controller service?
>>>
>>> The issue here would be on first time enabling the the HBase client
>>> service, so before even getting to a processor.
>>>
>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
>>> <pi...@gmail.com> wrote:
>>>>
>>>> Bryan,
>>>>
>>>> I didn't follow the whole discussion so I apologize if I'm saying
>>> something
>>>> stupid here. Now that we have the possibility to terminate threads in a
>>>> processor, would that solve the issue?
>>>>
>>>> Pierre
>>>>
>>>> Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a écrit
>> :
>>>>
>>>>> Hi Josh,
>>>>>
>>>>> The problem isn't so much about the retries within the flow, its more
>>>>> about setting up the service for the first time.
>>>>>
>>>>> A common scenario for users was the following:
>>>>>
>>>>> - Create a new HBase client service
>>>>> - Enter some config that wasn't quite correct, possibly hostnames
>> that
>>>>> weren't reachable from nifi as one example
>>>>> - Enable service and enter retry loop
>>>>> - Attempt to disable service to fix config, but have to wait 5+ mins
>>>>> for the retries to finish
>>>>>
>>>>> Maybe a lazy initialization of the connection on our side would help
>>>>> here, although it would just be moving the problem until later (i.e.
>>>>> service immediately enables because nothing is happening, then they
>>>>> find out about config problems later when a flow file hits an hbase
>>>>> processor).
>>>>>
>>>>> I guess the ideal scenario would be to have different logic for
>>>>> initializing the connection vs. using it, so that there wouldn't be
>>>>> retries during initialization.
>>>>>
>>>>> -Bryan
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
>>> wrote:
>>>>>>
>>>>>> Hiya!
>>>>>>
>>>>>> LarsF brought this up in the apache-hbase slack account and it
>>> caught my
>>>>>> eye. Sending a note here since the PR is closed where this was
>> being
>>>>>> discussed before[1].
>>>>>>
>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
>>>>>> processor with a high number of retries and back-off can create a
>>>>>> situation in which the processing of a single FlowFile will take a
>>> very
>>>>>> long time to hit the onFailure state.
>>>>>>
>>>>>> However, as an HBase developer, I can confidently state that
>>>>>> hbase.client.retries=1 will create scenarios in which you'll be
>>> pushing
>>>>>> a FlowFile through a retry loop inside of NiFi for things which
>>> should
>>>>>> be implicitly retried inside of the HBase client.
>>>>>>
>>>>>> For example, if a Region is being moved between two RegionServers
>>> and an
>>>>>> HBase processor is trying to read/write to that Region, the client
>>> will
>>>>>> see an exception. This is a "retriable" exception in HBase-parlance
>>>>>> which means that HBase client code would automatically re-process
>>> that
>>>>>> request (looking for the new location of that Region first). In
>> most
>>>>>> cases, the subsequent RPC would succeed and the caller is
>>> non-the-wiser
>>>>>> and the whole retry logic took 1's of milliseconds.
>>>>>>
>>>>>> My first idea was also what Lars' had suggested -- can we come up
>>> with a
>>>>>> sanity check to validate "correct" configuration for the processor
>>>>>> before we throw the waterfall of data at it? I can respect if
>>> processors
>>>>>> don't have a "good" hook to do such a check.
>>>>>>
>>>>>> What _would_ be the ideal semantics from NiFi's? perspective? We
>> have
>>>>>> the ability to implicitly retry operations and also control the
>> retry
>>>>>> backoff values. Is there something more we could do from the HBase
>>> side,
>>>>>> given what y'all have seen from the battlefield?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> - Josh
>>>>>>
>>>>>> [1] https://github.com/apache/nifi/pull/3425
>>>>>
>>>
>>
> 

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Lars Francke <la...@gmail.com>.
Hi,

Josh, thanks for bringing it up here again.
I'm happy to revive the PR with whatever the outcome of this thread is.
It came up today because another client complained about how "unstable"
HBase is on NiFi.

@Josh: As the whole issue is only the initial connect, can we have a
different timeout setting there? I have to admit I don't know.

Cheers,
Lars

On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pi...@gmail.com>
wrote:

> Good point, I don't think we can do that on a controller service.
>
> Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a écrit :
>
> > That could make it a little better, but I can't remember, can we
> > terminate on a controller service?
> >
> > The issue here would be on first time enabling the the HBase client
> > service, so before even getting to a processor.
> >
> > On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> > <pi...@gmail.com> wrote:
> > >
> > > Bryan,
> > >
> > > I didn't follow the whole discussion so I apologize if I'm saying
> > something
> > > stupid here. Now that we have the possibility to terminate threads in a
> > > processor, would that solve the issue?
> > >
> > > Pierre
> > >
> > > Le mar. 18 févr. 2020 à 10:52, Bryan Bende <bb...@gmail.com> a écrit
> :
> > >
> > > > Hi Josh,
> > > >
> > > > The problem isn't so much about the retries within the flow, its more
> > > > about setting up the service for the first time.
> > > >
> > > > A common scenario for users was the following:
> > > >
> > > > - Create a new HBase client service
> > > > - Enter some config that wasn't quite correct, possibly hostnames
> that
> > > > weren't reachable from nifi as one example
> > > > - Enable service and enter retry loop
> > > > - Attempt to disable service to fix config, but have to wait 5+ mins
> > > > for the retries to finish
> > > >
> > > > Maybe a lazy initialization of the connection on our side would help
> > > > here, although it would just be moving the problem until later (i.e.
> > > > service immediately enables because nothing is happening, then they
> > > > find out about config problems later when a flow file hits an hbase
> > > > processor).
> > > >
> > > > I guess the ideal scenario would be to have different logic for
> > > > initializing the connection vs. using it, so that there wouldn't be
> > > > retries during initialization.
> > > >
> > > > -Bryan
> > > >
> > > >
> > > >
> > > > On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
> > wrote:
> > > > >
> > > > > Hiya!
> > > > >
> > > > > LarsF brought this up in the apache-hbase slack account and it
> > caught my
> > > > > eye. Sending a note here since the PR is closed where this was
> being
> > > > > discussed before[1].
> > > > >
> > > > > I understand Bryan's concerns that misconfiguration of an HBase
> > > > > processor with a high number of retries and back-off can create a
> > > > > situation in which the processing of a single FlowFile will take a
> > very
> > > > > long time to hit the onFailure state.
> > > > >
> > > > > However, as an HBase developer, I can confidently state that
> > > > > hbase.client.retries=1 will create scenarios in which you'll be
> > pushing
> > > > > a FlowFile through a retry loop inside of NiFi for things which
> > should
> > > > > be implicitly retried inside of the HBase client.
> > > > >
> > > > > For example, if a Region is being moved between two RegionServers
> > and an
> > > > > HBase processor is trying to read/write to that Region, the client
> > will
> > > > > see an exception. This is a "retriable" exception in HBase-parlance
> > > > > which means that HBase client code would automatically re-process
> > that
> > > > > request (looking for the new location of that Region first). In
> most
> > > > > cases, the subsequent RPC would succeed and the caller is
> > non-the-wiser
> > > > > and the whole retry logic took 1's of milliseconds.
> > > > >
> > > > > My first idea was also what Lars' had suggested -- can we come up
> > with a
> > > > > sanity check to validate "correct" configuration for the processor
> > > > > before we throw the waterfall of data at it? I can respect if
> > processors
> > > > > don't have a "good" hook to do such a check.
> > > > >
> > > > > What _would_ be the ideal semantics from NiFi's? perspective? We
> have
> > > > > the ability to implicitly retry operations and also control the
> retry
> > > > > backoff values. Is there something more we could do from the HBase
> > side,
> > > > > given what y'all have seen from the battlefield?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > - Josh
> > > > >
> > > > > [1] https://github.com/apache/nifi/pull/3425
> > > >
> >
>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Pierre Villard <pi...@gmail.com>.
Good point; I don't think we can do that on a controller service.

Le mar. 18 févr. 2020 à 11:06, Bryan Bende <bb...@gmail.com> a écrit :

> That could make it a little better, but I can't remember, can we
> terminate on a controller service?
>
> The issue here would be on first time enabling the the HBase client
> service, so before even getting to a processor.
>
> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
> <pi...@gmail.com> wrote:
> >
> > Bryan,
> >
> > I didn't follow the whole discussion so I apologize if I'm saying
> something
> > stupid here. Now that we have the possibility to terminate threads in a
> > processor, would that solve the issue?
> >
> > Pierre
> >
> > On Tue, Feb 18, 2020 at 10:52 AM Bryan Bende <bb...@gmail.com> wrote:
> >
> > > Hi Josh,
> > >
> > > The problem isn't so much about the retries within the flow, it's more
> > > about setting up the service for the first time.
> > >
> > > A common scenario for users was the following:
> > >
> > > - Create a new HBase client service
> > > - Enter some config that wasn't quite correct, possibly hostnames that
> > > weren't reachable from nifi as one example
> > > - Enable service and enter retry loop
> > > - Attempt to disable service to fix config, but have to wait 5+ mins
> > > for the retries to finish
> > >
> > > Maybe a lazy initialization of the connection on our side would help
> > > here, although it would just be moving the problem until later (i.e.
> > > service immediately enables because nothing is happening, then they
> > > find out about config problems later when a flow file hits an hbase
> > > processor).
> > >
> > > I guess the ideal scenario would be to have different logic for
> > > initializing the connection vs. using it, so that there wouldn't be
> > > retries during initialization.
> > >
> > > -Bryan
> > >
> > >
> > >
> > > On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com>
> wrote:
> > > >
> > > > Hiya!
> > > >
> > > > LarsF brought this up in the apache-hbase slack account and it
> caught my
> > > > eye. Sending a note here since the PR is closed where this was being
> > > > discussed before[1].
> > > >
> > > > I understand Bryan's concerns that misconfiguration of an HBase
> > > > processor with a high number of retries and back-off can create a
> > > > situation in which the processing of a single FlowFile will take a
> very
> > > > long time to hit the onFailure state.
> > > >
> > > > However, as an HBase developer, I can confidently state that
> > > > hbase.client.retries=1 will create scenarios in which you'll be
> pushing
> > > > a FlowFile through a retry loop inside of NiFi for things which
> should
> > > > be implicitly retried inside of the HBase client.
> > > >
> > > > For example, if a Region is being moved between two RegionServers
> and an
> > > > HBase processor is trying to read/write to that Region, the client
> will
> > > > see an exception. This is a "retriable" exception in HBase-parlance
> > > > which means that HBase client code would automatically re-process
> that
> > > > request (looking for the new location of that Region first). In most
> > > > cases, the subsequent RPC would succeed, the caller is none the wiser,
> > > > and the whole retry logic took only a few milliseconds.
> > > >
> > > > My first idea was also what Lars had suggested -- can we come up
> with a
> > > > sanity check to validate "correct" configuration for the processor
> > > > before we throw the waterfall of data at it? I can respect if
> processors
> > > > don't have a "good" hook to do such a check.
> > > >
> > > > What _would_ be the ideal semantics from NiFi's perspective? We have
> > > > the ability to implicitly retry operations and also control the retry
> > > > backoff values. Is there something more we could do from the HBase
> side,
> > > > given what y'all have seen from the battlefield?
> > > >
> > > > Thanks!
> > > >
> > > > - Josh
> > > >
> > > > [1] https://github.com/apache/nifi/pull/3425
> > >
>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Bryan Bende <bb...@gmail.com>.
That could make it a little better, but I can't remember: can we
terminate on a controller service?

The issue here would be when first enabling the HBase client
service, so before even getting to a processor.

On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
<pi...@gmail.com> wrote:
>
> Bryan,
>
> I didn't follow the whole discussion so I apologize if I'm saying something
> stupid here. Now that we have the possibility to terminate threads in a
> processor, would that solve the issue?
>
> Pierre
>
> On Tue, Feb 18, 2020 at 10:52 AM Bryan Bende <bb...@gmail.com> wrote:
>
> > Hi Josh,
> >
> > The problem isn't so much about the retries within the flow, it's more
> > about setting up the service for the first time.
> >
> > A common scenario for users was the following:
> >
> > - Create a new HBase client service
> > - Enter some config that wasn't quite correct, possibly hostnames that
> > weren't reachable from nifi as one example
> > - Enable service and enter retry loop
> > - Attempt to disable service to fix config, but have to wait 5+ mins
> > for the retries to finish
> >
> > Maybe a lazy initialization of the connection on our side would help
> > here, although it would just be moving the problem until later (i.e.
> > service immediately enables because nothing is happening, then they
> > find out about config problems later when a flow file hits an hbase
> > processor).
> >
> > I guess the ideal scenario would be to have different logic for
> > initializing the connection vs. using it, so that there wouldn't be
> > retries during initialization.
> >
> > -Bryan
> >
> >
> >
> > On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com> wrote:
> > >
> > > Hiya!
> > >
> > > LarsF brought this up in the apache-hbase slack account and it caught my
> > > eye. Sending a note here since the PR is closed where this was being
> > > discussed before[1].
> > >
> > > I understand Bryan's concerns that misconfiguration of an HBase
> > > processor with a high number of retries and back-off can create a
> > > situation in which the processing of a single FlowFile will take a very
> > > long time to hit the onFailure state.
> > >
> > > However, as an HBase developer, I can confidently state that
> > > hbase.client.retries=1 will create scenarios in which you'll be pushing
> > > a FlowFile through a retry loop inside of NiFi for things which should
> > > be implicitly retried inside of the HBase client.
> > >
> > > For example, if a Region is being moved between two RegionServers and an
> > > HBase processor is trying to read/write to that Region, the client will
> > > see an exception. This is a "retriable" exception in HBase-parlance
> > > which means that HBase client code would automatically re-process that
> > > request (looking for the new location of that Region first). In most
> > > cases, the subsequent RPC would succeed, the caller is none the wiser,
> > > and the whole retry logic took only a few milliseconds.
> > >
> > > My first idea was also what Lars had suggested -- can we come up with a
> > > sanity check to validate "correct" configuration for the processor
> > > before we throw the waterfall of data at it? I can respect if processors
> > > don't have a "good" hook to do such a check.
> > >
> > > What _would_ be the ideal semantics from NiFi's perspective? We have
> > > the ability to implicitly retry operations and also control the retry
> > > backoff values. Is there something more we could do from the HBase side,
> > > given what y'all have seen from the battlefield?
> > >
> > > Thanks!
> > >
> > > - Josh
> > >
> > > [1] https://github.com/apache/nifi/pull/3425
> >

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Pierre Villard <pi...@gmail.com>.
Bryan,

I didn't follow the whole discussion so I apologize if I'm saying something
stupid here. Now that we have the possibility to terminate threads in a
processor, would that solve the issue?

Pierre

On Tue, Feb 18, 2020 at 10:52 AM Bryan Bende <bb...@gmail.com> wrote:

> Hi Josh,
>
> The problem isn't so much about the retries within the flow, it's more
> about setting up the service for the first time.
>
> A common scenario for users was the following:
>
> - Create a new HBase client service
> - Enter some config that wasn't quite correct, possibly hostnames that
> weren't reachable from nifi as one example
> - Enable service and enter retry loop
> - Attempt to disable service to fix config, but have to wait 5+ mins
> for the retries to finish
>
> Maybe a lazy initialization of the connection on our side would help
> here, although it would just be moving the problem until later (i.e.
> service immediately enables because nothing is happening, then they
> find out about config problems later when a flow file hits an hbase
> processor).
>
> I guess the ideal scenario would be to have different logic for
> initializing the connection vs. using it, so that there wouldn't be
> retries during initialization.
>
> -Bryan
>
>
>
> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com> wrote:
> >
> > Hiya!
> >
> > LarsF brought this up in the apache-hbase slack account and it caught my
> > eye. Sending a note here since the PR is closed where this was being
> > discussed before[1].
> >
> > I understand Bryan's concerns that misconfiguration of an HBase
> > processor with a high number of retries and back-off can create a
> > situation in which the processing of a single FlowFile will take a very
> > long time to hit the onFailure state.
> >
> > However, as an HBase developer, I can confidently state that
> > hbase.client.retries=1 will create scenarios in which you'll be pushing
> > a FlowFile through a retry loop inside of NiFi for things which should
> > be implicitly retried inside of the HBase client.
> >
> > For example, if a Region is being moved between two RegionServers and an
> > HBase processor is trying to read/write to that Region, the client will
> > see an exception. This is a "retriable" exception in HBase-parlance
> > which means that HBase client code would automatically re-process that
> > request (looking for the new location of that Region first). In most
> > cases, the subsequent RPC would succeed, the caller is none the wiser,
> > and the whole retry logic took only a few milliseconds.
> >
> > My first idea was also what Lars had suggested -- can we come up with a
> > sanity check to validate "correct" configuration for the processor
> > before we throw the waterfall of data at it? I can respect if processors
> > don't have a "good" hook to do such a check.
> >
> > What _would_ be the ideal semantics from NiFi's perspective? We have
> > the ability to implicitly retry operations and also control the retry
> > backoff values. Is there something more we could do from the HBase side,
> > given what y'all have seen from the battlefield?
> >
> > Thanks!
> >
> > - Josh
> >
> > [1] https://github.com/apache/nifi/pull/3425
>

Re: [DISCUSS] On HBase client retries (NIFI-6197)

Posted by Bryan Bende <bb...@gmail.com>.
Hi Josh,

The problem isn't so much about the retries within the flow, it's more
about setting up the service for the first time.

A common scenario for users was the following:

- Create a new HBase client service
- Enter some config that wasn't quite correct, possibly hostnames that
weren't reachable from nifi as one example
- Enable service and enter retry loop
- Attempt to disable the service to fix the config, but have to wait 5+ mins
for the retries to finish (a rough sketch of where that time goes follows this list)
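
For context, that multi-minute wait is the HBase client's own retry budget
(retry count times an escalating pause between attempts), not anything NiFi
adds on top. A minimal, purely illustrative sketch, assuming the standard
HBase client property keys with example values only (not what the NiFi
service actually sets):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RetryBudgetSketch {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();

            // HBase 1.x ships with 35 retries and an escalating pause between
            // attempts, which is what lets a bad hostname hold the service
            // hostage for minutes before any failure surfaces.
            conf.setInt("hbase.client.retries.number", 5);       // example value only
            conf.setLong("hbase.client.pause", 100L);             // base pause in ms
            conf.setLong("hbase.client.operation.timeout", 30000L); // cap one op (example)

            System.out.println("retries = " + conf.get("hbase.client.retries.number"));
        }
    }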

Maybe a lazy initialization of the connection on our side would help
here, although it would just be moving the problem until later (i.e.
service immediately enables because nothing is happening, then they
find out about config problems later when a flow file hits an hbase
processor).

I guess the ideal scenario would be to have different logic for
initializing the connection vs. using it, so that there wouldn't be
retries during initialization.
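
A minimal sketch of that split, assuming hypothetical class and method names
rather than the actual NiFi HBase client service code: a throwaway connection
with a tiny retry budget validates the config up front, and only the
long-lived connection handed to processors keeps the normal retry settings.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    // Hypothetical sketch, not the real service implementation.
    public class HBaseClientServiceSketch {

        private volatile Connection connection;

        // Imagine this being called from the service's enable hook.
        public void onEnabled(Configuration baseConf) throws Exception {
            // 1) Fail fast while validating the config: a throwaway connection
            //    with a very small retry budget.
            Configuration probeConf = new Configuration(baseConf);
            probeConf.setInt("hbase.client.retries.number", 1);
            probeConf.setLong("hbase.client.pause", 100L);
            try (Connection probe = ConnectionFactory.createConnection(probeConf);
                 Admin admin = probe.getAdmin()) {
                admin.listTableNames(); // cheap round trip to the cluster
            }

            // 2) The connection processors actually use keeps the normal retry
            //    settings, so transient events like a Region move are retried
            //    inside the HBase client instead of bouncing FlowFiles in NiFi.
            this.connection = ConnectionFactory.createConnection(baseConf);
        }
    }

Whether that probe belongs in the service itself or in some separate
verification step is the open question, but it would keep a bad config from
tying up a thread for minutes.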

-Bryan



On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <jo...@gmail.com> wrote:
>
> Hiya!
>
> LarsF brought this up in the apache-hbase slack account and it caught my
> eye. Sending a note here since the PR is closed where this was being
> discussed before[1].
>
> I understand Bryan's concerns that misconfiguration of an HBase
> processor with a high number of retries and back-off can create a
> situation in which the processing of a single FlowFile will take a very
> long time to hit the onFailure state.
>
> However, as an HBase developer, I can confidently state that
> hbase.client.retries=1 will create scenarios in which you'll be pushing
> a FlowFile through a retry loop inside of NiFi for things which should
> be implicitly retried inside of the HBase client.
>
> For example, if a Region is being moved between two RegionServers and an
> HBase processor is trying to read/write to that Region, the client will
> see an exception. This is a "retriable" exception in HBase-parlance
> which means that HBase client code would automatically re-process that
> request (looking for the new location of that Region first). In most
> cases, the subsequent RPC would succeed, the caller is none the wiser,
> and the whole retry logic took only a few milliseconds.
>
> My first idea was also what Lars had suggested -- can we come up with a
> sanity check to validate "correct" configuration for the processor
> before we throw the waterfall of data at it? I can respect if processors
> don't have a "good" hook to do such a check.
>
> What _would_ be the ideal semantics from NiFi's perspective? We have
> the ability to implicitly retry operations and also control the retry
> backoff values. Is there something more we could do from the HBase side,
> given what y'all have seen from the battlefield?
>
> Thanks!
>
> - Josh
>
> [1] https://github.com/apache/nifi/pull/3425