Posted to issues@nifi.apache.org by "Kourge (JIRA)" <ji...@apache.org> on 2019/01/11 10:38:00 UTC

[jira] [Created] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted

Kourge created NIFI-5953:
----------------------------

             Summary: GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted
                 Key: NIFI-5953
                 URL: https://issues.apache.org/jira/browse/NIFI-5953
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 1.7.0
            Reporter: Kourge


Hi,

I am using the GetTwitter processor, with the Filter Endpoint.
 The issue is that I often get a series of `*Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect*` exceptions.
 These are followed by one `*Received error STOPPED_BY_ERROR: Retries exhausted due to null. Will not attempt to reconnect*` exception, after which the processor no longer receives any tweets from the Twitter endpoint.

I am being rate limited by the Twitter API. I am running a NiFi cluster, so I run the GetTwitter processor on the Primary Node only, to avoid using the same credentials several times in parallel.

I tried to apply the configuration recommendation from this mailing list:
 [https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E]

But raising the "run schedule" parameter to 60 seconds does not help in my case, since my target is to read between 100 and 200 tweets per minute. With "run schedule" set to 60 seconds, NiFi polls only 1 tweet per minute and cannot keep up with the Twitter API tweet queue.

+*Proposed solution*+

I analyzed the `*GetTwitter.java*` implementation and noticed that the `*onTrigger()*` method reconnects (`*client.reconnect();*`) to the Twitter endpoint on `*HTTP_ERROR*`.
 The issue here is that `*HTTP/1.1 420 Enhance Your Calm*` messages are reported as `*HTTP_ERROR*`, but the Twitter HBC library client (com.twitter.hbc) already manages reconnection in that case.
 The Twitter HBC library client retries on its own with an increasing wait delay, 5 times by default.

Moreover, it seems that `*client.reconnect();*` does not work in my case, and because that method is called too often, it gets me kicked off the Twitter API even earlier.
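
As a rough illustration of why the extra reconnects hurt: a client that backs off on rate limiting waits longer after each failed attempt, and every external reconnect cuts that wait short. The sketch below only shows the general idea of a doubling, capped delay. The class name, the 60-second starting delay and the cap are assumptions made for this example; they are not taken from the com.twitter.hbc source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: names and delay values are assumptions,
// not the actual com.twitter.hbc implementation.
final class RateLimitBackoffSketch {
    static final long INITIAL_DELAY_MS = 60_000L; // assumed first wait after a 420
    static final long MAX_DELAY_MS = 960_000L;    // assumed upper bound on the wait

    // Delay before retry number `attempt` (0-based): doubles each time, then is capped.
    static long delayMillis(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Clamp the shift so the left shift cannot overflow a long.
        int shift = Math.min(attempt, 20);
        return Math.min(INITIAL_DELAY_MS << shift, MAX_DELAY_MS);
    }

    // Full list of waits for a given retry budget.
    static List<Long> schedule(int maxRetries) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(delayMillis(attempt));
        }
        return delays;
    }

    public static void main(String[] args) {
        // With 5 retries the waits grow from 1 minute to 16 minutes.
        System.out.println(schedule(5));
    }
}
```

If the processor calls `client.reconnect()` on every 420 event, each call is a new connection attempt made without waiting out a delay like the ones above, which would explain why the rate limit keeps being hit.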

My proposed solution is the following (tested in my local development environment):

*1. Let the Twitter HBC library client handle connection retries on `HTTP/1.1 420 Enhance Your Calm` messages.*

The `*onTrigger()*` method should be updated to not try to reconnect in case of `*HTTP_ERROR*` with message equal to `*HTTP/1.1 420 Enhance Your Calm*`:
{code:java}
case HTTP_ERROR:
    // Per the analysis above, HBC already retries on 420, so only reconnect for other HTTP errors.
    // Constant-first equals() avoids a NullPointerException if the event has no message.
    if (!"HTTP/1.1 420 Enhance Your Calm".equals(event.getMessage())) {
        getLogger().error("Received error {}: {}. Will attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
        client.reconnect();
    } else {
        getLogger().error("Received error {}: {}. Will not attempt to reconnect",
                new Object[]{event.getEventType(), event.getMessage()});
    }
    break;
{code}
*2. Parameterize the maximum number of connection retries*

I also noticed that the default number of retries in the Twitter HBC library (5) is sometimes too low.
 So it would be useful to add a GetTwitter processor property named `*Max Connection Retries*`. In my usage I found that `*10*` is a good value.

Then update the `*onSchedule()*` method with this line (replacing `*10*` with the value of `*Max Connection Retries*`):
{code:java}
clientBuilder.retries(10); // default value is 5
{code}
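
For illustration, here is a minimal standalone sketch of how the configured `*Max Connection Retries*` value could be turned into the integer passed to `clientBuilder.retries(...)`, falling back to the HBC default of 5 when the value is missing or invalid. The class and method names are hypothetical; in the actual processor this check would more naturally be done by a validator on the NiFi property, which is not shown here.

```java
// Hypothetical helper, not part of GetTwitter.java or the HBC library.
final class MaxRetriesSketch {
    static final int DEFAULT_RETRIES = 5; // HBC default mentioned in this issue

    // Returns the configured retry count, or the default if the value is
    // null, blank, not a number, or not strictly positive.
    static int resolveRetries(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_RETRIES;
        }
        try {
            int value = Integer.parseInt(configured.trim());
            return value > 0 ? value : DEFAULT_RETRIES;
        } catch (NumberFormatException e) {
            return DEFAULT_RETRIES;
        }
    }

    public static void main(String[] args) {
        // The resolved value is what would be handed to clientBuilder.retries(...).
        System.out.println(resolveRetries("10"));
        System.out.println(resolveRetries(null));
    }
}
```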



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)