Posted to issues@nifi.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/04/09 00:16:00 UTC
[jira] [Commented] (NIFI-5953) GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted
[ https://issues.apache.org/jira/browse/NIFI-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812890#comment-16812890 ]
ASF subversion and git services commented on NIFI-5953:
-------------------------------------------------------
Commit cded30b3d2dc0497cf3d06051f55ca304c744250 in nifi's branch refs/heads/master from Kourge
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=cded30b ]
NIFI-5953 Manage GetTwitter connection retries on '420 Enhance Your Calm' exceptions
Update "Max Client Error Retries" parameter name.
Reintroduce client.reconnect() on HTTP_ERROR 420
This closes #3276.
Signed-off-by: Koji Kawamura <ij...@apache.org>
> GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted
> ------------------------------------------------------------------------------------------
>
> Key: NIFI-5953
> URL: https://issues.apache.org/jira/browse/NIFI-5953
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.7.0
> Reporter: Kourge
> Assignee: Kourge
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Hi,
> I am using the GetTwitter processor, with the Filter Endpoint.
> The issue is that I often get a series of `*Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect*` exceptions.
> These are followed by one `*Received error STOPPED_BY_ERROR: Retries exhausted due to null. Will not attempt to reconnect*` exception, after which the processor no longer receives any tweets from the Twitter endpoint.
> I am getting rate limited by the Twitter API. I am running a NiFi cluster, so I run the GetTwitter processor on the Primary Node only, to avoid using the same credentials several times in parallel.
> I tried to apply the configuration recommendation from this mailing list:
> [https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E]
> But raising the "run schedule" parameter to 60 seconds does not help in my case, since I aim to read between 100 and 200 tweets per minute. With "run schedule" set to 60 seconds, NiFi polls only 1 tweet per minute and cannot keep up with the Twitter API tweet queue.
> +*Proposed solution*+
> I analyzed the `*GetTwitter.java*` implementation and noticed that the `*onTrigger()*` method reconnects (`*client.reconnect();*`) to the Twitter endpoint on `*HTTP_ERROR*`.
> The issue here is that `*HTTP/1.1 420 Enhance Your Calm*` messages are `*HTTP_ERROR*` events, but the Twitter HBC library client (com.twitter.hbc) already manages reconnection.
> The Twitter HBC library client retries on its own with an increasing wait delay, with 5 retries by default.
> Moreover, it seems that `*client.reconnect();*` does not work in my case, and because that method is called too often, I get kicked off the Twitter API sooner.
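To illustrate the "increasing wait delay" described above, here is a minimal sketch of a doubling backoff schedule. It is not the HBC implementation: the class `BackoffSchedule`, the method `delaysMillis`, the initial delay and the cap are all hypothetical and chosen only for illustration. The point is that each retry waits longer than the previous one, so an extra reconnect issued from outside the library would cut those waits short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.twitter.hbc or NiFi.
final class BackoffSchedule {
    private BackoffSchedule() {}

    /**
     * Returns the wait before each retry: the delay starts at initialMillis,
     * doubles after every attempt, and never exceeds capMillis.
     */
    static List<Long> delaysMillis(int maxRetries, long initialMillis, long capMillis) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMillis;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(Math.min(delay, capMillis));
            // Double for the next attempt; jump straight to the cap when
            // doubling would reach or exceed it (this also avoids overflow).
            delay = (delay >= capMillis / 2) ? capMillis : delay * 2;
        }
        return delays;
    }

    public static void main(String[] args) {
        // Assumed values: 5 retries (the default quoted in the report),
        // 1 s initial delay, 16 s cap.
        System.out.println(delaysMillis(5, 1000L, 16000L));
    }
}
```

With the assumed values in `main`, the schedule is [1000, 2000, 4000, 8000, 16000] milliseconds.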
> My proposed solution is the following (tested in my local development environment):
> *1. Let the Twitter HBC library client handle connection retries on `HTTP/1.1 420 Enhance Your Calm` messages.*
> The `*onTrigger()*` method should be updated so that it does not try to reconnect on an `*HTTP_ERROR*` whose message equals `*HTTP/1.1 420 Enhance Your Calm*`:
> {code:java}
> case HTTP_ERROR:
>     // Reconnect only for errors other than the 420 rate-limit response;
>     // the HBC client already retries 420 with its own increasing delay.
>     if (!event.getMessage().equals("HTTP/1.1 420 Enhance Your Calm")) {
>         getLogger().error("Received error {}: {}. Will attempt to reconnect", new Object[]{event.getEventType(), event.getMessage()});
>         client.reconnect();
>     } else {
>         getLogger().error("Received error {}: {}. Will not attempt to reconnect", new Object[]{event.getEventType(), event.getMessage()});
>     }
>     break;
> {code}
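The decision in the snippet above can be isolated and checked without NiFi or HBC on the classpath. The sketch below follows the reporter's proposal only. The commit message at the top of this notification mentions reintroducing `client.reconnect()` on HTTP_ERROR 420, so the merged code may behave differently. `EventKind` and `ReconnectPolicy` are hypothetical stand-ins, not classes from either library.

```java
// Hypothetical stand-in for the event types named in the report;
// the real processor uses the HBC library's own event classes.
enum EventKind { HTTP_ERROR, STOPPED_BY_ERROR, OTHER }

// Hypothetical helper, not part of NiFi or com.twitter.hbc.
final class ReconnectPolicy {
    // Status line quoted in the report for rate limiting.
    static final String RATE_LIMIT_MESSAGE = "HTTP/1.1 420 Enhance Your Calm";

    private ReconnectPolicy() {}

    /** Returns true when the processor itself should call client.reconnect(). */
    static boolean shouldReconnect(EventKind kind, String message) {
        if (kind != EventKind.HTTP_ERROR) {
            return false; // this sketch only covers the HTTP_ERROR branch
        }
        // Constant-first equals() avoids a NullPointerException if the
        // event carries no message.
        return !RATE_LIMIT_MESSAGE.equals(message);
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnect(EventKind.HTTP_ERROR, RATE_LIMIT_MESSAGE));
        System.out.println(shouldReconnect(EventKind.HTTP_ERROR, "HTTP/1.1 503 Service Unavailable"));
    }
}
```

Unlike the snippet in the report, this version puts the constant first in the `equals()` comparison, so a null message is treated as a non-420 error rather than raising an exception.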
> *2. Parameterize the maximum number of connection retries*
> I also noticed that the default number of retries in the Twitter HBC library (5) is sometimes too low.
> So it would be useful to add a GetTwitter processor property named `*Max Connection Retries*`. In my usage I found that `*10*` is a good value.
> Then update the `*onSchedule()*` method with this line (replacing `*10*` with the value of `*Max Connection Retries*`):
> {code:java}
> clientBuilder.retries(10); // default value is 5
> {code}
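As a rough sketch of how the configured value could be resolved before it is handed to `clientBuilder.retries(...)`, the helper below falls back to the default of 5 quoted in the report when the property is unset and rejects negative values. `RetrySettings` and `resolveMaxRetries` are hypothetical names. In a real processor the value would more likely come from a NiFi property with a validator attached, which is not shown here.

```java
// Hypothetical helper, not part of NiFi or com.twitter.hbc.
final class RetrySettings {
    // Default number of retries quoted in the report for the HBC client.
    static final int HBC_DEFAULT_RETRIES = 5;

    private RetrySettings() {}

    /**
     * Resolves the retry count from the raw property text.
     * Unset or blank input falls back to the default; negative input is rejected.
     */
    static int resolveMaxRetries(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return HBC_DEFAULT_RETRIES;
        }
        // Integer.parseInt throws NumberFormatException on non-numeric text.
        int value = Integer.parseInt(configured.trim());
        if (value < 0) {
            throw new IllegalArgumentException("Max Connection Retries must be >= 0, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // The result would be passed on as: clientBuilder.retries(maxRetries);
        int maxRetries = resolveMaxRetries("10");
        System.out.println(maxRetries);
    }
}
```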
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)