You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "mrkm4ntr (via GitHub)" <gi...@apache.org> on 2023/04/26 04:18:20 UTC

[GitHub] [beam] mrkm4ntr opened a new pull request, #26424: Handle response exception from Elasticsearch client

mrkm4ntr opened a new pull request, #26424:
URL: https://github.com/apache/beam/pull/26424

   If Elasticsearch client returns non success HTTP status, it throws ResponseException like below.
   https://github.com/elastic/elasticsearch/blob/f8ea89edcb9e79a62f79495b65eeb7c88c252f4f/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L340-L355
   
   So currently ElasticsearchIO doesn't retry 429 response.
   https://github.com/apache/beam/issues/22160
   
   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #26424: Handle response exception from Elasticsearch client
URL: https://github.com/apache/beam/pull/26424


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1852086472

   @egalpin @mrkm4ntr what are next steps for this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "egalpin (via GitHub)" <gi...@apache.org>.
egalpin commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1525991570

   thanks @mrkm4ntr for the contribution!  I'll take a look more closely as soon as I can.  In the meantime, based on the spotless checker output[1] you need to run `./gradlew :sdks:java:io:elasticsearch:spotlessApply` on your local machine and commit any changes to auto-fix java style.
   
   Could you also double-check that there is an existing test for the failure mode that your code aims to address? I'll also verify in my review.  Thanks!
   
   [1] https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Commit/26535/console


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mrkm4ntr commented on a diff in pull request #26424: Handle response exception from Elasticsearch client

Posted by "mrkm4ntr (via GitHub)" <gi...@apache.org>.
mrkm4ntr commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1224985484


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -373,6 +376,8 @@ abstract static class Builder {
 
       abstract Builder setTrustSelfSignedCerts(boolean trustSelfSignedCerts);
 
+      abstract Builder setInvalidBulkEndpoint(boolean invalidBulkEndpoint);

Review Comment:
   I think this method and [this](https://github.com/apache/beam/pull/26424/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR658) are not visible from users.  Same approach as [this](https://github.com/apache/beam/blob/6ad2b12c0997ad70f54fe8367918f5a47d482b1b/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L1191-L1196).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1937697538

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on a diff in pull request #26424: Handle response exception from Elasticsearch client

Posted by "egalpin (via GitHub)" <gi...@apache.org>.
egalpin commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1224857843


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -373,6 +376,8 @@ abstract static class Builder {
 
       abstract Builder setTrustSelfSignedCerts(boolean trustSelfSignedCerts);
 
+      abstract Builder setInvalidBulkEndpoint(boolean invalidBulkEndpoint);

Review Comment:
   I don't particularly like the idea of having this builder method only for the sake of testing. I feel it will be very confusing to users to see this as an option when building configurations. Could you look for an alternative approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "mrkm4ntr (via GitHub)" <gi...@apache.org>.
mrkm4ntr commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1760615922

   /hold


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1522788593

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @kennknowles for label java.
   R: @chamikaramj for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1755332980

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1771146769

   R: @egalpin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1671254755

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mrkm4ntr commented on a diff in pull request #26424: Handle response exception from Elasticsearch client

Posted by "mrkm4ntr (via GitHub)" <gi...@apache.org>.
mrkm4ntr commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1289632381


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements RetryPredicate {
         this(429);
       }
 
-      /** Returns true if the response has the error code for any mutation. */
-      private static boolean errorCodePresent(HttpEntity responseEntity, int errorCode) {

Review Comment:
   Thanks. 
   So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429. I'm not certain this is accurate, but we should pay attention to this if we're not confident.
   
   OK, I will change my PR like this.
   > Checking the http status code first 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1522792170

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mrkm4ntr commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "mrkm4ntr (via GitHub)" <gi...@apache.org>.
mrkm4ntr commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1522790936

   R: @egalpin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mrkm4ntr commented on pull request #26424: Handle response exception from Elasticsearch client

Posted by "mrkm4ntr (via GitHub)" <gi...@apache.org>.
mrkm4ntr commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1527037298

   @egalpin Thanks. I checked the current tests and found that Elasticsearch returns HTTP 200 status for invalid bulk data (400 is only in response entity). While too many requests error is HTTP 429. I changed test to emulate non-200 status with 405 status.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on a diff in pull request #26424: Handle response exception from Elasticsearch client

Posted by "egalpin (via GitHub)" <gi...@apache.org>.
egalpin commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1290428111


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements RetryPredicate {
         this(429);
       }
 
-      /** Returns true if the response has the error code for any mutation. */
-      private static boolean errorCodePresent(HttpEntity responseEntity, int errorCode) {

Review Comment:
   > So, you believe there's a scenario where the HTTP status code is 200, but the status in the body is 429.
   
   It's more so that I feel that we cannot remove the `errorCodePresent` method and rely on http status code alone. I have seen first hand (examples above) scenarios where 1 out of `n` bulk entities succeeds and therefore the http status code is 200, even if `n-1` bulk entities failed and had their associated `status` field in `items` set to something like `400`.
   
   I believe the failure mode for not properly handling http 429 is that the response body does not contain the key `items` which `errorCodePresent` currently depends on to look further for `status`.  Instead, in the case of 429, `status` is a top-level key in the response body. 
   
   So I feel that we need to keep all existing `errorCodePresent` logic, _and_ add additional handling to check the http status code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on a diff in pull request #26424: Handle response exception from Elasticsearch client

Posted by "egalpin (via GitHub)" <gi...@apache.org>.
egalpin commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1288883681


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements RetryPredicate {
         this(429);
       }
 
-      /** Returns true if the response has the error code for any mutation. */
-      private static boolean errorCodePresent(HttpEntity responseEntity, int errorCode) {

Review Comment:
   Thanks @mrkm4ntr for your patience, I have kept you waiting too many times.
   
   This change is definitely interesting.  I can see that in some cases, such as successfully _creating_ a document via _bulk, the http status code is `200` while the `status` field under `items` in the response body is `201` (updating the doc results in `status` field being `200`:
   
   ```json
   {
       "took": 383,
       "errors": false,
       "items": [
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "_version": 1,
                   "result": "created",
                   "_shards": {
                       "total": 2,
                       "successful": 1,
                       "failed": 0
                   },
                   "_seq_no": 0,
                   "_primary_term": 1,
                   "status": 201
               }
           }
       ]
   }
   ```
   
   It's also the case, more importantly, that when one of the operations of the bulk payload fails, but another succeeds, there can be an http status code of 200 but one of the items can have `status` field with non-200 code:
   
   ```json
   {
       "took": 19,
       "errors": true,
       "items": [
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "_version": 7,
                   "result": "updated",
                   "_shards": {
                       "total": 2,
                       "successful": 2,
                       "failed": 0
                   },
                   "_seq_no": 7,
                   "_primary_term": 1,
                   "status": 200
               }
           },
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "status": 400,
                   "error": {
                       "type": "illegal_argument_exception",
                       "reason": "failed to execute script",
                       "caused_by": {
                           "type": "script_exception",
                           "reason": "compile error",
                           "script_stack": [
                               "if(ctx._source.group !=== null) { ctx._source.gro ...",
                               "                        ^---- HERE"
                           ],
                           "script": "if(ctx._source.group !=== null) { ctx._source.group = params.id % 2 } else { ctx._source.group = 0 }",
                           "lang": "painless",
                           "position": {
                               "offset": 24,
                               "start": 0,
                               "end": 49
                           },
                           "caused_by": {
                               "type": "illegal_argument_exception",
                               "reason": "invalid sequence of tokens near ['='].",
                               "caused_by": {
                                   "type": "no_viable_alt_exception",
                                   "reason": "no_viable_alt_exception: null"
                               }
                           }
                       }
                   }
               }
           }
       ]
   }
   ```
   
   So the status code alone cannot be "trusted" to inform of all failures, since there could be false negatives.  However, it seems that the http status code can be used as a first check: if the http status code is >=400, we _know_ a failure has occurred and we don't need to iterate through the `items` necessarily.  
   
   Checking the http status code first might be a viable path forward for properly handling the 429 case.
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Handle response exception from Elasticsearch client [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26424:
URL: https://github.com/apache/beam/pull/26424#issuecomment-1951308839

   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org