Posted to issues@nifi.apache.org by "ChrisSamo632 (via GitHub)" <gi...@apache.org> on 2023/04/19 11:14:34 UTC

[GitHub] [nifi] ChrisSamo632 commented on pull request #6903: NIFI-11111 add option to output Elasticsearch error responses as FlowFile to PutElasticsearchJson and PutElasticsearchRecord

ChrisSamo632 commented on PR #6903:
URL: https://github.com/apache/nifi/pull/6903#issuecomment-1514554132

   @davis-anthony this PR, if/when approved (@MikeThomsen / @mattyb149), was not going to introduce such behaviour, *but* it *does* make sense for `PutElasticsearchJson`, so I've added a new `elasticsearch.bulk.error` attribute for FlowFiles sent to the `errors` relationship. It will contain the `_bulk` API's response for the document if Elasticsearch has marked it as `error`ed (or if the document was `not_found` and you have set `Treat "Not Found" as Success` to `false`).
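To make the "which documents count as errors" rule concrete, here is a minimal Python sketch that walks the `items` array of an Elasticsearch `_bulk` response and picks out the entries that would be routed to `errors`. It is not NiFi's implementation. The function name `errored_items`, its `treat_not_found_as_success` parameter, and the idea of printing each item's JSON as a stand-in for the attribute value are all illustrative assumptions; only the general response shape (one single-key object per action, with an `error` object or a `result` of `not_found`) follows Elasticsearch's documented format.

```python
import json


def errored_items(bulk_response, treat_not_found_as_success=True):
    """Return (position, action, item) for every bulk item that should go to `errors`."""
    errors = []
    for position, entry in enumerate(bulk_response.get("items", [])):
        # Each entry has exactly one key: the action (index, create, update or delete)
        action, item = next(iter(entry.items()))
        if "error" in item:
            # Elasticsearch rejected this document (mapping problem, version conflict, ...)
            errors.append((position, action, item))
        elif item.get("result") == "not_found" and not treat_not_found_as_success:
            # e.g. deleting a missing document: only an error if configured that way
            errors.append((position, action, item))
    return errors


# Example _bulk response (shape as documented by Elasticsearch, values made up)
response = json.loads("""
{
  "took": 3,
  "errors": true,
  "items": [
    {"index": {"_index": "test", "_id": "1", "status": 201, "result": "created"}},
    {"index": {"_index": "test", "_id": "2", "status": 400,
               "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}},
    {"delete": {"_index": "test", "_id": "3", "status": 404, "result": "not_found"}}
  ]
}
""")

for position, action, item in errored_items(response, treat_not_found_as_success=False):
    # Hypothetical attribute value: the JSON of the document's own bulk item
    print(position, json.dumps({action: item}))
```

With the default (`treat_not_found_as_success=True`) only the document rejected with `mapper_parsing_exception` is reported; switching it off also reports the `not_found` delete.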
   
   It's not as simple for `PutElasticsearchRecord` because the `errors` output FlowFile may contain multiple records (each being a document sent to Elasticsearch). So we'd either be serialising all errors into a single attribute that could be huge (and probably break attribute value limits) or adding an attribute for every single record, which would cause memory issues in NiFi. An alternative would be to produce a single `errors` FlowFile for every errored Record from the input FlowFile, which again would cause performance problems in NiFi if you're trying to process large amounts of Records (which is the big benefit of Record-based processors, e.g. millions of records within a single file).
   
   A flow I've used before for handling things like errors from a record processor such as `PutElasticsearchRecord` is to send the `errors` to `PutDistributedMapCache` keyed on the document `_id` (which is in the error response from Elasticsearch) and then use `FetchDistributedMapCache` or `LookupAttribute` to enrich each of the records in the `PutElasticsearchRecord` output where there's an `_id` match. It's a bit fiddly and could require splitting FlowFiles by Record (which again brings us back to the performance hit mentioned above).
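The data flow of that cache-and-enrich pattern can be sketched in a few lines of Python. This only models the idea: a plain `dict` stands in for the Distributed Map Cache, and `build_error_cache`, `enrich_records`, the record field `id` (assumed to hold the document `_id`) and the added `es_error` field are hypothetical names, not NiFi processor properties or behaviour.

```python
# In-memory stand-in for the distributed map cache, keyed on document _id
def build_error_cache(error_items):
    """Like PutDistributedMapCache: store each bulk error keyed on its document _id."""
    return {item["_id"]: item["error"] for item in error_items if "error" in item}


def enrich_records(records, cache, id_field="id", error_field="es_error"):
    """Like a lookup step: copy the cached error onto each record whose id matches."""
    enriched = []
    for record in records:
        error = cache.get(record.get(id_field))
        if error is not None:
            record = {**record, error_field: error}  # copy, don't mutate the input
        enriched.append(record)
    return enriched


errors = [
    {"_index": "test", "_id": "2", "status": 400,
     "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}},
]
records = [{"id": "1", "msg": "ok"}, {"id": "2", "msg": "bad"}]

cache = build_error_cache(errors)
enriched = enrich_records(records, cache)
print(enriched)
```

Here the whole record list is enriched in one pass; in an actual flow the per-record lookup is where splitting by Record may become necessary.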
   
   *Notes*:
   - `PutElasticsearchHttp` (and `PutElasticsearchHttpRecord`) are _deprecated_ in recent 1.x versions of NiFi and will be *removed* in NiFi 2.x
   - the `elasticsearch.put.error` attribute for both `PutElasticsearchJson` and `PutElasticsearchRecord` is used to report general Elasticsearch connection errors, e.g. if the Elasticsearch instance/cluster is not found or authentication/authorisation fails, and FlowFile processing errors, e.g. if the content of the FlowFile sent to `PutElasticsearchJson` can't be parsed as a JSON object


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org