Posted to issues@nifi.apache.org by "Parth Agarwal (Jira)" <ji...@apache.org> on 2019/09/22 19:14:00 UTC

[jira] [Commented] (NIFI-4095) ExtractText should not require a capture group in every regular expression

    [ https://issues.apache.org/jira/browse/NIFI-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935403#comment-16935403 ] 

Parth Agarwal commented on NIFI-4095:
-------------------------------------

Hi everyone, I'm new here, so please bear with me as I explain my problem.

I am using ExtractText to extract a key:value pair from JSON. The JSON is an array, and each object in it has the key:value pair.

The dynamic property is named 'key', and its value is the regex "key"\s*:\s*"([^\"]*)"+

But I am getting attributes in the form of:

key     value1
key.1   value1
key.2   value2

Is it possible to avoid getting the first attribute, 'key'?

Thanks.
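
For readers following along, here is a minimal Java sketch of what the regex above returns when it is applied repeatedly to a JSON array. It uses only java.util.regex and does not model how ExtractText names its attributes. The class name, the method name, and the sample JSON are hypothetical, since the original input was not posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueMatches {
    // The regex from the comment, written as a Java string literal.
    static final Pattern KEY = Pattern.compile("\"key\"\\s*:\\s*\"([^\\\"]*)\"+");

    // Collects capture group 1 of every match, in order of appearance.
    static List<String> values(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = KEY.matcher(json);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample input (assumption, not taken from the post).
        String json = "[{\"key\": \"value1\"}, {\"key\": \"value2\"}]";
        System.out.println(values(json)); // prints [value1, value2]
    }
}
```

Each match contributes one value through capture group 1. The issue description quoted below notes that the unindexed attribute and the ".1" attribute can be identical, which is consistent with 'key' and 'key.1' both holding value1 here.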

 

> ExtractText should not require a capture group in every regular expression
> --------------------------------------------------------------------------
>
>                 Key: NIFI-4095
>                 URL: https://issues.apache.org/jira/browse/NIFI-4095
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.3.0
>            Reporter: Andy LoPresto
>            Assignee: Andy LoPresto
>            Priority: Major
>              Labels: extracttext, regular_expression, validation
>             Fix For: 1.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The {{ExtractText}} processor currently validates every regular expression and requires that it contain "between 1 and 40 capture groups". This seems to be a design decision, as the values are hardcoded into the [validator|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L262-L262], but there are valid regular expressions that do not need an explicit capture group (especially when the expression is small and the full expression is the desired match). This results in unnecessary duplicate matches ("some_attr" and "some_attr.1" being identical). 
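
A small Java sketch, using only java.util.regex, may help illustrate the point in the description above. A pattern with no explicit capture group reports a group count of 0 yet still yields its full match as group 0, and wrapping the whole pattern in parentheses just to satisfy a "at least one capture group" check makes group 1 a copy of group 0. The class name, method names, and sample input are hypothetical; this is not the processor's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupCount {
    // Number of explicit capture groups in a pattern.
    // Group 0 (the whole match) is not included in this count.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns {group 0, group 1} for the first match, or null when nothing matches.
    // The regex must contain at least one capture group, otherwise group(1) throws.
    static String[] wholeAndFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        System.out.println(groups("\\d{4}"));   // prints 0
        System.out.println(groups("(\\d{4})")); // prints 1

        // Hypothetical input: the wrapped pattern yields the same text twice.
        String[] g = wholeAndFirst("(\\d{4})", "released in 2017");
        System.out.println(g[0] + " " + g[1]);  // prints 2017 2017
    }
}
```

Under these assumptions, a validator that rejects a group count of 0 would turn away the first pattern even though its full match is exactly the desired value.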



--
This message was sent by Atlassian Jira
(v8.3.4#803005)