Posted to dev@nifi.apache.org by Joe Skora <js...@gmail.com> on 2015/09/03 14:57:55 UTC

Re: [jira] [Created] (NIFI-921) Create a processor to promote character delimited data to attributes

Is the (very) general idea sort of a combination of the functionality of
ExtractText and UpdateAttribute?

On Thu, Sep 3, 2015 at 12:37 AM, Aldrin Piri (JIRA) <ji...@apache.org> wrote:

> Aldrin Piri created NIFI-921:
> --------------------------------
>
>              Summary: Create a processor to promote character delimited
> data to attributes
>                  Key: NIFI-921
>                  URL: https://issues.apache.org/jira/browse/NIFI-921
>              Project: Apache NiFi
>           Issue Type: Improvement
>           Components: Extensions
>             Reporter: Aldrin Piri
>             Priority: Minor
>
>
> A processor that can analyze content and promote character delimited data
> to attributes could prove quite helpful.
>
> There are a large number of "schemas"/formats that are simply character
> delimited formats.  Typically these records are quite small in format but
> "rich" in terms of the values that they possess.  This processor would
> provide an easy means to handle these simpler formats and make for an easy
> way to reason about data in this class of formats.
>
> We can approximate this by using a regular expression with capture groups
> in ExtractText, but this is not a good fit for regexes.
>
> The processor would likely be fed by a split text processor but, with
> some reasonable consideration, could itself handle splitting the text
> along rows, generating a unique flowfile for each.  The exact contract would
> need some consideration in terms of the content that passes through
> (entirety of original file, row by itself, row with header if it exists).
>
> Additionally, the processor could also consider if there is a header,
> delimited in the same fashion as each of its constituent records.
>
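
To make the proposal above concrete, here is a minimal sketch of what
"promoting" delimited data to attributes could look like.  It is plain Python
rather than NiFi code, and the function name, the parameters, and the
"field.N" naming for header-less data are all assumptions made for
illustration; the ticket does not specify any of them.

```python
import csv
import io


def promote_to_attributes(content, delimiter=",", has_header=False, prefix="field"):
    """Return one attribute dict per row of delimited text.

    With a header, the header's column names become the attribute keys.
    Without one, keys are generated as prefix.1, prefix.2, ... (a
    hypothetical naming scheme, not something the ticket defines).
    """
    # csv.reader yields an empty list for blank lines, so those are skipped.
    rows = [row for row in csv.reader(io.StringIO(content), delimiter=delimiter) if row]
    if not rows:
        return []

    if has_header:
        names, records = rows[0], rows[1:]
    else:
        names, records = None, rows

    attributes = []
    for record in records:
        if names is None:
            keys = [f"{prefix}.{i}" for i in range(1, len(record) + 1)]
        else:
            keys = names
        # zip stops at the shorter sequence, so extra values (or extra
        # header names) are silently dropped in this sketch.
        attributes.append(dict(zip(keys, record)))
    return attributes
```

Each returned dict stands in for the attributes of one flowfile, which matches
the "row by itself" option in the contract question above; the other options
(entire original file, row with header) would only change what content
travels with those attributes.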

Re: [jira] [Created] (NIFI-921) Create a processor to promote character delimited data to attributes

Posted by Aldrin Piri <al...@gmail.com>.
Joe,

That's a pretty good summation.  The core idea is that there is a lot of
data transported around that is simply delimited.  I've seen this case pop
up several times, and while regex with capture groups in ExtractText can get
you there, it's quite heavy-handed and possibly quite verbose depending on
the expression needed.
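
As a rough illustration of the verbosity point, the sketch below parses the
same four-field, pipe-delimited line twice: once with a regex that needs one
capture group per field, and once by splitting on the delimiter.  The sample
line and both function names are made up for this example, and the code is
plain Python rather than an ExtractText configuration.

```python
import re

# One capture group per field: the pattern has to grow with the field count.
FOUR_FIELDS = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$")


def fields_by_regex(line):
    """Extract exactly four pipe-delimited fields with capture groups."""
    match = FOUR_FIELDS.match(line)
    return list(match.groups()) if match else None


def fields_by_split(line, delimiter="|"):
    """Extract any number of fields by splitting on the delimiter."""
    return line.split(delimiter)


sample = "2015-09-03|NIFI-921|Minor|Extensions"
print(fields_by_regex(sample))
print(fields_by_split(sample))
```

Both calls return the same four values for this line, but the regex has to be
rewritten whenever the number of fields changes, while the split version only
needs to know the delimiter.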


On Thu, Sep 3, 2015 at 7:57 AM, Joe Skora <js...@gmail.com> wrote:

> Is the (very) general idea sort of a combination of the functionality of
> ExtractText and UpdateAttribute?