Posted to users@nifi.apache.org by Otto Fowler <ot...@gmail.com> on 2018/07/03 15:36:17 UTC

Re: upgraded to nifi 1.6.0 from 1.3.0: issues/concerns w/ SplitRecord and maybe CSVReader or JSONWriter

If you can provide a sample csv file ( cleaned of proprietary data ), and a
sample flow attached to a jira it could help, in the event nobody has seen
these issues.


On July 3, 2018 at 10:19:02, Jeremy Taylor (jeremy.taylor@acesinc.net)
wrote:

Greetings,

My team and I have been using the SplitRecord processor to convert CSV
to JSON, and to later ingest it, for about a year.  I’m noticing some things
that I feel worked a lot better before.  I don’t know exactly when
“before” was, but at initial testing and development time my CSV-to-JSON
conversion worked a lot better.  I have an Avro schema set up
that supplies default values to facilitate error handling.  I am observing
some troubling issues.  We upgraded to NiFi 1.6.0 last month, so
I’m naturally suspicious of the upgrade, but I do not see known issues that
relate to this.  Has anyone noticed these issues, or are some of these
known issues?  I’m dealing with probably over 40 CSV columns that I generate
for testing and then ingest.

Observations:

   1. I have decimals that I’m trying to take in as Strings.  Instead, we
   are seeing the empty string being retrieved on those fields from the CSV to
   JSON line-by-line conversion phase.  The empty string is the default if
   something is not found.
   2. I have integers w/ values being taken in as -1, the default, so the
   conversion phase is not working here properly.
   3. I have strings that exist that get dropped to empty string.
   4. And, I have a few integer fields that get correctly taken in as
   strings and match the right values in the CSV.
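
To narrow down which of the 40+ columns are affected, one option is to compare a CSV row with the JSON record produced for it and list the fields that came out as the schema default even though the CSV cell held a value. The following is a minimal sketch in plain Python, outside NiFi; the column names, values, and defaults are hypothetical placeholders, not taken from the real data.

```python
def defaulted_fields(csv_row, json_record, defaults):
    """Return names of fields whose JSON value equals the schema default
    although the CSV cell held a real (non-empty) value."""
    suspicious = []
    for name, default in defaults.items():
        raw = (csv_row.get(name) or "").strip()
        # A non-empty CSV cell that differs from the default should not
        # come out of the conversion as the default.
        if raw and json_record.get(name) == default and raw != str(default):
            suspicious.append(name)
    return suspicious


# Hypothetical column names and values, for illustration only.
csv_row = {"amount": "12.50", "count": "7", "label": "abc", "code": "42"}
json_record = {"amount": "", "count": -1, "label": "", "code": "42"}
defaults = {"amount": "", "count": -1, "label": "", "code": ""}

print(defaulted_fields(csv_row, json_record, defaults))
```

Running this over a handful of generated test rows would show whether the same columns fall back to their defaults every time or whether it varies by row.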



Regards,



-- 

Jeremy H. Taylor

Software Developer

ACES, Incorporated

http://acesinc.net

Re: upgraded to nifi 1.6.0 from 1.3.0: issues/concerns w/ SplitRecord and maybe CSVReader or JSONWriter

Posted by Matt Burgess <ma...@apache.org>.
Jeremy,

Sorry to hear of your struggles with this; also I can appreciate that
it is difficult to produce representative (but "clean") sample data.
Here's what I tried, perhaps you can comment on how your data/config
is different and I can zero in on what's happening:

Input CSV data (via GenerateFlowFile):

1,Hello,4.5
2,World,5.11

SplitRecord (1 record per flow file) with CSVReader (using default
Apache Commons CSV parser) and JsonRecordSetWriter. Writer inherits
schema, CSV schema is explicit (i.e. there is no header in the data
above) and set as Schema Text in the reader:

{
 "namespace": "nifi",
 "name": "test",
 "type": "record",
 "fields": [
  {"name": "id","type": "int", "default": -1},
  {"name": "s", "type": "string", "default": ""},
  {"name": "d", "type": {"type": "bytes", "logicalType": "decimal",
                         "precision": 4, "scale": 2}}
 ]
}
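
For reference, here is a small sketch of what the "d" field's type means in Avro terms: the decimal logical type stores the unscaled integer as big-endian two's-complement bytes, with at most "precision" digits overall and "scale" digits after the decimal point. This is plain Python for illustration only; the helper name to_avro_decimal_bytes is made up here, and it says nothing about how NiFi's reader or writer handles the value internally.

```python
from decimal import Decimal


def to_avro_decimal_bytes(text, precision=4, scale=2):
    """Encode a decimal string as Avro's decimal logical type would store it:
    the unscaled integer as big-endian two's-complement bytes."""
    value = Decimal(text)
    unscaled = int(value.scaleb(scale))
    # Reject values that would lose fractional digits at this scale.
    if Decimal(unscaled).scaleb(-scale) != value:
        raise ValueError(f"{text} has more than {scale} fractional digits")
    # Reject values with more total digits than the declared precision.
    if len(str(abs(unscaled))) > precision:
        raise ValueError(f"{text} exceeds precision {precision}")
    # Leave room for the sign bit (may use one byte more than strictly needed).
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, "big", signed=True)


print(to_avro_decimal_bytes("4.5"))   # unscaled value 450
print(to_avro_decimal_bytes("5.11"))  # unscaled value 511
```

With precision 4 and scale 2, a value such as 123.45 (five digits) or 4.567 (three fractional digits) does not fit and is rejected by this sketch.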

This trivial example works fine for all fields. Are there quotes
around various values, are your decimals formatted differently
(e.g., in scientific notation), or are your reader/writer configured
differently from mine?
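
If it helps to answer the first two questions without sharing the file, a quick scan along these lines could flag cells that contain quote characters or look like scientific notation. It is a rough sketch in plain Python, independent of NiFi; the function name flag_cells and the sample text are made up for illustration, and because it reads with quoting disabled, a quoted cell that contains a comma will be split and reported as two cells.

```python
import csv
import io
import re

# Matches values such as 5.11E0, 1e-3, or .5E+2.
SCI_NOTATION = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+$")


def flag_cells(csv_text):
    """Return (row, column, reason) tuples for cells that contain quote
    characters or are written in scientific notation (1-based indexes)."""
    findings = []
    # QUOTE_NONE keeps quote characters in the cell text so they can be seen.
    reader = csv.reader(io.StringIO(csv_text), quoting=csv.QUOTE_NONE)
    for row_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            if '"' in cell:
                findings.append((row_no, col_no, "quoted"))
            if SCI_NOTATION.match(cell.strip().strip('"')):
                findings.append((row_no, col_no, "scientific notation"))
    return findings


# Made-up sample: the second row differs from the test data above.
sample = '1,Hello,4.5\n2,"World",5.11E0\n'
print(flag_cells(sample))
```

The row and column numbers it reports can be shared on the list without exposing column names or data.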

Regards,
Matt
On Tue, Jul 3, 2018 at 3:45 PM Jeremy Taylor <je...@acesinc.net> wrote:
>
> RE: Unfortunately, I cannot provide an example CSV at this time w/ even the column names being used and unhelpful data.  I tried to explain enough to help someone craft something.  I have a software deadline to meet and am unable to offer something at this time.  If and when I come up for air (I don’t know when that will be yet), build a CSV, and put something together in NiFi for SplitRecord that helps simulate the problem, I’ll reach back out.  Giving you something w/o proprietary data or column names isn’t easy in this case.  Nevertheless, I’ll be trying to satisfy a minimal requirement another way for now.

Re: upgraded to nifi 1.6.0 from 1.3.0: issues/concerns w/ SplitRecord and maybe CSVReader or JSONWriter

Posted by Jeremy Taylor <je...@acesinc.net>.
RE: Unfortunately, I cannot provide an example CSV at this time w/ even the column names being used and unhelpful data.  I tried to explain enough to help someone craft something.  I have a software deadline to meet and am unable to offer something at this time.  If and when I come up for air (I don’t know when that will be yet), build a CSV, and put something together in NiFi for SplitRecord that helps simulate the problem, I’ll reach back out.  Giving you something w/o proprietary data or column names isn’t easy in this case.  Nevertheless, I’ll be trying to satisfy a minimal requirement another way for now.

--
Jeremy H. Taylor
Software Developer
ACES, Incorporated
http://acesinc.net
