You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@nifi.apache.org by "Jens M. Kofoed" <jm...@gmail.com> on 2021/02/25 07:02:27 UTC

Issue with QueryRecord failing when data is missing

Hi all

I have a issue with using the QueryRecord query csv files. currently i'm
running NiFi version 1.12.1 but I also tested this in version 1.13.0
If my incoming csv file only have a header line and no data it fails

My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE colC
= 'true'

Changes made to the CSVReader:
Treat Firs Line as Header = true

Changes made to the CSVRecordSetWriter:
Include Header Line = false
Record Separator = ,

Here are 2 sample data. The first one works as expected, but sample 2 gives
errors
Sample 1:
colA,colB,colC
data1A,data1B,true
data2A,data2B,false
data3A,data3B,true

Outcome: data1A,data3A,

Sample 2:
colA,colB,colC

Error message:
QueryRecord[id=d7c38f75-0177-1000-ffff-fffff694dd96] Unable to query
StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1614232293848-3,
container=Node01Cont01, section=3], offset=463,
length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due
to java.sql.SQLException: Error while preparing statement [SELECT colA FROM
FLOWFILE WHERE colC = true]:
org.apache.nifi.processor.exception.ProcessException:
java.sql.SQLException: Error while preparing statement [SELECT colA FROM
FLOWFILE WHERE colC = true]

Is this a bug?

kind regards
Jens M. Kofoed

Re: Issue with QueryRecord failing when data is missing

Posted by "Jens M. Kofoed" <jm...@gmail.com>.
Hi Matt

Many thanks for you replay. Yes I use the default which is the
infer-schema. And it makes sense that the infer-schema can't guess what
type the fields are if there are no data.
So if I search for a boolean value and the infer-schema set it to text, it
makes sense it produce errors. And why did I not think of it when I sat and
tore my hair out of my head :-)

As a workaround I have used a ValidateRecord and a RouteOnAttribute using
the record.count.
So many thanks for the answer.

Kind regards
Jens M. Kofoed

Den tor. 25. feb. 2021 kl. 16.28 skrev Matt Burgess <ma...@apache.org>:

> Jens,
>
> What is the Schema Access Strategy set to in your CSVReader? If "Infer
> Schema" or "Use String Fields From Header", the setting of "Treat
> First Line As Header" should be ignored as those two options require a
> header be present anyway. If you know the schema ahead of time you
> could set it in the CSVReader rather than inferring it.
>
> For "Infer Schema", there's a bug where the inferred schema is empty
> because we don't have any records from which to infer the types of the
> fields (even though the field names are present). I wrote up NIFI-8259
> [1] to infer the types as strings when no records are present.
>
> As a workaround you could filter out any FlowFiles that have no
> records, either by using CountText or the 'record.count' attribute if
> it has been set, into a RouteOnAttribute. Alternatively you could
> emulate what NIFI-8259 is going to do by using "Use String Fields From
> Header" in your CSVReader, but in that case you might need a CAST(colC
> as BOOLEAN) in your SQL since populated FlowFiles could have the
> correctly inferred schema where empty FlowFiles (or if "Use String
> Fields From Header" is set) will think colC is a string rather than a
> boolean. The CAST should work in both cases but I didn't try it.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-8259
>
> On Thu, Feb 25, 2021 at 1:56 AM Jens M. Kofoed <jm...@gmail.com>
> wrote:
> >
> > Hi all
> >
> > I have a issue with using the QueryRecord query csv files. currently i'm
> running NiFi version 1.12.1 but I also tested this in version 1.13.0
> > If my incoming csv file only have a header line and no data it fails
> >
> > My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE
> colC = 'true'
> >
> > Changes made to the CSVReader:
> > Treat Firs Line as Header = true
> >
> > Changes made to the CSVRecordSetWriter:
> > Include Header Line = false
> > Record Separator = ,
> >
> > Here are 2 sample data. The first one works as expected, but sample 2
> gives errors
> > Sample 1:
> > colA,colB,colC
> > data1A,data1B,true
> > data2A,data2B,false
> > data3A,data3B,true
> >
> > Outcome: data1A,data3A,
> >
> > Sample 2:
> > colA,colB,colC
> >
> > Error message:
> > QueryRecord[id=d7c38f75-0177-1000-ffff-fffff694dd96] Unable to query
> StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1614232293848-3,
> container=Node01Cont01, section=3], offset=463,
> length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due
> to java.sql.SQLException: Error while preparing statement [SELECT colA FROM
> FLOWFILE WHERE colC = true]:
> org.apache.nifi.processor.exception.ProcessException:
> java.sql.SQLException: Error while preparing statement [SELECT colA FROM
> FLOWFILE WHERE colC = true]
> >
> > Is this a bug?
> >
> > kind regards
> > Jens M. Kofoed
>

Re: Issue with QueryRecord failing when data is missing

Posted by Matt Burgess <ma...@apache.org>.
Jens,

What is the Schema Access Strategy set to in your CSVReader? If "Infer
Schema" or "Use String Fields From Header", the setting of "Treat
First Line As Header" should be ignored as those two options require a
header be present anyway. If you know the schema ahead of time you
could set it in the CSVReader rather than inferring it.

For "Infer Schema", there's a bug where the inferred schema is empty
because we don't have any records from which to infer the types of the
fields (even though the field names are present). I wrote up NIFI-8259
[1] to infer the types as strings when no records are present.

As a workaround you could filter out any FlowFiles that have no
records, either by using CountText or the 'record.count' attribute if
it has been set, into a RouteOnAttribute. Alternatively you could
emulate what NIFI-8259 is going to do by using "Use String Fields From
Header" in your CSVReader, but in that case you might need a CAST(colC
as BOOLEAN) in your SQL since populated FlowFiles could have the
correctly inferred schema where empty FlowFiles (or if "Use String
Fields From Header" is set) will think colC is a string rather than a
boolean. The CAST should work in both cases but I didn't try it.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-8259

On Thu, Feb 25, 2021 at 1:56 AM Jens M. Kofoed <jm...@gmail.com> wrote:
>
> Hi all
>
> I have a issue with using the QueryRecord query csv files. currently i'm running NiFi version 1.12.1 but I also tested this in version 1.13.0
> If my incoming csv file only have a header line and no data it fails
>
> My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE colC = 'true'
>
> Changes made to the CSVReader:
> Treat Firs Line as Header = true
>
> Changes made to the CSVRecordSetWriter:
> Include Header Line = false
> Record Separator = ,
>
> Here are 2 sample data. The first one works as expected, but sample 2 gives errors
> Sample 1:
> colA,colB,colC
> data1A,data1B,true
> data2A,data2B,false
> data3A,data3B,true
>
> Outcome: data1A,data3A,
>
> Sample 2:
> colA,colB,colC
>
> Error message:
> QueryRecord[id=d7c38f75-0177-1000-ffff-fffff694dd96] Unable to query StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1614232293848-3, container=Node01Cont01, section=3], offset=463, length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due to java.sql.SQLException: Error while preparing statement [SELECT colA FROM FLOWFILE WHERE colC = true]: org.apache.nifi.processor.exception.ProcessException: java.sql.SQLException: Error while preparing statement [SELECT colA FROM FLOWFILE WHERE colC = true]
>
> Is this a bug?
>
> kind regards
> Jens M. Kofoed