You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@nifi.apache.org by st...@bt.com on 2022/03/15 12:35:55 UTC

QueryRecord with Union type

I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is

{
  "type":"record",
  "namespace":"blah",
  "name":"SimpleTraffic",
  "fields":[
    {"name":"src_address","type":"string"},
    {"name":"flag_s","type":["int","boolean"]}
  ]
}

This is because I am processing CSV records that look this, where 1 is true and 0 is false.

192.168.0.1,1

Into JSON that looks like this, using a ConvertRecord and an Update Record.

{"src_address":"192.168.0.1","flag_s":true}

Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.

select * from flowfile where flag_s = true

But I get this error

org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'

Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate "before" and "after" schemas to eliminate the union?

Regards

Steve Hindmarch


Re: QueryRecord with Union type

Posted by Mike Thomsen <mi...@gmail.com>.
{"name":"flag_s","type":["int","boolean"]}

We have a lot of type massaging baked into the Record API. If the int
version is meant to be used as a boolean equivalent (0 = false,
anything else is truthy) then this is something that probably already
is or should be covered by that type massaging.

On Fri, Mar 18, 2022 at 5:50 AM <st...@bt.com> wrote:
>
> Mark,
>
>
>
> Thank you for your response. I thought that was probably the case, but I tried a cast and it did not work. I got this error.
>
>
>
> Query:
>
> select *
>
> from flowfile
>
> where cast(flag_s as boolean) = true
>
>
>
> Error:
>
> org.apache.calcite.sql.validate.SqlValidatorException: Cast function cannot convert value of type JavaType(class java.lang.Object) to type BOOLEAN
>
>
>
> By taking the union out of the input schema I could get the query to work, but I did find myself getting tangled up in managing various schemas so I am trying to use infer/inherit read/write services instead. I have inherited a very complex flow from a team that have long departed and am looking to simplify it to improve performance and maintainability. I need to convert from CSV/TSV to JSON, normalise fields, filter unwanted records, enrich with more JSON and finally publish to a customer defined schema, so I do need a few steps along the way. I am exploring each step in order to validate my redesign so I take your point about minimising the number of processes and will look again at combining steps in the query process, although I am also a fan of the JOLT transform as I have used that often in previous projects.
>
>
>
> Regards
>
> Steve Hindmarch
>
>
>
> From: Mark Payne <ma...@hotmail.com>
> Sent: 17 March 2022 14:17
> To: users <us...@nifi.apache.org>
> Subject: Re: QueryRecord with Union type
>
>
>
> Steve,
>
>
>
> Because your schema has a union, the SQL engine doesn’t really know how to interpret the data. So it interprets it as a “Java Object.” Essentially,
>
> it could be anything. But you can’t compare just anything to true - you need to compare a boolean to true. So you need to tell the SQL engine that the
>
> value you’re looking at is, in fact, a boolean.
>
>
>
> You can do that with a simple CAST() function in your SQL:
>
>
>
> SELECT *
>
> FROM FLOWFILE
>
> WHERE CAST(flag_s AS BOOLEAN) = true
>
>
>
> That should give you what you’re looking for.
>
>
>
> Also worth nothing - you mentioned that you’re using ConvertRecord and UpdateRecord before QueryRecord.
>
> 99% of the time, you should not be using ConvertRecord in conjunction with any other Record processor. Because the Record processors like UpdateRecord
>
> allow you to use any Record Reader, it doesn’t make sense to convert the data first using ConvertRecord - it’s just extra overhead.
>
> And, in fact, you may be able to eliminated the UpdateRecord, as well, as just use the SQL within QueryRecord to perform the transformation needed on the fly,
>
> rather than having another step to update the data, which requires reading the data, parsing it, updating it, serializing the data, writing the data. This may not
>
> be possible, depends on what you’re updating. But QueryRecord does support RecordPath expressions so it’s worth considering.
>
>
>
> Thanks
>
> -Mark
>
>
>
>
>
>
>
> On Mar 15, 2022, at 8:35 AM, stephen.hindmarch@bt.com wrote:
>
>
>
> I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is
>
>
>
> {
>
>   "type":"record",
>
>   "namespace":"blah",
>
>   "name":"SimpleTraffic",
>
>   "fields":[
>
>     {"name":"src_address","type":"string"},
>
>     {"name":"flag_s","type":["int","boolean"]}
>
>   ]
>
> }
>
>
>
> This is because I am processing CSV records that look this, where 1 is true and 0 is false.
>
>
>
> 192.168.0.1,1
>
>
>
> Into JSON that looks like this, using a ConvertRecord and an Update Record.
>
>
>
> {"src_address":"192.168.0.1","flag_s":true}
>
>
>
> Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.
>
>
>
> select * from flowfile where flag_s = true
>
>
>
> But I get this error
>
>
>
> org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'
>
>
>
> Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate “before” and “after” schemas to eliminate the union?
>
>
>
> Regards
>
>
>
> Steve Hindmarch
>
>

RE: QueryRecord with Union type

Posted by st...@bt.com.
Mark,

I am using a Docker image of apache/nifi:latest which I pulled about 2 months ago. Image ID 0070c5c5d9fd. From the UI it says tag 1.15.3-RC1 and commit 753c311. I am guessing it must be this: GitHub - apache/nifi at nifi-1.15.3-RC1<https://github.com/apache/nifi/tree/nifi-1.15.3-RC1>.

Just to let you know, I did come to a satisfactory solution that does a convert, normalise, filter and coerce to a new schema using only two processors and two services (plus a schema registry). I first use a query record which if I work with the integer values means I have “select * from flowfile where flag_s = 1”. Doing the filtering early feels like a better idea, as I am not processing records I later discard. By using a CSV reader and a JSON writer I get the conversion from CSV to JSON for free. And by working with the integers I have no need for the union.

Then I use a JOLT transform to coerce the JSON into the form of the new schema. At the same time I can switch the int values to booleans by using a modify-overwrite. Here is the step, operating on the flag in its new position (“Network.flags.s”), which uses a trick of having the integer value pick the boolean out of a fixed array.

{
   "operation":"modify-overwrite-beta",
   "spec": {
     "*": {
       "Network": {
         "flags": {
           "*": "=elementAt(@(0),false,true)"
         }
       }
     }
   }
}

Regards,
Steve Hindmarch

From: Mark Payne <ma...@hotmail.com>
Sent: 18 March 2022 14:44
To: users@nifi.apache.org
Subject: Re: QueryRecord with Union type

Steve,

What version of nifi are you running? I’d tried that on the latest “main” branch and it worked as expected.

Thanks
-Mark


On Mar 18, 2022, at 5:49 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

Mark,

Thank you for your response. I thought that was probably the case, but I tried a cast and it did not work. I got this error.

Query:
select *
from flowfile
where cast(flag_s as boolean) = true

Error:
org.apache.calcite.sql.validate.SqlValidatorException: Cast function cannot convert value of type JavaType(class java.lang.Object) to type BOOLEAN

By taking the union out of the input schema I could get the query to work, but I did find myself getting tangled up in managing various schemas so I am trying to use infer/inherit read/write services instead. I have inherited a very complex flow from a team that have long departed and am looking to simplify it to improve performance and maintainability. I need to convert from CSV/TSV to JSON, normalise fields, filter unwanted records, enrich with more JSON and finally publish to a customer defined schema, so I do need a few steps along the way. I am exploring each step in order to validate my redesign so I take your point about minimising the number of processes and will look again at combining steps in the query process, although I am also a fan of the JOLT transform as I have used that often in previous projects.

Regards
Steve Hindmarch

From: Mark Payne <ma...@hotmail.com>>
Sent: 17 March 2022 14:17
To: users <us...@nifi.apache.org>>
Subject: Re: QueryRecord with Union type

Steve,

Because your schema has a union, the SQL engine doesn’t really know how to interpret the data. So it interprets it as a “Java Object.” Essentially,
it could be anything. But you can’t compare just anything to true - you need to compare a boolean to true. So you need to tell the SQL engine that the
value you’re looking at is, in fact, a boolean.

You can do that with a simple CAST() function in your SQL:

SELECT *
FROM FLOWFILE
WHERE CAST(flag_s AS BOOLEAN) = true

That should give you what you’re looking for.

Also worth nothing - you mentioned that you’re using ConvertRecord and UpdateRecord before QueryRecord.
99% of the time, you should not be using ConvertRecord in conjunction with any other Record processor. Because the Record processors like UpdateRecord
allow you to use any Record Reader, it doesn’t make sense to convert the data first using ConvertRecord - it’s just extra overhead.
And, in fact, you may be able to eliminated the UpdateRecord, as well, as just use the SQL within QueryRecord to perform the transformation needed on the fly,
rather than having another step to update the data, which requires reading the data, parsing it, updating it, serializing the data, writing the data. This may not
be possible, depends on what you’re updating. But QueryRecord does support RecordPath expressions so it’s worth considering.

Thanks
-Mark





On Mar 15, 2022, at 8:35 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is

{
  "type":"record",
  "namespace":"blah",
  "name":"SimpleTraffic",
  "fields":[
    {"name":"src_address","type":"string"},
    {"name":"flag_s","type":["int","boolean"]}
  ]
}

This is because I am processing CSV records that look this, where 1 is true and 0 is false.

192.168.0.1,1

Into JSON that looks like this, using a ConvertRecord and an Update Record.

{"src_address":"192.168.0.1","flag_s":true}

Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.

select * from flowfile where flag_s = true

But I get this error

org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'

Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate “before” and “after” schemas to eliminate the union?

Regards

Steve Hindmarch


Re: QueryRecord with Union type

Posted by Mark Payne <ma...@hotmail.com>.
Steve,

What version of nifi are you running? I’d tried that on the latest “main” branch and it worked as expected.

Thanks
-Mark

On Mar 18, 2022, at 5:49 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

Mark,

Thank you for your response. I thought that was probably the case, but I tried a cast and it did not work. I got this error.

Query:
select *
from flowfile
where cast(flag_s as boolean) = true

Error:
org.apache.calcite.sql.validate.SqlValidatorException: Cast function cannot convert value of type JavaType(class java.lang.Object) to type BOOLEAN

By taking the union out of the input schema I could get the query to work, but I did find myself getting tangled up in managing various schemas so I am trying to use infer/inherit read/write services instead. I have inherited a very complex flow from a team that have long departed and am looking to simplify it to improve performance and maintainability. I need to convert from CSV/TSV to JSON, normalise fields, filter unwanted records, enrich with more JSON and finally publish to a customer defined schema, so I do need a few steps along the way. I am exploring each step in order to validate my redesign so I take your point about minimising the number of processes and will look again at combining steps in the query process, although I am also a fan of the JOLT transform as I have used that often in previous projects.

Regards
Steve Hindmarch

From: Mark Payne <ma...@hotmail.com>>
Sent: 17 March 2022 14:17
To: users <us...@nifi.apache.org>>
Subject: Re: QueryRecord with Union type

Steve,

Because your schema has a union, the SQL engine doesn’t really know how to interpret the data. So it interprets it as a “Java Object.” Essentially,
it could be anything. But you can’t compare just anything to true - you need to compare a boolean to true. So you need to tell the SQL engine that the
value you’re looking at is, in fact, a boolean.

You can do that with a simple CAST() function in your SQL:

SELECT *
FROM FLOWFILE
WHERE CAST(flag_s AS BOOLEAN) = true

That should give you what you’re looking for.

Also worth nothing - you mentioned that you’re using ConvertRecord and UpdateRecord before QueryRecord.
99% of the time, you should not be using ConvertRecord in conjunction with any other Record processor. Because the Record processors like UpdateRecord
allow you to use any Record Reader, it doesn’t make sense to convert the data first using ConvertRecord - it’s just extra overhead.
And, in fact, you may be able to eliminated the UpdateRecord, as well, as just use the SQL within QueryRecord to perform the transformation needed on the fly,
rather than having another step to update the data, which requires reading the data, parsing it, updating it, serializing the data, writing the data. This may not
be possible, depends on what you’re updating. But QueryRecord does support RecordPath expressions so it’s worth considering.

Thanks
-Mark




On Mar 15, 2022, at 8:35 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is

{
  "type":"record",
  "namespace":"blah",
  "name":"SimpleTraffic",
  "fields":[
    {"name":"src_address","type":"string"},
    {"name":"flag_s","type":["int","boolean"]}
  ]
}

This is because I am processing CSV records that look this, where 1 is true and 0 is false.

192.168.0.1,1

Into JSON that looks like this, using a ConvertRecord and an Update Record.

{"src_address":"192.168.0.1","flag_s":true}

Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.

select * from flowfile where flag_s = true

But I get this error

org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'

Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate “before” and “after” schemas to eliminate the union?

Regards

Steve Hindmarch


RE: QueryRecord with Union type

Posted by st...@bt.com.
Mark,

Thank you for your response. I thought that was probably the case, but I tried a cast and it did not work. I got this error.

Query:
select *
from flowfile
where cast(flag_s as boolean) = true

Error:
org.apache.calcite.sql.validate.SqlValidatorException: Cast function cannot convert value of type JavaType(class java.lang.Object) to type BOOLEAN

By taking the union out of the input schema I could get the query to work, but I did find myself getting tangled up in managing various schemas so I am trying to use infer/inherit read/write services instead. I have inherited a very complex flow from a team that have long departed and am looking to simplify it to improve performance and maintainability. I need to convert from CSV/TSV to JSON, normalise fields, filter unwanted records, enrich with more JSON and finally publish to a customer defined schema, so I do need a few steps along the way. I am exploring each step in order to validate my redesign so I take your point about minimising the number of processes and will look again at combining steps in the query process, although I am also a fan of the JOLT transform as I have used that often in previous projects.

Regards
Steve Hindmarch

From: Mark Payne <ma...@hotmail.com>
Sent: 17 March 2022 14:17
To: users <us...@nifi.apache.org>
Subject: Re: QueryRecord with Union type

Steve,

Because your schema has a union, the SQL engine doesn’t really know how to interpret the data. So it interprets it as a “Java Object.” Essentially,
it could be anything. But you can’t compare just anything to true - you need to compare a boolean to true. So you need to tell the SQL engine that the
value you’re looking at is, in fact, a boolean.

You can do that with a simple CAST() function in your SQL:

SELECT *
FROM FLOWFILE
WHERE CAST(flag_s AS BOOLEAN) = true

That should give you what you’re looking for.

Also worth nothing - you mentioned that you’re using ConvertRecord and UpdateRecord before QueryRecord.
99% of the time, you should not be using ConvertRecord in conjunction with any other Record processor. Because the Record processors like UpdateRecord
allow you to use any Record Reader, it doesn’t make sense to convert the data first using ConvertRecord - it’s just extra overhead.
And, in fact, you may be able to eliminated the UpdateRecord, as well, as just use the SQL within QueryRecord to perform the transformation needed on the fly,
rather than having another step to update the data, which requires reading the data, parsing it, updating it, serializing the data, writing the data. This may not
be possible, depends on what you’re updating. But QueryRecord does support RecordPath expressions so it’s worth considering.

Thanks
-Mark




On Mar 15, 2022, at 8:35 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is

{
  "type":"record",
  "namespace":"blah",
  "name":"SimpleTraffic",
  "fields":[
    {"name":"src_address","type":"string"},
    {"name":"flag_s","type":["int","boolean"]}
  ]
}

This is because I am processing CSV records that look this, where 1 is true and 0 is false.

192.168.0.1,1

Into JSON that looks like this, using a ConvertRecord and an Update Record.

{"src_address":"192.168.0.1","flag_s":true}

Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.

select * from flowfile where flag_s = true

But I get this error

org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'

Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate “before” and “after” schemas to eliminate the union?

Regards

Steve Hindmarch


Re: QueryRecord with Union type

Posted by Mark Payne <ma...@hotmail.com>.
Steve,

Because your schema has a union, the SQL engine doesn’t really know how to interpret the data. So it interprets it as a “Java Object.” Essentially,
it could be anything. But you can’t compare just anything to true - you need to compare a boolean to true. So you need to tell the SQL engine that the
value you’re looking at is, in fact, a boolean.

You can do that with a simple CAST() function in your SQL:

SELECT *
FROM FLOWFILE
WHERE CAST(flag_s AS BOOLEAN) = true

That should give you what you’re looking for.

Also worth nothing - you mentioned that you’re using ConvertRecord and UpdateRecord before QueryRecord.
99% of the time, you should not be using ConvertRecord in conjunction with any other Record processor. Because the Record processors like UpdateRecord
allow you to use any Record Reader, it doesn’t make sense to convert the data first using ConvertRecord - it’s just extra overhead.
And, in fact, you may be able to eliminated the UpdateRecord, as well, as just use the SQL within QueryRecord to perform the transformation needed on the fly,
rather than having another step to update the data, which requires reading the data, parsing it, updating it, serializing the data, writing the data. This may not
be possible, depends on what you’re updating. But QueryRecord does support RecordPath expressions so it’s worth considering.

Thanks
-Mark



On Mar 15, 2022, at 8:35 AM, stephen.hindmarch@bt.com<ma...@bt.com> wrote:

I am having a play with QueryRecord to do some filtering but I have run across this problem. I have a schema for my records which includes a union type, so the relevant part of the schema is

{
  "type":"record",
  "namespace":"blah",
  "name":"SimpleTraffic",
  "fields":[
    {"name":"src_address","type":"string"},
    {"name":"flag_s","type":["int","boolean"]}
  ]
}

This is because I am processing CSV records that look this, where 1 is true and 0 is false.

192.168.0.1,1

Into JSON that looks like this, using a ConvertRecord and an Update Record.

{"src_address":"192.168.0.1","flag_s":true}

Then I create a QueryRecord so I can filter out the cases where the flag is false. So I use this query.

select * from flowfile where flag_s = true

But I get this error

org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply '=' to arguments of type '<JAVATYPE(CLASS JAVA.LANG.OBJECT)> = <BOOLEAN>'

Is this because the type is a Union type and the Calcite processor cannot work out which subtype it should be? Can I do anything to persuade the query to use an operator or a function on this field to make it usable? I have tried casting to Boolean or Char but no success. Or do I need to use two separate “before” and “after” schemas to eliminate the union?

Regards

Steve Hindmarch