Posted to dev@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2019/07/28 14:19:54 UTC
Record path API chooses CHOICE[STRING, RECORD] when the field is missing
I have a simple avro schema in a test case that looks like this:
{
  "type": "record",
  "name": "PersonRecord",
  "fields": [
    { "name": "firstName", "type": "string" },
    { "name": "lastName", "type": "string" },
    { "name": "creationDateTime", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ] }
  ]
}
Then I try something like this...
RecordPath path = recordPathCache.getCompiled("/creationDateTime");
RecordPathResult rp = path.evaluate(targetRecord);
Optional<FieldValue> nodeField = rp.getSelectedFields().findFirst();
if (!nodeField.isPresent()) {
    throw new ProcessException("...");
}
FieldValue fieldValue = nodeField.get();
// fieldValue.getField() is a Choice of String, Record
Is there a way to get the correct field type here? I assume the
Choice[String, Record] default was chosen to facilitate schema inference.
Thanks,
Mike
Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing
Posted by Mike Thomsen <mi...@gmail.com>.
It was JsonTreeReader.
Thanks,
Mike
On Mon, Jul 29, 2019 at 3:36 PM Mark Payne <ma...@hotmail.com> wrote:
> Mike,
>
> What Record Reader is being used here? The problem appears to be due to
> the Record Reader itself assigning that as the field type.
Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing
Posted by Mark Payne <ma...@hotmail.com>.
Mike,
What Record Reader is being used here? The problem appears to be due to the Record Reader itself assigning that as the field type.
I created a dummy unit test to verify the RecordPath stuff is correct:
@Test
public void testFromEmail() {
    final List<RecordField> fields = new ArrayList<>();
    fields.add(new RecordField("firstName", RecordFieldType.STRING.getDataType()));
    fields.add(new RecordField("lastName", RecordFieldType.STRING.getDataType()));
    fields.add(new RecordField("creationDateTime", RecordFieldType.TIMESTAMP.getDataType(), true));
    final RecordSchema schema = new SimpleRecordSchema(fields);

    final Map<String, Object> values = new HashMap<>();
    values.put("firstName", "John");
    values.put("lastName", "Doe");
    values.put("creationDateTime", new Timestamp(System.currentTimeMillis()));
    final Record record = new MapRecord(schema, values);

    final Optional<FieldValue> optionalFieldValue =
            RecordPath.compile("/creationDateTime").evaluate(record).getSelectedFields().findFirst();
    final FieldValue fieldValue = optionalFieldValue.get();
    System.out.println(fieldValue.getField());
}
Which prints out the correct field type:
RecordField[name=creationDateTime, dataType=TIMESTAMP:yyyy-MM-dd HH:mm:ss, nullable=true]
So I presume the Record Reader may not be properly applying the schema to the Record that it returns.
Thanks
-Mark
Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing
Posted by Mike Thomsen <mi...@gmail.com>.
That doesn't explain WHY it happened, but I was able to work around it by looking the field up in the parent record's schema:
Optional<RecordField> _temp = fieldValue.getParentRecord().get().getSchema()
        .getField(fieldValue.getField().getFieldName());
RecordField _rf = _temp.get();
value = DataTypeUtils.convertType(value, _rf.getDataType(), _rf.getFieldName());
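The workaround calls Optional.get() twice without checking, so it would throw if the field value had no parent record or the schema did not declare the field. Here is a rough sketch of the same lookup written as an Optional chain with a fallback. To keep it runnable on its own it does not use the NiFi classes: Field and Schema are hypothetical stand-ins for RecordField and RecordSchema, data types are plain strings, and resolve() is an illustrative helper name, not part of any NiFi API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class SchemaFieldLookup {

    /** Hypothetical stand-in for a record field: a name plus a declared type name. */
    static final class Field {
        final String name;
        final String dataType;

        Field(String name, String dataType) {
            this.name = name;
            this.dataType = dataType;
        }
    }

    /** Hypothetical stand-in for a record schema: fields looked up by name. */
    static final class Schema {
        private final Map<String, Field> fields = new LinkedHashMap<>();

        Schema add(String name, String dataType) {
            fields.put(name, new Field(name, dataType));
            return this;
        }

        Optional<Field> getField(String name) {
            return Optional.ofNullable(fields.get(name));
        }
    }

    /**
     * Prefer the field declared in the parent schema. Fall back to the field
     * reported by the path evaluation when there is no parent schema or the
     * schema does not declare a field with that name.
     */
    static Field resolve(Optional<Schema> parentSchema, Field reported) {
        return parentSchema
                .flatMap(schema -> schema.getField(reported.name))
                .orElse(reported);
    }

    public static void main(String[] args) {
        Schema schema = new Schema()
                .add("firstName", "STRING")
                .add("creationDateTime", "TIMESTAMP");
        Field reported = new Field("creationDateTime", "CHOICE[STRING, RECORD]");

        // Parent schema declares the field: the declared type wins.
        System.out.println(resolve(Optional.of(schema), reported).dataType);
        // No parent schema: keep whatever the path evaluation reported.
        System.out.println(resolve(Optional.empty(), reported).dataType);
    }
}
```

Against the real API the same shape would presumably start from fieldValue.getParentRecord() and map through getSchema(), but I have not run that variant.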