Posted to dev@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2019/07/28 14:19:54 UTC

Record path API chooses CHOICE[STRING, RECORD] when the field is missing

I have a simple avro schema in a test case that looks like this:

{
    "type": "record",
    "name": "PersonRecord",
    "fields": [
        { "name": "firstName", "type": "string" },
        { "name": "lastName", "type": "string" },
        { "name": "creationDateTime", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" } ] }
    ]
}
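For comparison, a nullable Avro union such as [ "null", { "type": "long", "logicalType": "timestamp-millis" } ] would normally be expected to collapse to a single nullable timestamp field, not to CHOICE[STRING, RECORD]. The sketch below models that expectation with plain strings. The class UnionResolver and its string type names are hypothetical simplifications, not NiFi's AvroTypeUtil.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a nullable Avro union is expected to map to a
// single record field type. Not NiFi's AvroTypeUtil; type names are plain strings.
final class UnionResolver {

    // Simple holder for the resolved type name and its nullability.
    static final class Resolved {
        final String type;
        final boolean nullable;

        Resolved(String type, boolean nullable) {
            this.type = type;
            this.nullable = nullable;
        }

        @Override
        public String toString() {
            return type + (nullable ? " (nullable)" : "");
        }
    }

    // Drops "null" from the union members. If exactly one member remains, it
    // becomes the field type and the field is marked nullable when "null" was
    // present. Otherwise the remaining members are reported as a CHOICE.
    static Resolved resolve(List<String> unionMembers) {
        List<String> nonNull = new ArrayList<>();
        for (String member : unionMembers) {
            if (!"null".equals(member)) {
                nonNull.add(member);
            }
        }
        boolean nullable = nonNull.size() < unionMembers.size();
        if (nonNull.size() == 1) {
            return new Resolved(nonNull.get(0), nullable);
        }
        return new Resolved("CHOICE" + nonNull, nullable);
    }

    public static void main(String[] args) {
        // The creationDateTime union, with the logical type used as the name.
        System.out.println(resolve(List.of("null", "timestamp-millis")));
    }
}
```

Under these assumptions the creationDateTime union resolves to one nullable timestamp type; a CHOICE only appears when more than one non-null member is left.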

Then I try something like this...

RecordPath path = recordPathCache.getCompiled("/creationDateTime");
RecordPathResult rp = path.evaluate(targetRecord);
Optional<FieldValue> nodeField = rp.getSelectedFields().findFirst();

if (!nodeField.isPresent()) {
    throw new ProcessException("...");
}

FieldValue fieldValue = nodeField.get();
// fieldValue.getField() reports a CHOICE of STRING and RECORD, not the schema's timestamp type

Is there a way to get the correct field type here? I assume the
Choice[String, Record] default was chosen to facilitate schema inference.

Thanks,

Mike

Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing

Posted by Mike Thomsen <mi...@gmail.com>.
It was JsonTreeReader.

Thanks,

Mike

On Mon, Jul 29, 2019 at 3:36 PM Mark Payne <ma...@hotmail.com> wrote:

> Mike,
>
> What Record Reader is being used here? The problem appears to be due to
> the Record Reader itself assigning that as the field type.

Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing

Posted by Mark Payne <ma...@hotmail.com>.
Mike,

What Record Reader is being used here? The problem appears to be due to the Record Reader itself assigning that as the field type.

I created a dummy unit test to verify the RecordPath stuff is correct:


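import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import org.apache.nifi.record.path.FieldValue;
import org.apache.nifi.record.path.RecordPath;
import org.apache.nifi.serialization.SimpleRecordSchema;
import org.apache.nifi.serialization.record.MapRecord;
import org.apache.nifi.serialization.record.Record;
import org.apache.nifi.serialization.record.RecordField;
import org.apache.nifi.serialization.record.RecordFieldType;
import org.apache.nifi.serialization.record.RecordSchema;
import org.junit.Test;

@Test
public void testFromEmail() {
    final List<RecordField> fields = new ArrayList<>();
    fields.add(new RecordField("firstName", RecordFieldType.STRING.getDataType()));
    fields.add(new RecordField("lastName", RecordFieldType.STRING.getDataType()));
    fields.add(new RecordField("creationDateTime", RecordFieldType.TIMESTAMP.getDataType(), true));
    final RecordSchema schema = new SimpleRecordSchema(fields);

    final Map<String, Object> values = new HashMap<>();
    values.put("firstName", "John");
    values.put("lastName", "Doe");
    values.put("creationDateTime", new Timestamp(System.currentTimeMillis()));
    final Record record = new MapRecord(schema, values);

    final Optional<FieldValue> optionalFieldValue = RecordPath.compile("/creationDateTime").evaluate(record).getSelectedFields().findFirst();
    final FieldValue fieldValue = optionalFieldValue.get();
    System.out.println(fieldValue.getField());
}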

Which prints out the correct field type:

RecordField[name=creationDateTime, dataType=TIMESTAMP:yyyy-MM-dd HH:mm:ss, nullable=true]

So I presume the Record Reader may not be properly applying the schema to the Record that it returns.
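The expectation that a reader applies the supplied schema, even to fields the input omits, can be modeled in a few lines. This is a minimal sketch under stated assumptions: SchemaApplyingReader and fieldTypes are hypothetical names, types are plain strings, and the INFERRED placeholder only mirrors the type observed in this thread. It is not how JsonTreeReader is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: when a schema is supplied, each declared field keeps its
// declared type, even if the input omits the field. Only fields unknown to the
// schema fall back to an inferred type. Not JsonTreeReader's actual code.
final class SchemaApplyingReader {

    // Placeholder for whatever an inference step would produce; it mirrors the
    // type observed in the thread and is an assumption of this sketch.
    static final String INFERRED = "CHOICE[STRING, RECORD]";

    static Map<String, String> fieldTypes(Map<String, String> declaredSchema,
                                          Map<String, Object> parsedValues) {
        // Start from the declared schema so missing fields keep their types.
        Map<String, String> types = new LinkedHashMap<>(declaredSchema);
        for (String name : parsedValues.keySet()) {
            // Fields not declared in the schema get the inferred placeholder.
            types.putIfAbsent(name, INFERRED);
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("firstName", "STRING");
        schema.put("lastName", "STRING");
        schema.put("creationDateTime", "TIMESTAMP");

        // creationDateTime is missing from the parsed input.
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("firstName", "John");
        values.put("lastName", "Doe");

        System.out.println(fieldTypes(schema, values));
    }
}
```

In this model the missing creationDateTime field still reports TIMESTAMP, because the declared schema is the starting point and inference only fills in undeclared fields.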

Thanks
-Mark

Re: Record path API chooses CHOICE[STRING, RECORD] when the field is missing

Posted by Mike Thomsen <mi...@gmail.com>.
Doesn't explain WHY it happened, but I was able to resolve it like this:

// Look up the field's declared type in the parent record's schema
Optional<RecordField> _temp =
    fieldValue.getParentRecord().get().getSchema().getField(fieldValue.getField().getFieldName());
RecordField _rf = _temp.get();
// Coerce the selected value to the declared type
value = DataTypeUtils.convertType(value, _rf.getDataType(), _rf.getFieldName());
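To illustrate what the coercion step has to do for this particular field, here is a minimal sketch of converting a value to a timestamp. It is not DataTypeUtils.convertType; TimestampCoercion and toTimestamp are hypothetical names. It assumes Avro timestamp-millis values arrive as epoch milliseconds and that strings use the yyyy-MM-dd HH:mm:ss format shown in the TIMESTAMP field type printed earlier in the thread, parsed in the JVM's default time zone.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical stand-in for the TIMESTAMP branch of a type coercion step.
// This is NOT NiFi's DataTypeUtils.convertType; it only illustrates the idea.
final class TimestampCoercion {

    // Format taken from the TIMESTAMP field type printed in the thread.
    static final String FORMAT = "yyyy-MM-dd HH:mm:ss";

    static Timestamp toTimestamp(Object value) {
        if (value == null) {
            return null; // the field is nullable, so null passes through
        }
        if (value instanceof Timestamp) {
            return (Timestamp) value;
        }
        if (value instanceof Number) {
            // Avro timestamp-millis: milliseconds since the Unix epoch
            return new Timestamp(((Number) value).longValue());
        }
        if (value instanceof String) {
            try {
                // Parsed in the JVM's default time zone
                return new Timestamp(new SimpleDateFormat(FORMAT).parse((String) value).getTime());
            } catch (ParseException e) {
                throw new IllegalArgumentException("Cannot parse timestamp: " + value, e);
            }
        }
        throw new IllegalArgumentException(
                "Cannot convert " + value.getClass().getName() + " to TIMESTAMP");
    }

    public static void main(String[] args) {
        System.out.println(toTimestamp(0L).getTime());
    }
}
```

Looking up the declared type first, as in the workaround, is what makes a conversion like this possible: once the field reports a CHOICE of STRING and RECORD, there is no timestamp type left to convert to.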
