Posted to issues@nifi.apache.org by "Bjorn Olsen (JIRA)" <ji...@apache.org> on 2017/07/13 10:45:00 UTC

[jira] [Updated] (NIFI-4182) Inconsistent Type Coercion for Date and Time field types

     [ https://issues.apache.org/jira/browse/NIFI-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjorn Olsen updated NIFI-4182:
------------------------------
    Description: 
Type Coercion rules currently allow for the following conversions regarding Date, Time and Timestamp fields:

* Any "date/time" type (Date, Time, Timestamp) can be coerced into any other "date/time" type.
* Any "date/time" type can be coerced into a Long type, representing the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
* Any "date/time" type can be coerced into a String. 

In Avro, a field with type "int" and logicalType "date" is stored as an integer representing the number of days since 01 Jan 1970. The AvroRecordSetWriter writes the value this way, but the CSVRecordSetWriter and JSONRecordSetWriter do not.
Thus the behaviour is inconsistent with the rules outlined above.

Consider a date of 2017-01-11 (11th Jan 2017) and an Avro write schema of:

{code:java}
{ "name": "MY_DATE" , "type": { "type":"int", "logicalType":"date"} }
{code}


This is stored as follows:
Avro: 17177
CSV: 1484092800000
JSON: 1484092800000

It appears in the latter 2 cases that the schema specification is ignored. 
The data is stored as a Long value even though an Int was specified in the "type" attribute of the schema.
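
To make the two representations concrete, here is a minimal sketch in plain Java (java.time only, no NiFi classes) that derives both values for the date above. The class and method names are hypothetical, and it assumes the date is interpreted at midnight UTC, which matches the numbers reported in this issue.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical demo class; not part of NiFi.
class DateEncodingDemo {

    // Avro "date" logical type: days since 1970-01-01, held in an int.
    static int toAvroDate(LocalDate date) {
        // toEpochDay() returns a long; toIntExact throws if it does not fit in an int.
        return Math.toIntExact(date.toEpochDay());
    }

    // Milliseconds since epoch at midnight UTC (assumption: UTC), held in a long.
    static long toEpochMillis(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 1, 11);
        System.out.println("Avro date (int):     " + toAvroDate(date));    // 17177
        System.out.println("Epoch millis (long): " + toEpochMillis(date)); // 1484092800000
    }
}
```

The two values differ by a factor of 86400000 (milliseconds per day): 17177 * 86400000 = 1484092800000.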

The same reasoning applies to the time-millis Avro logicalType, which is stored as an Int in the Avro standard (time-micros annotates a Long in the Avro standard). 

Changing this default to align with the Avro standard for logicalTypes may break existing implementations. 
Certainly it should be changed for the case where the output schema explicitly asks for an Int output type (or, at the very least, the type coercion from Long to Int should fail).
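
As an illustration of the "fail the coercion" option, the following sketch shows a strict Long-to-Int narrowing in plain Java. The class and helper names are hypothetical and this is not NiFi's actual coercion code; it only shows that Math.toIntExact rejects a value such as 1484092800000 instead of silently emitting it where an Int was requested.

```java
// Hypothetical demo class; not part of NiFi.
class StrictIntCoercionDemo {

    // Narrow a long to an int, throwing ArithmeticException if the value overflows.
    static int coerceToInt(long value) {
        return Math.toIntExact(value);
    }

    // True when the value can be represented as a 32-bit signed int.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long epochDay = 17177L;            // days since epoch: fits in an int
        long epochMillis = 1484092800000L; // milliseconds since epoch: does not fit

        System.out.println("epochDay fits in int:    " + fitsInInt(epochDay));
        System.out.println("epochMillis fits in int: " + fitsInInt(epochMillis));

        try {
            coerceToInt(epochMillis);
        } catch (ArithmeticException e) {
            System.out.println("Coercion rejected: " + e.getMessage());
        }
    }
}
```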

Test flow is attached. The problem can be reproduced by switching between the various RecordSetWriter controllers on the ConvertRecord processor and observing the output flowfile content in each case.


> Inconsistent Type Coercion for Date and Time field types
> --------------------------------------------------------
>
>                 Key: NIFI-4182
>                 URL: https://issues.apache.org/jira/browse/NIFI-4182
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Bjorn Olsen
>            Priority: Minor
>         Attachments: Field_Conversion_Test_1.xml
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)