You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:38:38 UTC

[GitHub] [beam] damccorm opened a new issue, #19817: datetime and decimal should be logical types

damccorm opened a new issue, #19817:
URL: https://github.com/apache/beam/issues/19817

   Currently datetime and decimal are implemented as primitive types in the Java SDK. They should be logical types as documented in https://s.apache.org/beam-schemas
   
   Imported from Jira [BEAM-7554](https://issues.apache.org/jira/browse/BEAM-7554). Original Jira may contain additional context.
   Reported by: bhulette.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #19817: datetime and decimal should be logical types

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1197363433

   This issue was raised again in May 22, per this comment in Jira
   >>>
   ilango-mediaagility:
   Due to this issue, cross language transforms are affected. For example, when using ReadFromJdbc transform in python, we get the decimal and datetime as logical types, but they are encoded differently than the underlying representation types, resulting in incorrect values.
   >>>
   This is blocked by #21433


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on issue #19817: datetime and decimal should be logical types

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1204600762

   Yeah I agree that it's better to just prefer logical types for multi-language transforms, rather than trying to shunt the DATETIME primitive type this way. We may want to add types for additional precisions as needed (MillInstant, NanosInstant, etc)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #19817: datetime and decimal should be logical types

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1204222164

   There is a problem of replace all DATETIME fieldtype to logical type, per @TheNeuralBit:
   > One thing that I realized as I wrote up my PR: If we make DATETIME a logical type backed by INT64, then it will have to use VarLongCoder, rather than InstantCoder as it currently does.
   >
   > It seems like that would be a problem since InstantCoder is designed to make lexicographic order correspond to chronological order.
   Note that InstantCoder is not only used in processing schemas but also in coding windows: https://github.com/apache/beam/blob/bf39489b2a1fd45e6798483d083e4ad240f66891/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L618
   Tried to replace the InstantCoder implementation using either a VarInt or a ByteArray it breaks window decoding here. Seems like there is some predefined stream not using InstantCoder.encode to pack a timestamp.
   
   Thus I decided not to change InstantCoder at this moment, instead extend the support MicrosInstant logical type in Java sdk and DateTime logical type in python sdk, seems to be the simplest way to make xlang read/write records containing timestamp type work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #19817: datetime and decimal should be logical types

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1195913641

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #19817: datetime and decimal should be logical types

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1227377405

   > There is a problem of replace all DATETIME fieldtype to logical type, per @TheNeuralBit:
   > 
   > > One thing that I realized as I wrote up my PR: If we make DATETIME a logical type backed by INT64, then it will have to use VarLongCoder, rather than InstantCoder as it currently does.
   > > It seems like that would be a problem since InstantCoder is designed to make lexicographic order correspond to chronological order.
   > > Note that InstantCoder is not only used in processing schemas but also in coding windows: https://github.com/apache/beam/blob/bf39489b2a1fd45e6798483d083e4ad240f66891/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L618
   > > 
   > > Tried to replace the InstantCoder implementation using either a VarInt or a ByteArray it breaks window decoding here. Seems like there is some predefined stream not using InstantCoder.encode to pack a timestamp.
   > 
   > Thus I decided not to change InstantCoder at this moment, instead extend the support MicrosInstant logical type in Java sdk and DateTime logical type in python sdk, seems to be the simplest way to make xlang read/write records containing timestamp type work.
   
   Managed to get a test case using the following code snippet
   
   ```python
   
   def generate_millis():
       # Logical type that deals with millis_instant urn (MillisInstant)
       MillisLogicalType = LogicalType._known_logical_types.get_logical_type_by_urn('beam:logical_type:millis_instant:v1')
       # Original Logical type used to represent Timestamp (MicrosInstant)
       TimestampLogicalType = LogicalType._known_logical_types.get_logical_type_by_language_type(Timestamp)
       
       LogicalType._known_logical_types.by_language_type[Timestamp] = MillisLogicalType
   
       schema = beam.typehints.schemas.named_tuple_to_schema(TestTuple)
       coder = beam.coders.row_coder.RowCoder(schema)
       print("payload = %s" % schema.SerializeToString())
       examples = (TestTuple(
           f_timestamp=Timestamp.from_rfc3339("2020-08-13T14:14:14.123Z"),
           f_string="2020-08-13T14:14:14.123Z",
           f_int=1597328054123),)
       for example in examples:
           print("example = %s" % coder.encode(example))
       
       # recover original registration
       LogicalType._known_logical_types.by_language_type[Timestamp] = TimestampLogicalType
   
   ```
   
   The workaround is temporarily change the mapping of Timestamp -> MillisInstant logical type. Without it Timestamp always maps to MicrosInstant logical type in Python.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #19817: datetime and decimal should be logical types

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #19817:
URL: https://github.com/apache/beam/issues/19817#issuecomment-1442335475

   update: Python support is complete. Leave this open for Go support


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org