You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/11/17 20:40:40 UTC

[GitHub] [pinot] walterddr opened a new issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

walterddr opened a new issue #7786:
URL: https://github.com/apache/pinot/issues/7786


   datetime fields are not necessarily sorted by string ordering. 
   
   adding StringDatetimeColumnPreIndexStatsCollector to handle the ordering issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-972363645


   We don't have a data type for date time as of now, so we need to use other data type (`STRING`, `INT` or `LONG`) to represent the values. We process date time fields the same way as dimensions and metrics, so the ordering must be aligned with the data type. Ordering date time field differently will cause wrong result.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-973293869


   E.g. date time is stored as `STRING` but ordered as time value, the following query will fail because it will still filter with string order:
   `SELECT * FROM myTable WHERE timeCol < '11182021'`
   This query will match time value `01012022`.
   
   Because of this behavior, we should definitely validate the time format to ensure the time ordering is the same as data type ordering.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr edited a comment on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
walterddr edited a comment on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-972995918


   that's definitely one solution but IMO it will make user experience bad. either we enhance our documentation and error messaging; or we should automatically populate a shadow timeColumn that's sorted by time. Especially when we have the tableConfig / ingestionConfig telling us we should sort by underlying datetime.
   
   also could you elaborate more on this
   
   > We process date time fields the same way as dimensions and metrics, so the ordering must be aligned with the data type. Ordering date time field differently will cause wrong result.
   
   which part specifically will fail?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-974560335


   Agree. We should provide a `DATE_TIME` data type which is similar to the `TIMESTAMP` data type but with timezone info embedded, then everything would work seamlessly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang closed issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #7786:
URL: https://github.com/apache/pinot/issues/7786


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang closed issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #7786:
URL: https://github.com/apache/pinot/issues/7786


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-972995918


   that's definitely one solution but IMO it will make user experience bad. either we enhance our documentation and error messaging; or we should automatically populate a shadow timeColumn that's sorted by time. 
   
   Especially when we have the tableConfig / ingestionConfig telling us we should sort by underlying datetime.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-973656952


   > E.g. date time is stored as `STRING` but ordered as time value, the following query will fail because it will still filter with string order: `SELECT * FROM myTable WHERE timeCol < '11182021'` This query will match time value `01012022`.
   > 
   > Because of this behavior, we should definitely validate the time format to ensure the time ordering is the same as data type ordering.
   
   Well in that case yeah. I don't see a better solution. But I still think we need some sort of automated solution instead of relying on the user to safe guard this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-972393192


   I see. I feel we should force the time format to increase along with the time (e.g. `yyyyMMdd` is fine, but not `MMddyyyy`). If the original time value does not follow such time format, we can use ingestion transform to convert it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on issue #7786: StringColumnPreIndexStatsCollector doesnt work for datetime

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7786:
URL: https://github.com/apache/pinot/issues/7786#issuecomment-972371045


   well. this will cause problem because it doesn't give correct statistics to the SegmentColumnarIndexCreator
   ```
   2021/11/17 11:56:03.510 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - file:/data/mark.csv
   java.lang.IllegalArgumentException: The end instant must be greater than the start instant
   	at org.joda.time.base.AbstractInterval.checkInterval(AbstractInterval.java:63) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.joda.time.base.BaseInterval.<init>(BaseInterval.java:73) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.joda.time.Interval.<init>(Interval.java:173) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:672) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:603) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:286) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org