Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/11/03 16:58:50 UTC

[GitHub] [pinot] mapshen opened a new issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

mapshen opened a new issue #7689:
URL: https://github.com/apache/pinot/issues/7689


   
   The following error was encountered today:
   
   ```
   2021/11/03 14:21:49.226 INFO [LLRealtimeSegmentDataManager_table_realtime__0__0__20211101T1431Z] [table_realtime__0__0__20211101T1431Z] Waiting to acquire semaphore for building segment
   2021/11/03 14:21:49.227 INFO [LLRealtimeSegmentDataManager_table_realtime__0__0__20211101T1431Z] [table_realtime__0__0__20211101T1431Z] Trying to build segment
   2021/11/03 14:21:49.227 ERROR [LLRealtimeSegmentDataManager_table_realtime__0__0__20211101T1431Z] [table_realtime__0__0__20211101T1431Z] Could not build segment
   java.lang.IllegalArgumentException: Illegal pattern component: T
           at org.joda.time.format.DateTimeFormat.parsePatternTo(DateTimeFormat.java:566) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.joda.time.format.DateTimeFormat.createFormatterForPattern(DateTimeFormat.java:687) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.joda.time.format.DateTimeFormat.forPattern(DateTimeFormat.java:177) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.spi.data.DateTimeFormatPatternSpec.<init>(DateTimeFormatPatternSpec.java:57) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.spi.data.DateTimeFormatSpec.<init>(DateTimeFormatSpec.java:60) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig.setTime(SegmentGeneratorConfig.java:214) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.segment.spi.creator.SegmentGeneratorConfig.<init>(SegmentGeneratorConfig.java:140) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:83) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
           at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:794) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
   ```
   
   The relevant config was:
   ```
       "dateTimeFieldSpecs": [
         {
           "name": "DATETIME_F",
           "dataType": "STRING",
           "format": "1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-ddTHH:mm:ss",
           "granularity": "1:SECONDS"
         }
       ],
   ```
   You can tell that `yyyy-MM-ddTHH:mm:ss` should have been `yyyy-MM-dd'T'HH:mm:ss`. However, the error was not caught until we found that the table had stopped consuming messages. Furthermore, this format pattern is only validated when a segment is built, not before or during consumption.
   
   Therefore, it would benefit users if we could fail fast and throw the error when the schema is created in the first place.
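
A minimal sketch of the fail-fast check proposed above, not Pinot's actual implementation. It uses `java.time.format.DateTimeFormatter` as a stand-in so that the example is self-contained (the stack trace shows Pinot 0.8.0 compiling the pattern with Joda-Time's `DateTimeFormat.forPattern`). The names `SdfPatternCheck`, `extractPattern`, and `validate` are hypothetical.

```java
import java.time.format.DateTimeFormatter;

public class SdfPatternCheck {

    /**
     * Extracts the pattern from a format string shaped like
     * "size:unit:SIMPLE_DATE_FORMAT:pattern". The split is limited to four
     * parts because the pattern itself may contain ':' (e.g. HH:mm:ss).
     */
    public static String extractPattern(String format) {
        String[] parts = format.split(":", 4);
        if (parts.length < 4 || !"SIMPLE_DATE_FORMAT".equals(parts[2])) {
            throw new IllegalArgumentException("Not a SIMPLE_DATE_FORMAT spec: " + format);
        }
        return parts[3];
    }

    /** Compiles the pattern eagerly so that a bad pattern fails at schema creation. */
    public static void validate(String format) {
        String pattern = extractPattern(format);
        try {
            DateTimeFormatter.ofPattern(pattern);
        } catch (IllegalArgumentException e) {
            // Report the whole pattern, not just the offending component.
            throw new IllegalArgumentException(
                "Invalid pattern '" + pattern + "' in format '" + format + "'", e);
        }
    }

    public static void main(String[] args) {
        // Quoted literal 'T': accepted.
        validate("1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss");
        try {
            // Unquoted T is not a known pattern letter: rejected.
            validate("1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-ddTHH:mm:ss");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the pattern can be compiled without any event data, so the check can run when the schema is submitted rather than when the first segment is built.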


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mapshen edited a comment on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
mapshen edited a comment on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960343807


   Oh, I am saying the format config itself should be validated upon creation. You don't need an event to tell that `yyyy-MM-ddTHH:mm:ss` is not valid.  
   
   Also I agree with you that events should be validated against `dateTimeFieldSpecs` as well. 




[GitHub] [pinot] mapshen commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
mapshen commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-991706179


   > #7804 Added validation on SDF when adding/updating the schema, and ensure it is in lexicographic order. This is required for range query to work properly.
   
   Thanks @Jackie-Jiang - it would be great to update the docs to reflect this as well!




[GitHub] [pinot] Jackie-Jiang commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-961585597


   Validating all events might be too much. Alternatively, we could add a flag to turn the validation on/off.




[GitHub] [pinot] Jackie-Jiang edited a comment on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang edited a comment on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960350661








[GitHub] [pinot] ksnijjer commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
ksnijjer commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-991225375


   @Jackie-Jiang I saw this issue again with another user: the segment build is failing because the wrong format (missing millis) was specified for the time column:
   `2021/12/10 08:42:35.713 ERROR [LLRealtimeSegmentDataManager_dark_store_projection_stage__0__0__20211210T0838Z] [dark_store_projection_stage__0__0__20211210T0838Z] Could not build segment
   java.lang.IllegalArgumentException: Invalid format: "2021-12-07 15:39:49.684" is malformed at ".684"
           at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[startree-pinot-all-0.9.0-ST.7-jar-with-dependencies.jar:0.9.0-ST.7-dbffa0a2688be756084876afd01079b22c7bc9f4]
           at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[startree-pinot-all-0.9.0-ST.7-jar-with-dependencies.jar:0.9.0-ST.7-dbffa0a2688be756084876afd01079b22c7bc9f4]`
   
   Can we expedite a fix? cc @mayankshriv
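
To illustrate this failure mode, here is a small sketch under stated assumptions. The patterns `yyyy-MM-dd HH:mm:ss` and `yyyy-MM-dd HH:mm:ss.SSS` are assumed, since the user's actual config is not shown in this thread. `java.time` stands in for the Joda-Time parser seen in the stack trace, and `ValueFormatCheck.matches` is a hypothetical helper.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ValueFormatCheck {

    /** Returns true if the value parses completely under the given pattern. */
    public static boolean matches(String value, String pattern) {
        try {
            LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
            return true;
        } catch (DateTimeParseException e) {
            // Trailing text such as ".684" that the pattern does not cover ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "2021-12-07 15:39:49.684";
        // Pattern without a fractional-second field: the value does not match.
        System.out.println(matches(value, "yyyy-MM-dd HH:mm:ss"));
        // Pattern with milliseconds: the value matches.
        System.out.println(matches(value, "yyyy-MM-dd HH:mm:ss.SSS"));
    }
}
```

Note that both patterns are themselves valid, so validating the pattern alone at schema creation would not catch this case; only a check against actual values can.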




[GitHub] [pinot] mapshen edited a comment on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
mapshen edited a comment on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960353748


   > 2. Validate if the value follows the format when getting the first event
   
   Should Pinot continue to validate all the following events? I am not quite clear on how helpful it would be to validate only the first one. On the other hand, if it does validate all events, we need to figure out how much that will impact performance.
   
   Probably we should limit the scope of this issue to "1. Validate the format itself when creating the table".
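
The trade-off between validating the first event and validating every event can be sketched as follows. This is a hypothetical illustration, not an existing Pinot API: the class `EventTimeValidator`, its `Mode` enum, and the `accept` method are invented for this example, `java.time` is used instead of Joda-Time, and the pattern is assumed to contain both date and time fields.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class EventTimeValidator {

    /** Hypothetical switch for how much per-event validation to pay for. */
    public enum Mode { OFF, FIRST_EVENT, ALL_EVENTS }

    private final DateTimeFormatter formatter;
    private final Mode mode;
    private boolean firstSeen = false; // not thread-safe; sketch only

    public EventTimeValidator(String pattern, Mode mode) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
        this.mode = mode;
    }

    /** Returns false only when the value was actually checked and did not parse. */
    public boolean accept(String value) {
        boolean check = mode == Mode.ALL_EVENTS
            || (mode == Mode.FIRST_EVENT && !firstSeen);
        firstSeen = true;
        if (!check) {
            return true; // validation skipped for this event
        }
        try {
            LocalDateTime.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        EventTimeValidator firstOnly =
            new EventTimeValidator("yyyy-MM-dd HH:mm:ss", Mode.FIRST_EVENT);
        System.out.println(firstOnly.accept("2021-12-07 15:39:49"));     // checked
        System.out.println(firstOnly.accept("2021-12-07 15:39:49.684")); // not checked
    }
}
```

With `FIRST_EVENT`, a malformed value that arrives later passes unnoticed, which is the concern raised above; `ALL_EVENTS` catches it at the cost of one parse per event.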




[GitHub] [pinot] Jackie-Jiang commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-991356743


   #7804 added validation of the SDF pattern when adding/updating the schema, and ensures it is in lexicographic order. This is required for range queries to work properly.
   
   To solve the problem raised by @ksnijjer, we need to validate the input value (the second validation mentioned above)
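
As a rough illustration of why lexicographic order matters for range queries on string-typed time columns (how #7804 actually performs its check is not shown in this thread): the sketch below compares formatted strings for two dates under two patterns. The class `LexicographicOrderDemo` and the method `stringOrderMatchesTimeOrder` are hypothetical, and `java.time` is used in place of Joda-Time.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LexicographicOrderDemo {

    /** True if string comparison of the formatted dates agrees with chronological order. */
    public static boolean stringOrderMatchesTimeOrder(
            String pattern, LocalDate earlier, LocalDate later) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern);
        return f.format(earlier).compareTo(f.format(later)) < 0;
    }

    public static void main(String[] args) {
        LocalDate earlier = LocalDate.of(2020, 12, 31);
        LocalDate later = LocalDate.of(2021, 1, 1);
        // "2020-12-31" sorts before "2021-01-01": string order agrees with time order.
        System.out.println(stringOrderMatchesTimeOrder("yyyy-MM-dd", earlier, later));
        // "12/31/2020" sorts after "01/01/2021": a string range filter would be wrong.
        System.out.println(stringOrderMatchesTimeOrder("MM/dd/yyyy", earlier, later));
    }
}
```

A pattern that lists fields from most to least significant (year, month, day, hour, ...) with fixed widths keeps string order and time order aligned.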




[GitHub] [pinot] walterddr commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-992601774


   Yes, I think #7804 should address the issue by validating during schema changes, and the error message should also be improved to include the entire SDF pattern passed in. Let me update the docs as well.




[GitHub] [pinot] Jackie-Jiang commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960319341












[GitHub] [pinot] Jackie-Jiang commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960350661








[GitHub] [pinot] mapshen commented on issue #7689: Validate format pattern in `dateTimeFieldSpecs` upon schema creation

Posted by GitBox <gi...@apache.org>.
mapshen commented on issue #7689:
URL: https://github.com/apache/pinot/issues/7689#issuecomment-960353748





