Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/24 18:48:20 UTC

[GitHub] [pinot] liuchang0520 opened a new issue #7202: Handle null value from time column in NullValueTransformer

liuchang0520 opened a new issue #7202:
URL: https://github.com/apache/pinot/issues/7202


   When a Pinot table has a null value in its time column, the value is not filled with a default in [NullValueTransformer](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/NullValueTransformer.java). Then, during the index step in [LLRealtimeSegmentDataManager:L502](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L502), exceptions are thrown because of the null value. This recently caused several of our servers to go down because of too many exceptions.
   
   One solution is to remove the special-case handling for the time column.
   
   BTW, I'm curious: why did we choose not to handle null values in the time column?
   
   cc @chenboat 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] liuchang0520 commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887689239


   I see. Thanks @Jackie-Jiang. Any suggestions for dealing with messages that have a null value in the time column?
   
   The scenario is indeed user error: users create the Pinot table themselves, but the corresponding time column is null in many of the upstream Kafka messages. As a result, many exceptions are thrown from the index step in [LLRealtimeSegmentDataManager:L502](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L502), which brought several of our Pinot servers down.
   
   I was thinking about filling in a default value for the time column to avoid the exceptions. But as you mentioned, there is no good default value for a time column.




[GitHub] [pinot] Jackie-Jiang commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887075420


   This behavior is intentional. We don't allow a default time value because, in most cases, a null time value is caused by some client error. Also, there is no good default value for a time column, and filling one in can potentially mess up segment management.
   
   Can you please elaborate on the scenario where you need to fill in default time values?




[GitHub] [pinot] liuchang0520 closed issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 closed issue #7202:
URL: https://github.com/apache/pinot/issues/7202


   




[GitHub] [pinot] liuchang0520 commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-895630805


   @Jackie-Jiang @chenboat @yupeng9, per the discussion, we will add a flag to the table config to indicate whether null values in the time column should be handled.
   
   If the flag is enabled, we fill in the machine time.
   
   By default, the flag is set to false to keep backward compatibility.
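
The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Pinot's `NullValueTransformer`: the class name `NullTimeColumnSketch`, the `fillNullTime` flag, the use of a plain `Map` as the row, and the choice of epoch milliseconds as the time unit are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of flag-controlled null handling for a time column. */
final class NullTimeColumnSketch {
    private final String timeColumn;
    // Hypothetical flag; false mirrors the backward-compatible default.
    private final boolean fillNullTime;
    // Clock is injected so the "machine time" can be controlled in tests.
    private final LongSupplier clock;

    NullTimeColumnSketch(String timeColumn, boolean fillNullTime, LongSupplier clock) {
        this.timeColumn = timeColumn;
        this.fillNullTime = fillNullTime;
        this.clock = clock;
    }

    /**
     * Fills a null time value with the machine time when the flag is enabled.
     * When the flag is disabled, the row is returned unchanged.
     */
    Map<String, Object> transform(Map<String, Object> row) {
        if (row.get(timeColumn) == null && fillNullTime) {
            row.put(timeColumn, clock.getAsLong());
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumes the time column stores epoch milliseconds.
        NullTimeColumnSketch enabled =
            new NullTimeColumnSketch("eventTimeMillis", true, System::currentTimeMillis);
        Map<String, Object> row = new HashMap<>();
        row.put("userId", "u1");
        row.put("eventTimeMillis", null);
        System.out.println(enabled.transform(row));
    }
}
```

With the flag left at false, a null time value still reaches the index step as it does today, so existing tables see no change in behavior.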




[GitHub] [pinot] chenboat commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
chenboat commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887712719


   Thank you @liuchang0520. I think we can use the current time as the default time value; in fact, this is what we use today internally. It is in general good for retention management because the data will still be deleted as time passes. We have seen many users who, for various reasons, forgot to populate the time column. There are many problems here; the most serious one is that it causes ingestion exceptions to be thrown, which disrupts the servers. So a default value will help here.




[GitHub] [pinot] liuchang0520 commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887719546


   Agreed, @chenboat. We also need to know which column is the time column in the decoder.




[GitHub] [pinot] liuchang0520 edited a comment on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 edited a comment on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887689239


   I see. Thanks @Jackie-Jiang. Any suggestions for dealing with messages that have a null value in the time column?
   
   The scenario is indeed user error: users create the Pinot table themselves, but the corresponding time column is null in many of the upstream Kafka messages. As a result, many exceptions are thrown from the index step in [LLRealtimeSegmentDataManager:L502](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/LLRealtimeSegmentDataManager.java#L502), which brought several of our Pinot servers down.
   
   I was thinking about filling in a default value for the time column to avoid the exceptions. But as you mentioned, there is no good default value for a time column.







[GitHub] [pinot] liuchang0520 commented on issue #7202: Handle null value from time column in NullValueTransformer

Posted by GitBox <gi...@apache.org>.
liuchang0520 commented on issue #7202:
URL: https://github.com/apache/pinot/issues/7202#issuecomment-887697513


   In our internal decoder, we fill the Kafka event timestamp into some of the hard-coded system time columns, e.g. secondsSinceEpoch. But for a customized time column, since the decoder has no notion of the Pinot schema, we are unable to fill the timestamp value into it.
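
A minimal sketch of the limitation described above, assuming the decoded message is a plain `Map` and the Kafka record timestamp is in epoch milliseconds. The class `TimestampFillingDecoderSketch`, the method `fillSystemTime`, and the custom column name `orderTime` are hypothetical; only `secondsSinceEpoch` comes from the comment, and the internal decoder itself is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a schema-unaware decoder step. */
final class TimestampFillingDecoderSketch {
    // The only time column this sketch knows about, because it is hard-coded.
    private static final String SECONDS_COLUMN = "secondsSinceEpoch";

    /**
     * Copies the decoded fields and fills the hard-coded system time column
     * from the Kafka record timestamp when that column is missing or null.
     * Any other time column is left as-is: without the Pinot schema, the
     * decoder cannot tell which field is the table's time column.
     */
    static Map<String, Object> fillSystemTime(Map<String, Object> decoded,
                                              long kafkaTimestampMillis) {
        Map<String, Object> out = new HashMap<>(decoded);
        if (out.get(SECONDS_COLUMN) == null) {
            out.put(SECONDS_COLUMN, kafkaTimestampMillis / 1000L);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> decoded = new HashMap<>();
        decoded.put("orderId", "o-1");
        decoded.put("orderTime", null); // custom time column, unknown to the decoder
        Map<String, Object> row = fillSystemTime(decoded, System.currentTimeMillis());
        System.out.println(row); // orderTime is still null here
    }
}
```

The custom column stays null after decoding, which is why the fix has to live somewhere that can see the schema, such as the record transformer.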

