Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/09 15:56:59 UTC

[GitHub] [incubator-pinot] lgo opened a new issue #6126: Explicitly fail or alert users when batch generating segments with non-compliant data (unsorted with sortkey, partitioning)

lgo opened a new issue #6126:
URL: https://github.com/apache/incubator-pinot/issues/6126


   While using the Spark segment generation job, we had a table config that set options such as `sortedColumn`. We initially assumed that the segment generation job would sort the data, and missed that it needs to be sorted upstream.
   
   This is documented in the sort index section (https://docs.pinot.apache.org/basics/indexing/forward-index):
   > For offline push, input data needs to be sorted before running Pinot segment conversion and push job.
   
   Additionally, it's unclear whether the same happens if users specify a partition scheme on a table but do not correctly partition the input data. (Searching the docs for "partition" yielded no mention of this.)
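
A minimal sketch of what a partitioning check on the input could look like. Everything here is an assumption for illustration: the helper names are hypothetical, the partition function is a plain modulo on an integer column, and the rule that one input file should map to exactly one partition is a guess at the intended invariant, not confirmed Pinot behavior.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Optional


def partition_counts(
    rows: Iterable[Mapping[str, Any]], column: str, num_partitions: int
) -> Counter:
    """Count rows per partition id, using value % num_partitions (assumed function)."""
    return Counter(int(row[column]) % num_partitions for row in rows)


def check_single_partition(
    rows: Iterable[Mapping[str, Any]], column: str, num_partitions: int
) -> Optional[int]:
    """Return the single partition id found, or raise if rows span several."""
    counts = partition_counts(rows, column, num_partitions)
    if len(counts) > 1:
        found = ", ".join(str(p) for p in sorted(counts))
        raise ValueError(
            f"Input rows map to {len(counts)} partitions ({found}) on column "
            f"'{column}' with {num_partitions} partitions. Partition the input "
            f"on '{column}' before generating segments."
        )
    # An empty input has no partition id at all.
    return next(iter(counts), None)
```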
   
   This is an easy thing to miss. Pre-processing jobs help (https://github.com/apache/incubator-pinot/issues/4353), but it would be good to prevent the mistake in the first place with invariants and actionable errors.
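
As a rough illustration of the kind of invariant and actionable error meant here, the sketch below scans input rows once and fails at the first row that breaks the sort order. The names `check_sorted` and `UnsortedInputError` are hypothetical and not part of Pinot; rows are assumed to be dict-like records with comparable values in the sorted column.

```python
from typing import Any, Iterable, Mapping


class UnsortedInputError(ValueError):
    """Raised when input rows violate the configured sort order."""


def check_sorted(rows: Iterable[Mapping[str, Any]], sorted_column: str) -> int:
    """Return the row count, or raise at the first out-of-order row."""
    previous = None
    count = 0
    for index, row in enumerate(rows):
        value = row[sorted_column]
        # Equal neighbours are fine; only a strict decrease breaks the order.
        if index > 0 and value < previous:
            raise UnsortedInputError(
                f"Row {index} has {sorted_column}={value!r}, which is smaller "
                f"than {previous!r} in row {index - 1}. Sort the input by "
                f"'{sorted_column}' before generating segments, or remove the "
                f"column from the sorted column configuration."
            )
        previous = value
        count += 1
    return count
```

The error names the offending row, the column, and the two ways to fix the problem, so the job fails early instead of producing a segment without the expected sort order.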
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on issue #6126: Explicitly fail or alert users when batch generating segments with non-compliant data (unsorted with sortkey, partitioning)

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #6126:
URL: https://github.com/apache/incubator-pinot/issues/6126#issuecomment-706266843


   We can easily sort it while generating the segment; that's probably a better solution.




[GitHub] [incubator-pinot] kishoreg edited a comment on issue #6126: Explicitly fail or alert users when batch generating segments with non-compliant data (unsorted with sortkey, partitioning)

Posted by GitBox <gi...@apache.org>.
kishoreg edited a comment on issue #6126:
URL: https://github.com/apache/incubator-pinot/issues/6126#issuecomment-706266843


   We can easily sort it while generating the segment; that's probably a better solution. We do this in real-time segment generation.
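
A minimal sketch of the sort-at-generation idea, assuming the rows for one segment fit in memory and are dict-like records. The function name `sort_rows_for_segment` is hypothetical and this is not how Pinot's segment generation is implemented; it only shows the step of ordering rows by the sorted column before they are written.

```python
from typing import Any, Iterable, List, Mapping


def sort_rows_for_segment(
    rows: Iterable[Mapping[str, Any]], sorted_column: str
) -> List[Mapping[str, Any]]:
    """Return the rows ordered by the sorted column.

    Python's sort is stable, so rows with equal keys keep their input order.
    """
    return sorted(rows, key=lambda row: row[sorted_column])
```

Inputs larger than memory would need an external or distributed sort instead, for example in the Spark job that feeds segment generation.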

