Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/12 18:45:19 UTC

[GitHub] [pinot] somandal opened a new issue, #9972: MV duplicate handling for forwardIndexDisabled columns

somandal opened a new issue, #9972:
URL: https://github.com/apache/pinot/issues/9972

   Support for disabling the forward index was added (details can be found in this issue: https://github.com/apache/pinot/issues/6473). As part of our analysis, we found that for MV columns with duplicate entries within a row, regenerating the forward index so that it includes the duplicated entries is not possible today. More details about this issue can be found in [this document](https://docs.google.com/document/d/1MNLLhYCg5e-UFBQ6wTBODd41sDsbjevwRfwoGuNowWw/edit?usp=sharing). To correctly regenerate the forward index for an MV column with duplicates within a row, the frequency of each duplicated key per row needs to be tracked in an on-disk file. Opening this issue to track adding support for this.
   
   Until this is fixed, MV columns with duplicates will need to be backfilled if the forward index is to be enabled at a later point in time. Alternatively, customers can confirm that they do not need the duplicates within each row, in which case the reload code path will create the forward index without them.
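
To make the problem concrete, here is a minimal sketch in Python of why duplicates are lost and how a per-row frequency map could restore them. It is a simplified model, not Pinot's implementation: the inverted index is modeled as a plain value-to-doc-ID mapping, and the helper names (`build_inverted_index`, `build_duplicate_frequencies`, `rebuild_forward_index`) are hypothetical.

```python
from collections import Counter, defaultdict


def build_inverted_index(rows):
    """Map each value to the set of doc IDs whose MV entry contains it."""
    inverted = defaultdict(set)
    for doc_id, values in enumerate(rows):
        for value in values:
            inverted[value].add(doc_id)  # a set records presence, not count
    return inverted


def build_duplicate_frequencies(rows):
    """Hypothetical side structure: (doc_id, value) -> count, only for duplicates."""
    freqs = {}
    for doc_id, values in enumerate(rows):
        for value, count in Counter(values).items():
            if count > 1:
                freqs[(doc_id, value)] = count
    return freqs


def rebuild_forward_index(inverted, num_docs, frequencies=None):
    """Regenerate per-row values from the inverted index.

    Without `frequencies`, each value appears at most once per row.
    With it, duplicated values are repeated the recorded number of times.
    """
    rows = [[] for _ in range(num_docs)]
    for value in sorted(inverted):
        for doc_id in sorted(inverted[value]):
            count = 1 if frequencies is None else frequencies.get((doc_id, value), 1)
            rows[doc_id].extend([value] * count)
    return rows


original = [["a", "b", "a"], ["c"], ["b", "b", "b"]]
inverted = build_inverted_index(original)

without_freqs = rebuild_forward_index(inverted, len(original))
with_freqs = rebuild_forward_index(
    inverted, len(original), build_duplicate_frequencies(original)
)

print(without_freqs)  # [['a', 'b'], ['c'], ['b']]
print(with_freqs)     # [['a', 'a', 'b'], ['c'], ['b', 'b', 'b']]
```

In this model the frequency map restores the per-row value counts, but the values come back in sorted order rather than their original order.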
   
   cc @siddharthteotia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on issue #9972: MV duplicate (within a row) handling for forwardIndexDisabled columns

Posted by GitBox <gi...@apache.org>.
somandal commented on issue #9972:
URL: https://github.com/apache/pinot/issues/9972#issuecomment-1347432372

   Right, but we don't plan to solve the reordering issue as of now. If ordering matters, users shouldn't disable the forward index: fixing the ordering issue would probably take as much space as the forward index itself, since we'd need to store ordering information.




[GitHub] [pinot] Jackie-Jiang commented on issue #9972: MV duplicate (within a row) handling for forwardIndexDisabled columns

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #9972:
URL: https://github.com/apache/pinot/issues/9972#issuecomment-1347429092

   Also, the ordering of the values within the MV entry will be lost after regenerating from the inverted index. Currently, certain functions (e.g. scalar functions under `ArrayFunctions`) treat MV as an array.
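
A small Python sketch of why this matters for array-style functions. The sorted order of the regenerated row is an assumption made for illustration (the point is only that the original order is not recoverable), and `array_element_at` is a hypothetical positional accessor, not a function from `ArrayFunctions`.

```python
def rebuild_row_from_inverted(row):
    """Hypothetical regeneration: values come back deduplicated and in sorted order."""
    return sorted(set(row))


def array_element_at(row, index):
    """Hypothetical stand-in for any function that depends on value position."""
    return row[index]


original_row = [30, 10, 20]
rebuilt_row = rebuild_row_from_inverted(original_row)

first_before = array_element_at(original_row, 0)
first_after = array_element_at(rebuilt_row, 0)

print(rebuilt_row)                # [10, 20, 30]
print(first_before, first_after)  # 30 10
```

The set of values is unchanged, so membership-style filters still behave the same, but any result that depends on position can differ after regeneration.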

