Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/07/20 06:47:39 UTC

[GitHub] [pinot] richardstartin commented on issue #7870: Possible storage optimization for MV forward index

richardstartin commented on issue #7870:
URL: https://github.com/apache/pinot/issues/7870#issuecomment-1189894021

   > Hey @Jackie-Jiang @walterddr
   > 
   > As discussed over Slack, we got the compression results for the actual table that ran into this forward index size bloat issue. I've updated the document in [this section](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.cq0je3xwcssi). The TL;DR is that for the actual table the compression savings are minimal. I updated the [recommendations](https://docs.google.com/document/d/1BWtNKvxL1Uaydni_BJCgWN8i9_WeSdgL3Ksh4IpY_K0/edit#heading=h.b4ch3eh9yztq) to indicate that, for now, it does not make sense to try to solve this by compressing the data with any of the proposed approaches.
   > 
   > @siddharthteotia and I would like to keep this issue open to explore further ideas in the future, or perhaps to revisit dictionary-based compression if we find users whose data has enough repetition to benefit from it.
   > 
   > Also, as discussed on our call, there may be some value in implementing Approach 2 from the proposed approaches for the sake of speeding up queries rather than saving on storage costs, i.e. build a dictionary but store the forward index in raw format, which avoids an additional dictionary lookup at read time (see the sketch after this quoted text). I had started some work on Approach 2 and had an initial PR ready before we ran these compression experiments. My PR stores the data raw + compressed in the forward index but still creates a dictionary (passthrough compression can be enabled to avoid decompression overhead). I need to spend some time figuring out how best to split up the PRs before submitting this to OSS. Just wanted to give a heads up.
   > 
   > cc @siddharthteotia
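
   For illustration, a minimal sketch of the read-path difference Approach 2 targets; the interfaces and method names below are hypothetical stand-ins, not Pinot's actual reader API:

       // Illustrative interfaces only -- not Pinot's reader classes.
       interface DictIdForwardIndex { int getDictId(int docId); }
       interface Dictionary { String get(int dictId); }
       interface RawForwardIndex { String getValue(int docId); }

       class ReadPaths {
         // Dictionary-encoded forward index: two lookups per value.
         static String readDictEncoded(DictIdForwardIndex fwd, Dictionary dict, int docId) {
           int dictId = fwd.getDictId(docId); // lookup 1: forward index -> dict id
           return dict.get(dictId);           // lookup 2: dict id -> value
         }

         // Approach 2: a dictionary still exists (e.g. for indexing), but the
         // forward index stores raw values, so the read skips the dictionary.
         static String readRaw(RawForwardIndex fwd, int docId) {
           return fwd.getValue(docId);        // single lookup
         }
       }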
   
   I can’t access the document, but was byte alignment attempted prior to LZ4 compression, i.e. rounding the dictionary’s bit width up to the next multiple of 8 so that each dictionary-encoded value is padded with leading zero bits? If the dictionary codes aren’t byte aligned, byte-oriented compression schemes like LZ4 won’t find matches, because repeated codes straddle byte boundaries and identical values no longer produce identical byte sequences. I explained this on a call with @siddharthteotia several months ago.
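
   To make the alignment point concrete, a minimal sketch of the padding step (the helper below is illustrative, not Pinot's bit-packing code): with 3-bit dictionary codes, tightly packed values straddle byte boundaries, whereas rounding the width up to 8 bits turns every repeated code into a repeated byte that LZ4 can match.

       import java.util.Arrays;

       // Illustrative helper: stores each dictionary code in whole bytes,
       // left-padded with zero bits, so repeated codes become repeated bytes.
       public class ByteAlign {
         // Round a bit width up to the next multiple of 8 (e.g. 3 -> 8).
         static int byteAlignedWidth(int bitsPerValue) {
           return ((bitsPerValue + 7) / 8) * 8;
         }

         static byte[] pad(int[] dictIds, int bitsPerValue) {
           int bytesPerValue = byteAlignedWidth(bitsPerValue) / 8;
           byte[] out = new byte[dictIds.length * bytesPerValue];
           for (int i = 0; i < dictIds.length; i++) {
             int v = dictIds[i];
             for (int b = bytesPerValue - 1; b >= 0; b--) {  // big-endian within a value
               out[i * bytesPerValue + b] = (byte) (v & 0xFF);
               v >>>= 8;
             }
           }
           return out;
         }

         public static void main(String[] args) {
           // Packed at 3 bits, eight 5s (binary 101) become the bytes
           // 0xB6 0xDB 0x6D -- equal values do not map to equal bytes.
           // Byte aligned, each 5 is simply the byte 0x05.
           System.out.println(Arrays.toString(pad(new int[]{5, 5, 5, 5, 5, 5, 5, 5}, 3)));
         }
       }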


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

