You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/07 09:17:24 UTC

[GitHub] [pinot] richardstartin opened a new issue #7980: `InIdSetTransformFunction` is inefficient

richardstartin opened a new issue #7980:
URL: https://github.com/apache/pinot/issues/7980


   `InIdSetTransfomFunction.transformToIntValuesSV` works by expanding a set into an inflated bitmap using 32 bits per docId by checking if each dictId is in the `IdSet` or not:
   
   ```java
         case INT:
           int[] intValues = _transformFunction.transformToIntValuesSV(projectionBlock);
           for (int i = 0; i < length; i++) {
             _results[i] = _idSet.contains(intValues[i]) ? 1 : 0;
           }
           break;
   ``` 
   
   For `INT` columns, the `IdSet` is backed by a `RoaringBitmap` and `contains` is a log(n) operation. This translates to a lot of time spent locating  `Container`s in large `IdSet`s:
   
   <img width="840" alt="Screenshot 2022-01-07 at 09 07 56" src="https://user-images.githubusercontent.com/16439049/148520146-fa404d9c-0090-489f-9f55-fb8dab15e115.png">
   
   <img width="1539" alt="Screenshot 2022-01-07 at 09 07 37" src="https://user-images.githubusercontent.com/16439049/148520116-beceed90-eb9e-4ed3-b64d-c3c511886df7.png">
   
   This could be made more efficient by constructing a `RoaringBitmap` from `intValues` and intersecting it with the `IdSet`, which would be possible without breaking the abstraction by adding a method to `IdSet`:
   
   ```java
   default void intersect(int length, int[] values, int[] result) {
           for (int i = 0; i < length; i++) {
             results[i] = contains(values[i]) ? 1 : 0;
           }   
   }
   ```
   
   Which can be implemented efficiently when the set is backed by a `RoaringBitmap`.
   
   Longer term, representing `BOOLEAN` values as an `int[]` seems unfortunate - `BOOLEAN` columns could have been represented as a `RoaringBitmap` on disk and in memory in the query layer.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on issue #7980: `InIdSetTransformFunction` is inefficient

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #7980:
URL: https://github.com/apache/pinot/issues/7980#issuecomment-1008445305


   idset should probably have a bloom filter internally to check for existence before checking roaring bitmap


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on issue #7980: `InIdSetTransformFunction` is inefficient

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #7980:
URL: https://github.com/apache/pinot/issues/7980#issuecomment-1008070566


   @richardstartin adding int values into bitmap and doing the intersection will tell us the values that exists in both sets but we would lose the association with the docId rt?
   
   Am I missing something here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin closed issue #7980: `InIdSetTransformFunction` is inefficient

Posted by GitBox <gi...@apache.org>.
richardstartin closed issue #7980:
URL: https://github.com/apache/pinot/issues/7980


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on issue #7980: `InIdSetTransformFunction` is inefficient

Posted by GitBox <gi...@apache.org>.
richardstartin commented on issue #7980:
URL: https://github.com/apache/pinot/issues/7980#issuecomment-1008430817


   I implemented this and the cost of constructing the temporary bitmap is too high, so it's not a good idea.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on issue #7980: `InIdSetTransformFunction` is inefficient

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #7980:
URL: https://github.com/apache/pinot/issues/7980#issuecomment-1008022616


   Thanks for filing this issue @richardstartin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org