Posted to issues@arrow.apache.org by "Animesh Trivedi (JIRA)" <ji...@apache.org> on 2018/10/12 08:46:00 UTC

[jira] [Comment Edited] (ARROW-3495) [Java] Optimize bit operations performance

    [ https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647667#comment-16647667 ] 

Animesh Trivedi edited comment on ARROW-3495 at 10/12/18 8:45 AM:
------------------------------------------------------------------

This issue has three items that need to be worked on separately:

1) Delete the optimized bitmap routine (for the all-null or all-set case) in BitVectorHelper ([https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java#L179]). I am not sure what the rationale behind this optimization is. As I wrote on the mailing list, at this point the validity buffer has already been read from storage, so there is no need to spend more CPU time generating an optimized bitmap.

 

2) Change the implementation of the isSet function in UnionVector.java, BaseFixedWidthVector.java, BaseVariableWidthVector.java, FixedSizeListVector.java, ListVector.java, and StructVector.java to test for equality with zero. There is no need to count the number of bits set in a Long, as is done here: [https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L797]

 

3) I need to think about the UnsafeReader type and how it should be integrated.

 

I can open a pull request for the first two items. 
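
To make item 2 concrete, here is a minimal sketch of the two ways to test a validity bit. It is not the Arrow source: a plain byte[] stands in for the validity buffer, and the class and method names (ValidityBitSketch, isSetByBitCount, isSetByZeroTest) are hypothetical. Both variants return 1 for a set bit and 0 otherwise; the second one only compares the masked byte against zero instead of counting bits.

```java
final class ValidityBitSketch {

    // Bit-count style: mask out the target bit, then count the set bits of the result.
    // The masked value has at most one bit set, so the count is always 0 or 1.
    static int isSetByBitCount(byte[] validity, int index) {
        final int byteIndex = index >> 3; // which byte holds the bit
        final int bitIndex = index & 7;   // position of the bit inside that byte
        final byte b = validity[byteIndex];
        return Long.bitCount(b & (1L << bitIndex));
    }

    // Zero-test style: same mask, but only compare the result against zero.
    static int isSetByZeroTest(byte[] validity, int index) {
        final int byteIndex = index >> 3;
        final int bitIndex = index & 7;
        final byte b = validity[byteIndex];
        return (b & (1 << bitIndex)) != 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Assumed sample data: bits 0, 2, 5, 7 set in byte 0 and bit 7 set in byte 1.
        final byte[] validity = {(byte) 0b10100101, (byte) 0x80};
        for (int i = 0; i < 16; i++) {
            if (isSetByBitCount(validity, i) != isSetByZeroTest(validity, i)) {
                throw new AssertionError("variants disagree at index " + i);
            }
        }
        System.out.println("both variants agree on all 16 bits");
    }
}
```

Whether the zero test is measurably faster depends on the JIT and the surrounding loop, so it should be confirmed with a benchmark.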



> [Java] Optimize bit operations performance
> ------------------------------------------
>
>                 Key: ARROW-3495
>                 URL: https://issues.apache.org/jira/browse/ARROW-3495
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 0.11.0
>            Reporter: Li Jin
>            Priority: Major
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )
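
For illustration, the idea in item 2 of the description (materializing values straight from the validity and value buffers) can be sketched roughly as follows. This is not the linked ArrowReaderUnsafe code: java.nio.ByteBuffer stands in for Arrow's direct buffers, the class and method names are hypothetical, and the value buffer is assumed to hold little-endian 4-byte ints with the byte order set by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectBufferReadSketch {

    // Sums all non-null ints by reading the validity bitmap and the value
    // buffer directly, without a per-element accessor call on a vector.
    static long sumValidInts(ByteBuffer validity, ByteBuffer values, int valueCount) {
        long sum = 0;
        for (int i = 0; i < valueCount; i++) {
            final int b = validity.get(i >> 3);      // byte that holds bit i
            if ((b & (1 << (i & 7))) != 0) {         // zero test on the masked bit
                sum += values.getInt(i * Integer.BYTES);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed sample data: four slots, slot 1 is null.
        final ByteBuffer validity = ByteBuffer.allocateDirect(1);
        validity.put(0, (byte) 0b00001101);
        final ByteBuffer values = ByteBuffer.allocateDirect(4 * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        values.putInt(0, 10).putInt(4, 20).putInt(8, 30).putInt(12, 40);
        System.out.println(sumValidInts(validity, values, 4));
    }
}
```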


