You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by "guiyanakuang (via GitHub)" <gi...@apache.org> on 2023/04/15 06:30:53 UTC

[GitHub] [orc] guiyanakuang opened a new issue, #1470: Java impl does not fully adhere to the usage contract of `ColumnVector.noNulls` and `ColumnVector.isNull`

guiyanakuang opened a new issue, #1470:
URL: https://github.com/apache/orc/issues/1470

   The comments in ColumnVector indicate that we should only use the `isNull` array to determine null values when `noNulls` is set to false. This helps us reuse `ColumnVector`, for example, during reuse when `noNulls` transitions from `true` to `true` or `false` to `true`, we don't need to set `isNull`, which can help improve performance.
   
   Unfortunately, there are some counterexamples in the Java impl, such as in [BitFieldReader.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/BitFieldReader.java#L90-L107), where we directly read `isNull` without checking `noNulls` first. To ensure correctness, Java resets `noNulls` and `isNull` every time, as seen in [TreeReaderFactory.java](https://github.com/apache/orc/blob/511c8c19497cb70499353a59b6484a0e6a82a539/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java#L358-L391).
   
   This issue originates from the discussion of ORC-1408, whether the ORC Java version needs to modify all counterexamples to adhere to the contract. I'm inclined to make the changes to be consistent with the C++ impl and to avoid unnecessary `isNull` setting, although I haven't verified how much performance improvement it could bring.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] guiyanakuang commented on issue #1470: Java impl does not fully adhere to the usage contract of `ColumnVector.noNulls` and `ColumnVector.isNull`

Posted by "guiyanakuang (via GitHub)" <gi...@apache.org>.
guiyanakuang commented on issue #1470:
URL: https://github.com/apache/orc/issues/1470#issuecomment-1509584272

   cc @williamhyun @dongjoon-hyun @wgtmac 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] pavibhai commented on issue #1470: Java impl does not fully adhere to the usage contract of `ColumnVector.noNulls` and `ColumnVector.isNull`

Posted by "pavibhai (via GitHub)" <gi...@apache.org>.
pavibhai commented on issue #1470:
URL: https://github.com/apache/orc/issues/1470#issuecomment-1511620932

   It would be great to standardize this interaction across the project with respect to the contract. Agree with the proposal that as long as the contract with external consumers is not affected we can work on internal standardization/consistency.
   
   Regards,
   Pavan
   
   > On Apr 15, 2023, at 6:07 AM, Gang Wu ***@***.***> wrote:
   > 
   > 
   > IMHO, it is safe to make the change if the writer and reader can promise the contract to the downstream consumers (e.g. Apache Spark and Apache Hive). The ORC java library can break the promise in the internal implementation.
   > 
   > —
   > Reply to this email directly, view it on GitHub <https://github.com/apache/orc/issues/1470#issuecomment-1509817969>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABYDHO3E55QXFBW5YYD2DFTXBKMSFANCNFSM6AAAAAAW7G4FZM>.
   > You are receiving this because you are subscribed to this thread.
   > 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on issue #1470: Java impl does not fully adhere to the usage contract of `ColumnVector.noNulls` and `ColumnVector.isNull`

Posted by "deshanxiao (via GitHub)" <gi...@apache.org>.
deshanxiao commented on issue #1470:
URL: https://github.com/apache/orc/issues/1470#issuecomment-1510636115

   +1 for the proposal.
   
   It seems that there is no compatibility issue, just a performance optimization and a more standardized use of Hive classes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on issue #1470: Java impl does not fully adhere to the usage contract of `ColumnVector.noNulls` and `ColumnVector.isNull`

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #1470:
URL: https://github.com/apache/orc/issues/1470#issuecomment-1509817969

   IMHO, it is safe to make the change if the writer and reader can promise the contract to the downstream consumers (e.g. Apache Spark and Apache Hive). The ORC java library can break the promise in the internal implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org