You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/18 12:37:37 UTC

[GitHub] [iceberg] findepi opened a new pull request, #4806: Fix invalid Stream to Iterable conversion

findepi opened a new pull request, #4806:
URL: https://github.com/apache/iceberg/pull/4806

   `Stream.iterator` is a one-short operation, cannot be called multiple
   times, so `stream::iterator` isn't an `Iterable`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #4806: Fix invalid Stream to Iterable conversion in decryption

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1139413220

   @danielcweeks who can merge this?
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #4806: Fix invalid Stream to Iterable conversion in decryption

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1137868400

   > I'd be fine with either the Iterator or Stream approaches as it's explicit that it'll be traversed only once.
   
   @danielcweeks changed to Iterator.
   
   PTAL
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4806: Fix invalid Stream to Iterable conversion in decryption

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#discussion_r884323420


##########
api/src/main/java/org/apache/iceberg/encryption/EncryptionManager.java:
##########
@@ -47,11 +49,26 @@ public interface EncryptionManager extends Serializable {
    * By default this calls the single-file decryption method for each element in the iterator.
    * Implementations can override this for a variety of optimizations. For example, an
    * implementation can perform lookahead on the input iterator and fetch encryption keys in batch.
+   *
+   * @deprecated Use {@link #decrypt(Iterator)}.
    */
+  @Deprecated

Review Comment:
   This is a public API. We shouldn't make changes to a public API just because it is being called incorrectly in some places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #4806: Fix invalid Stream to Iterable conversion

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1135530435

   @danielcweeks 
   
   The point is that this is not correct Java code
   ```java
   Stream s = ...
   Iterable i = s::iterator;
   ```
   
   it compiles, because `Stream::iterator` method handle can be fit as an implementation of the `Iterable` interface, but it's not a correct implementation of that interface. `Iterable.iterator` method can be invoked multiple times, while `s.iterator` only once.
   
   The change fixes the above, but i see also the downside: it's now possible to invoke the costly operation twice without code failing. Before the change, the Stream would probably report some exception on the second traversal.
   
   > f the intent is to allow for multiple iterations, .collect(Collectors.toList()) would be preferable to avoid recomputing.
   
   I think the interface intent was to avoid loading all data into memory at once. 
   
   A yet different approach would be to let `encryption.decrypt` take `Iterator` (or `Stream`) instead of `Iterable.`
   I think this one is actually best, but would prefer to have buy-in first before going that route. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #4806: Fix invalid Stream to Iterable conversion

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1135190360

   @findepi Do you have more context on what the issue is here.  I may be reading this wrong but the new and old approaches appear to be equivalent.  Both approaches result in a function returning an iterator that is only traversed once.  If the intent is to allow for multiple iterations, `.collect(Collections.toList())` would be preferable to avoid recomputing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4806: Fix invalid Stream to Iterable conversion in decryption

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#discussion_r884323652


##########
arrow/src/main/java/org/apache/iceberg/arrow/vectorized/ArrowReader.java:
##########
@@ -268,10 +268,10 @@ private static final class VectorizedCombinedScanIterator implements CloseableIt
           .map(entry -> EncryptedFiles.encryptedInput(io.newInputFile(entry.getKey()), entry.getValue()));
 
       // decrypt with the batch call to avoid multiple RPCs to a key server, if possible
-      Iterable<InputFile> decryptedFiles = encryptionManager.decrypt(encrypted::iterator);
+      Iterator<InputFile> decryptedFiles = encryptionManager.decrypt(encrypted.iterator());

Review Comment:
   While this isn't technically correct, I think it is better not to change a public API. If you think this is a significant problem, then we have to options:
   1. Collect `encrypted` into a list
   2. Create a `OneTimeIterable` to wrap `Stream::iterator` instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi closed pull request #4806: Fix invalid Stream to Iterable conversion in decryption

Posted by GitBox <gi...@apache.org>.
findepi closed pull request #4806: Fix invalid Stream to Iterable conversion in decryption
URL: https://github.com/apache/iceberg/pull/4806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #4806: Fix invalid Stream to Iterable conversion

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1136706252

   @findepi Yeah, I agree that the existing implementation does not adhere to the `Iterable` interface contract. 
   
   I just wasn't sure if this was breaking or just an improvement to address the contract.
   
   I'd be fine with either the Iterator or Stream approaches as it's explicit that it'll be traversed only once.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] findepi commented on pull request #4806: Fix invalid Stream to Iterable conversion

Posted by GitBox <gi...@apache.org>.
findepi commented on PR #4806:
URL: https://github.com/apache/iceberg/pull/4806#issuecomment-1134197511

   @openinx is it going in the right direction?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org