You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/22 19:06:21 UTC

[GitHub] [iceberg] stevenzwu commented on issue #1643: Flink RowDataIterator should support reuseObject flag

stevenzwu commented on issue #1643:
URL: https://github.com/apache/iceberg/issues/1643#issuecomment-714700304


   Let's look at the nextRecord contract from the `InputFormat`. reuse or not is controlled by the caller of `nextRecord(OT reuse)`. 
   ```
   	/**
   	 * @param reuse Object that may be reused.
   	 */
   	OT nextRecord(OT reuse) throws IOException;
   ```
   
   If the input `reuse` arg is null, there is no reuse. Otherwise, input arg should be used for deserialization. That is how `BinaryInputFormat` implemented.
   ```
   	@Override
   	public T nextRecord(T record) throws IOException {
                   // ...
   		record = this.deserialize(record, this.dataInputStream);
                   // ...
   	}
   ```
   
   `CloseableIterable<T>` interface is just a good fit for InputFormat interface. Now the `FlinkInputFormat` ignore the reuse hint from caller and simply returns reused RowData object.
   ```
     @Override
     public RowData nextRecord(RowData reuse) {
       return iterator.next();
     }
   ```
   
   So technically, `FlinkInputFormat` is not conforming to the contract of `InputFormat` interface.  If we truly want to implement the correct nextRecord contract, we need to change the `CloseableIterable/Iterator` to support `next(T reuse)`, which is impossible to change the java `Iterator` interface. If it can't conform to the contract, maybe the default behavior should be the more conservative (and less performant) behavior of no reuse.
   
   We don't use InputFormat internally. So its behavior is not very relevant to us. This is purely a semantic discussion. I will address other issues in separate comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org