Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2014/07/04 11:29:33 UTC
[jira] [Commented] (FLINK-987) Extend TypeSerializers and -Comparators to work directly on Memory Segments

    [ https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052295#comment-14052295 ] 

Aljoscha Krettek commented on FLINK-987:
----------------------------------------

I already have some of the work done, but I've run into a design issue. In the old model we always knew how far a byte buffer had been filled because we only allowed sequential writes; a simple "bytes written" counter was enough. Now we want to allow arbitrary seeks, which makes it possible to leave gaps and fill them later, or to overwrite previously written data. So how do we keep track of the fill level of our buffers? I came up with two solutions: 1) Check on every write whether it falls below the current fill level, and only update the fill level when we write into new areas. 2) Let the serialization code specify how many bytes it has written and update the fill level accordingly. I prefer 2), since 1) requires a check on every write operation, but please let me know what you think.

With 2) the TargetBuffer would have these methods in addition to the DataOutputView methods:
{code}
public void setReferenceAndLock();
public void seekFromReference(int position) throws IOException;
public void unlock(int bytesWritten) throws IOException;
{code}

where seeking is only permitted after locking the buffer first. Internally a stack of reference positions is kept because serializers can be nested.

If no locking is used we simply increment the fill level after write operations as before.
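
To make option 2) more concrete, here is a minimal sketch of how such a buffer might behave. It is only an illustration under assumptions, not the actual implementation: the class name TargetBufferSketch, the byte[] backing store, and the helpers writeInt, getInt, fillLevel and demo are hypothetical, and the choice to update the fill level only when the outermost lock is released is my assumption about how nesting could work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of option 2); not the real TargetBuffer. */
class TargetBufferSketch {
    private final byte[] data;
    private int position;   // where the next write goes
    private int fillLevel;  // number of bytes considered written
    // Stack of reference positions, because serializers can be nested.
    private final Deque<Integer> references = new ArrayDeque<>();

    TargetBufferSketch(int capacity) { this.data = new byte[capacity]; }

    public void writeInt(int v) throws IOException {
        if (position + 4 > data.length) throw new IOException("buffer full");
        data[position++] = (byte) (v >>> 24);
        data[position++] = (byte) (v >>> 16);
        data[position++] = (byte) (v >>> 8);
        data[position++] = (byte) v;
        // Without a lock we are in sequential mode: the fill level follows the position.
        if (references.isEmpty()) fillLevel = position;
    }

    public void setReferenceAndLock() { references.push(position); }

    public void seekFromReference(int offset) throws IOException {
        if (references.isEmpty()) throw new IOException("seek requires a locked buffer");
        int target = references.peek() + offset;
        if (offset < 0 || target > data.length) throw new IOException("seek out of bounds");
        position = target;
    }

    public void unlock(int bytesWritten) throws IOException {
        if (references.isEmpty()) throw new IOException("buffer is not locked");
        int end = references.peek() + bytesWritten;
        if (bytesWritten < 0 || end > data.length) throw new IOException("invalid byte count");
        references.pop();
        position = end;  // continue writing after the region the serializer reported
        // Assumption: only the outermost unlock moves the fill level.
        if (references.isEmpty()) fillLevel = end;
    }

    public int getInt(int index) {
        return ((data[index] & 0xff) << 24) | ((data[index + 1] & 0xff) << 16)
                | ((data[index + 2] & 0xff) << 8) | (data[index + 3] & 0xff);
    }

    public int fillLevel() { return fillLevel; }

    /** Writes one int sequentially, then a length-prefixed pair of ints, filling in the length last. */
    static int demo() {
        try {
            TargetBufferSketch b = new TargetBufferSketch(32);
            b.writeInt(7);             // sequential write, fill level becomes 4
            b.setReferenceAndLock();   // reference position is 4
            b.seekFromReference(4);    // leave a 4-byte gap for the length
            b.writeInt(1);
            b.writeInt(2);
            b.seekFromReference(0);    // go back and fill the gap
            b.writeInt(8);             // payload length in bytes
            b.unlock(12);              // 4 bytes of length + 8 bytes of payload
            return b.fillLevel();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that no write has to compare its position against the fill level: the serializer reports the size once in unlock, which is the reason for preferring 2) over 1).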


> Extend TypeSerializers and -Comparators to work directly on Memory Segments
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-987
>                 URL: https://issues.apache.org/jira/browse/FLINK-987
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>             Fix For: 0.6-incubating
>
>
> As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest changing the way that the TypeSerializers/Comparators and DataInputViews/DataOutputViews work.
> The goal is to allow more flexibility in the construction of the binary representation of data types, and to allow partial deserialization of individual fields. Both are currently prevented by the fact that the abstraction of the memory (into which the data goes) is a stream abstraction ({{DataInputView}}, {{DataOutputView}}).
> An idea is to offer a random-access, buffer-like view for construction and random-access deserialization, as well as various methods to copy elements in binary form between such buffers and streams.
> A possible set of methods for the {{TypeSerializer}} could be:
> {code}
> long serialize(T record, TargetBuffer buffer);
> 	
> T deserialize(T reuse, SourceBuffer source);
> 	
> void ensureBufferSufficientlyFilled(SourceBuffer source);
> 	
> <X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
> 	
> int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
> 	
> void copy(DataInputView in, TargetBuffer buffer);
> 	
> void copy(SourceBuffer buffer, DataOutputView out);
> 	
> void copy(DataInputView source, DataOutputView target);
> {code}
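
As a rough illustration of what partial deserialization through field offsets could look like, the sketch below reads a single field of a record without touching the others. Everything here is an assumption made for the example: java.nio.ByteBuffer stands in for the proposed SourceBuffer, the record layout (an int followed by a long) is invented, and the class and method names are hypothetical rather than part of the proposal.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration; ByteBuffer stands in for the proposed SourceBuffer. */
class PairFieldAccessSketch {
    // Assumed fixed-length layout: [int id (4 bytes)][long timestamp (8 bytes)]
    private static final int[] FIELD_OFFSETS = {0, 4};

    /** Returns the absolute offset of a field, given the offset where the record starts. */
    static int getOffsetForField(int logicalPos, int offset) {
        return offset + FIELD_OFFSETS[logicalPos];
    }

    /** Appends one record at the buffer's current position. */
    static void serialize(int id, long timestamp, ByteBuffer buffer) {
        buffer.putInt(id).putLong(timestamp);
    }

    /** Reads only the timestamp of the record starting at recordOffset. */
    static long deserializeTimestamp(ByteBuffer buffer, int recordOffset) {
        // Absolute read: does not move the buffer position and skips the id field.
        return buffer.getLong(getOffsetForField(1, recordOffset));
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        serialize(1, 100L, buffer);
        serialize(2, 200L, buffer);
        // The second record starts at byte 12 (4 + 8 bytes per record).
        System.out.println(deserializeTimestamp(buffer, 12)); // prints 200
    }
}
```

With a stream abstraction the reader would have to consume the id before reaching the timestamp; with random access the offset lookup is enough.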


