Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2014/07/23 10:25:38 UTC

[jira] [Comment Edited] (FLINK-987) Extend TypeSerializers and -Comparators to work directly on Memory Segments

    [ https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071491#comment-14071491 ] 

Aljoscha Krettek edited comment on FLINK-987 at 7/23/14 8:24 AM:
-----------------------------------------------------------------

So, what do you think of the interface?

I also ran some tests to measure the overhead of the seeking feature. For this I added a new String serializer that does some fake seeking (5 tell() and 9 seek() calls) to simulate writing a header, which would be the prevalent use case. The test writes 100000 random strings to an in-memory paging output view with a segment size of 8000; it is repeated 10 times and the runtimes are summed. With the non-seeking String serializer I get a total runtime of 22000 msecs, with the seeking one 23000 msecs (roughly 4.5% overhead). What other testing would you propose, [~StephanEwen]?
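A minimal sketch of the header-writing pattern being measured, assuming a simple growable byte-array buffer with tell()/seek(). The SeekableOutput class below is hypothetical (not a Flink class) and does not model the paging output view or its segment size of 8000. The serializer writes a placeholder length header, writes the payload, seeks back to patch the header, then seeks forward again. It uses far fewer tell()/seek() calls than the benchmark serializer, and since the length could be computed up front here, the seeking is as artificial as the "fake seeking" described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SeekingHeaderSketch {

    /** Hypothetical random-access output buffer (NOT a Flink class): a growable byte array. */
    static final class SeekableOutput {
        private byte[] data = new byte[16];
        private int pos = 0;   // current write position
        private int limit = 0; // highest position written so far

        int tell() { return pos; }

        void seek(int newPos) {
            if (newPos < 0 || newPos > limit) throw new IllegalArgumentException("bad position " + newPos);
            pos = newPos;
        }

        void writeInt(int v) { // big-endian, like DataOutput.writeInt
            ensure(4);
            data[pos++] = (byte) (v >>> 24);
            data[pos++] = (byte) (v >>> 16);
            data[pos++] = (byte) (v >>> 8);
            data[pos++] = (byte) v;
            limit = Math.max(limit, pos);
        }

        void write(byte[] b) {
            ensure(b.length);
            System.arraycopy(b, 0, data, pos, b.length);
            pos += b.length;
            limit = Math.max(limit, pos);
        }

        byte[] toByteArray() { return Arrays.copyOf(data, limit); }

        private void ensure(int n) {
            if (pos + n > data.length) data = Arrays.copyOf(data, Math.max(data.length * 2, pos + n));
        }
    }

    /** Writes a 4-byte length header that is back-patched after the payload is written. */
    static void serialize(String s, SeekableOutput out) {
        int headerPos = out.tell();
        out.writeInt(0);                          // placeholder header
        out.write(s.getBytes(StandardCharsets.UTF_8));
        int end = out.tell();
        out.seek(headerPos);                      // jump back to the header ...
        out.writeInt(end - headerPos - 4);        // ... patch in the real payload length
        out.seek(end);                            // ... and continue after the record
    }

    /** Reads one length-prefixed UTF-8 string starting at the given offset. */
    static String readString(byte[] buf, int offset) {
        int len = ((buf[offset] & 0xFF) << 24) | ((buf[offset + 1] & 0xFF) << 16)
                | ((buf[offset + 2] & 0xFF) << 8) | (buf[offset + 3] & 0xFF);
        return new String(buf, offset + 4, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SeekableOutput out = new SeekableOutput();
        serialize("flink", out);
        int second = out.tell();
        serialize("memory segment", out);
        byte[] bytes = out.toByteArray();
        System.out.println(readString(bytes, 0));      // flink
        System.out.println(readString(bytes, second)); // memory segment
    }
}
```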



> Extend TypeSerializers and -Comparators to work directly on Memory Segments
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-987
>                 URL: https://issues.apache.org/jira/browse/FLINK-987
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Aljoscha Krettek
>             Fix For: 0.6-incubating
>
>
> As per the discussion with [~till.rohrmann], [~uce] and [~aljoscha], we suggest changing the way the TypeSerializers/Comparators and DataInputViews/DataOutputViews work.
> The goal is to allow more flexibility in the construction of the binary representation of data types, and to allow partial deserialization of individual fields. Both are currently prevented by the fact that the abstraction of the memory (into which the data goes) is a stream abstraction ({{DataInputView}}, {{DataOutputView}}).
> An idea is to offer a random-access, buffer-like view for construction and random-access deserialization, as well as various methods to copy elements in binary form between such buffers and streams.
> A possible set of methods for the {{TypeSerializer}} could be:
> {code}
> long serialize(T record, TargetBuffer buffer);
> 	
> T deserialize(T reuse, SourceBuffer source);
> 	
> void ensureBufferSufficientlyFilled(SourceBuffer source);
> 	
> <X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
> 	
> int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
> 	
> void copy(DataInputView in, TargetBuffer buffer);
> 	
> void copy(SourceBuffer buffer, DataOutputView out);
> 	
> void copy(DataInputView source, DataOutputView target);
> {code}
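To illustrate how a random-access view enables partial deserialization, here is a minimal sketch that uses java.nio.ByteBuffer as a stand-in for SourceBuffer and assumes a hypothetical record layout: a field count, a table of per-field offsets, then length-prefixed UTF-8 string fields. The getOffsetForField and deserializeField methods below only loosely mirror the proposed signatures (no reuse object, String fields only, explicit record offset); they are not the Flink API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FieldAccessSketch {

    /** Hypothetical layout: [count][offset_0 .. offset_n-1][len_0 bytes_0][len_1 bytes_1]... */
    static ByteBuffer serialize(String[] fields) {
        byte[][] encoded = new byte[fields.length][];
        int size = 4 + 4 * fields.length;          // field count + offset table
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length;         // length prefix + payload
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(fields.length);
        int offset = 4 + 4 * fields.length;        // offsets are relative to the record start
        for (byte[] e : encoded) {
            buf.putInt(offset);
            offset += 4 + e.length;
        }
        for (byte[] e : encoded) {
            buf.putInt(e.length);
            buf.put(e);
        }
        buf.flip();
        return buf;
    }

    /** Hypothetical analogue of getOffsetForField: one lookup in the offset table, no scan of earlier fields. */
    static int getOffsetForField(int logicalPos, int recordOffset, ByteBuffer buffer) {
        int count = buffer.getInt(recordOffset);
        if (logicalPos < 0 || logicalPos >= count) throw new IndexOutOfBoundsException("field " + logicalPos);
        return recordOffset + buffer.getInt(recordOffset + 4 + 4 * logicalPos);
    }

    /** Hypothetical analogue of deserializeField: decodes only the requested field. */
    static String deserializeField(int logicalPos, int recordOffset, ByteBuffer buffer) {
        int fieldOffset = getOffsetForField(logicalPos, recordOffset, buffer);
        int len = buffer.getInt(fieldOffset);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) bytes[i] = buffer.get(fieldOffset + 4 + i); // absolute reads
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer record = serialize(new String[] {"key-42", "a long payload we do not need", "value"});
        System.out.println(deserializeField(2, 0, record)); // value
    }
}
```

Reading field 2 here costs two absolute reads into the header plus the field bytes themselves; the first two fields are never decoded, whereas with a stream abstraction the earlier fields would have to be read or skipped first.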



--
This message was sent by Atlassian JIRA
(v6.2#6252)