Posted to dev@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2018/07/31 11:12:00 UTC

[jira] [Created] (FLINK-10003) Encoder interface inefficient when wanting to use more sophisticated outputstreams

Chesnay Schepler created FLINK-10003:
----------------------------------------

             Summary: Encoder interface inefficient when wanting to use more sophisticated outputstreams
                 Key: FLINK-10003
                 URL: https://issues.apache.org/jira/browse/FLINK-10003
             Project: Flink
          Issue Type: Improvement
          Components: Streaming Connectors
    Affects Versions: 1.6.0
            Reporter: Chesnay Schepler


The {{StreamingFileSink}} uses the {{Encoder}} interface to serialize data.
{code}
public interface Encoder<IN> extends Serializable {
	void encode(IN element, OutputStream stream) throws IOException;
}
{code}

Except for strings, the implementation must be provided by the user.
Using any {{OutputStream}} implementation that is more convenient than the base {{OutputStream}} (such as {{DataOutputStream}}) requires creating a new wrapper stream for every single record. If the chosen implementation may buffer data, users additionally have to call {{flush()}} after each record.
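
To illustrate the problem, here is a minimal sketch of what a user-provided encoder looks like today. The {{Encoder}} interface is copied from above so that the example compiles without Flink on the classpath; {{PerRecordWrapperExample}}, {{IntEncoder}} and {{encodeAll}} are hypothetical names used only for this illustration, and a {{ByteArrayOutputStream}} stands in for the part file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;

public class PerRecordWrapperExample {

    // Copy of the interface quoted in this issue (not imported from Flink).
    interface Encoder<IN> extends Serializable {
        void encode(IN element, OutputStream stream) throws IOException;
    }

    // Hypothetical user encoder: writes each Integer as 4 big-endian bytes.
    static class IntEncoder implements Encoder<Integer> {
        @Override
        public void encode(Integer element, OutputStream stream) throws IOException {
            // A new wrapper object is allocated for every single record ...
            DataOutputStream out = new DataOutputStream(stream);
            out.writeInt(element);
            // ... and flushed every time, in case the wrapper buffers data.
            // It must not be closed, since close() would also close the
            // underlying part file stream.
            out.flush();
        }
    }

    // Stand-in for the sink: encodes all values into one in-memory "part file".
    static byte[] encodeAll(int... values) throws IOException {
        ByteArrayOutputStream partFile = new ByteArrayOutputStream();
        Encoder<Integer> encoder = new IntEncoder();
        for (int value : values) {
            encoder.encode(value, partFile);
        }
        return partFile.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodeAll(1, 2, 3).length); // prints 12 (3 ints * 4 bytes)
    }
}
```

Three records lead to three wrapper allocations and three flushes, although a single wrapper per part file would be sufficient.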

Instead, we could allow users to specify an optional factory for the streams, which would be called once per part file, and modify the {{Encoder}} interface to take the output stream type as a generic parameter.
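
A rough sketch of how this could look, purely as an assumption and not a finished API design: {{StreamFactory}}, the two-parameter {{Encoder}}, {{writePartFile}} and {{demo}} are all hypothetical names that do not exist in Flink. When the real sink would flush (for example on checkpoints) is deliberately left out here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class StreamFactorySketch {

    // Hypothetical factory: wraps the raw part file stream, called once per part file.
    interface StreamFactory<S extends OutputStream> extends Serializable {
        S create(OutputStream rawPartFileStream) throws IOException;
    }

    // Hypothetical Encoder variant with a generic type for the output stream.
    interface Encoder<IN, S extends OutputStream> extends Serializable {
        void encode(IN element, S stream) throws IOException;
    }

    // Stand-in for the sink logic that handles a single part file.
    static <IN, S extends OutputStream> void writePartFile(
            Iterable<IN> records,
            OutputStream rawPartFileStream,
            StreamFactory<S> factory,
            Encoder<IN, S> encoder) throws IOException {
        // The wrapper is created exactly once for this part file.
        S stream = factory.create(rawPartFileStream);
        for (IN record : records) {
            encoder.encode(record, stream);
        }
        // Flushing becomes the caller's job, not the encoder's.
        stream.flush();
    }

    static byte[] demo() throws IOException {
        ByteArrayOutputStream partFile = new ByteArrayOutputStream();
        StreamFactory<DataOutputStream> factory = DataOutputStream::new;
        // The encoder receives the convenient stream type directly.
        Encoder<Integer, DataOutputStream> encoder =
                (element, out) -> out.writeInt(element);
        writePartFile(Arrays.asList(1, 2, 3), partFile, factory, encoder);
        return partFile.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo().length); // prints 12 (3 ints * 4 bytes)
    }
}
```

With this shape the user-provided encoder shrinks to a single {{writeInt}} call and no longer has to allocate or flush anything per record.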



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)