Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2014/06/29 16:07:24 UTC
[jira] [Commented] (FLINK-834) Extend writeAsText with custom formatting function.
[ https://issues.apache.org/jira/browse/FLINK-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047125#comment-14047125 ]
Chesnay Schepler commented on FLINK-834:
----------------------------------------
Which version would be more desirable?
Implemented as an additional map:
{code:java}
public DataSink<String> writeAsFormattedText(String filePath, final TextFormatter<T> txt) {
    // Format each element in a preceding map, then reuse the existing text sink.
    return this.map(new MapFunction<T, String>() {
        @Override
        public String map(T value) throws Exception {
            return txt.format(value);
        }
    }).writeAsText(filePath);
}
{code}
Implemented in the sink:
{code:java}
...
public TextOutputFormat(Path outputPath, TextFormatter<T> txt) {
    this(outputPath, "UTF-8");
    this.formatter = txt;
}
...
@Override
public void writeRecord(T record) throws IOException {
    // Fall back to toString() when no formatter was supplied.
    byte[] bytes = this.formatter == null
            ? record.toString().getBytes(charset)
            : this.formatter.format(record).getBytes(charset);
    this.stream.write(bytes);
    this.stream.write(NEWLINE);
}
...
{code}
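To make the comparison concrete, here is a minimal, self-contained sketch of both variants without any Flink dependencies. Everything in it is an assumption for illustration: `FormatterSketch`, `writeViaMap` and `writeInSink` are hypothetical names, a `List` stands in for the DataSet, and an `OutputStream` stands in for the sink's stream. Only `TextFormatter` and its `format` method come from the issue description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormatterSketch {

    /** Formatter interface as proposed in the issue (shape assumed). */
    public interface TextFormatter<T> {
        String format(T value);
    }

    /** Variant 1: format in a preceding "map" step, then write plain text. */
    public static <T> void writeViaMap(List<T> records, TextFormatter<T> fmt, OutputStream out)
            throws IOException {
        List<String> mapped = new ArrayList<>();
        for (T record : records) {
            mapped.add(fmt.format(record));
        }
        // The sink is unchanged: no formatter, so it falls back to toString().
        writeInSink(mapped, null, out);
    }

    /** Variant 2: the sink applies the formatter, or toString() if it is null. */
    public static <T> void writeInSink(List<T> records, TextFormatter<T> fmt, OutputStream out)
            throws IOException {
        for (T record : records) {
            String line = (fmt == null) ? record.toString() : fmt.format(record);
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        List<Integer> data = Arrays.asList(1, 2, 3);
        TextFormatter<Integer> fmt = v -> "value=" + v;

        ByteArrayOutputStream viaMap = new ByteArrayOutputStream();
        ByteArrayOutputStream inSink = new ByteArrayOutputStream();
        writeViaMap(data, fmt, viaMap);
        writeInSink(data, fmt, inSink);

        // Both variants should produce identical bytes.
        System.out.println(viaMap.toString("UTF-8").equals(inSink.toString("UTF-8")));
    }
}
```

In this sketch the two variants write the same output. The difference is where the formatting happens: the map variant leaves the sink untouched but adds an extra operator, while the sink variant adds an optional field and a null check to the output format.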
> Extend writeAsText with custom formatting function.
> ---------------------------------------------------
>
> Key: FLINK-834
> URL: https://issues.apache.org/jira/browse/FLINK-834
> Project: Flink
> Issue Type: Improvement
> Reporter: GitHub Import
> Labels: github-import, starter
> Fix For: pre-apache
>
>
> Currently, `writeAsText()` uses the `toString()` method of data types to serialize the output as text. Alternatively, we have a CSV format that writes Tuple DataSets by using the `toString()` methods of the individual fields. Since a Tuple's `toString()` method cannot be adapted without extending the class, it is not easy to define a custom output format for data sets that include Tuples.
> I think it would be good to have a way to explicitly format a text output.
> We could add a formatting function that returns a String for an input element, such as
> ```
> DataSet<Tuple2<String, MyPojo>> myDS;
> myDS.writeAsFormattedText("hdfs:///myOutPath",
>     new TextFormatter<Tuple2<String, MyPojo>>() {
>         @Override
>         public String format(Tuple2<String, MyPojo> input) {
>             return input.f0 + " -> " +
>                 input.f1.getWhatEver() + " and " +
>                 input.f1.getSomethingElse();
>         }
>     });
> ```
> Internally, we would use the default TextOutputFormat, preceded by a Map that does the formatting.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/834
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, simple-issue, user satisfaction
> Milestone: Release 0.6 (unplanned)
> Created at: Mon May 19 14:39:51 CEST 2014
> State: open