Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2015/04/30 12:05:23 UTC
Distinct lines in a file
Hi to all,
I'd like to get the unique lines of a file with Flink. Do I really need to
map from String to Tuple1<String>, call distinct(), and then map from Tuple1
back to String again before output?
Is there a smarter way to do it?
Best,
Flavio
Re: Distinct lines in a file
Posted by Fabian Hueske <fh...@gmail.com>.
Hi Flavio,
I agree, distinct() is a bit limited right now, and in fact there is no
good reason for that except that nobody has found time to improve it.
You can use distinct(KeySelector k) to work directly on DataSet<String> but
that's not very convenient either:
DataSet<String> strings = env.fromElements("Hello", "Hello", "World", "Hello");
strings.distinct(new KeySelector<String, String>() {
    @Override
    public String getKey(String value) throws Exception {
        return value;
    }
}).print();
Making distinct more generic should not take long.
I'll open a JIRA and might eventually fix it myself if nobody picks it up.
Cheers, Fabian
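
For readers unfamiliar with key selectors, the idea behind the reply above can be sketched in plain Java, without Flink. This is only an illustration of the semantics, not Flink's implementation: the class DistinctByKeySketch and the helper distinctBy are hypothetical names introduced here, and the sketch assumes the first element seen for each key is the one kept, which Flink's distinct() does not promise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DistinctByKeySketch {

    // Hypothetical helper (not a Flink API): keeps one element per distinct key.
    // Assumption for this sketch: the first element seen for a key wins.
    public static <T, K> List<T> distinctBy(List<T> items, Function<T, K> keySelector) {
        // LinkedHashMap preserves the order in which keys were first seen.
        Map<K, T> firstPerKey = new LinkedHashMap<>();
        for (T item : items) {
            firstPerKey.putIfAbsent(keySelector.apply(item), item);
        }
        return new ArrayList<>(firstPerKey.values());
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Hello", "Hello", "World", "Hello");
        // An identity key selector means "distinct on the whole value",
        // which mirrors getKey(String value) { return value; } in the reply.
        System.out.println(distinctBy(strings, value -> value)); // prints [Hello, World]
    }
}
```

With an identity selector every line is its own key, so duplicates collapse to a single line; a different selector (for example, value -> value.toLowerCase()) would treat lines as equal whenever their keys match.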