Posted to dev@flink.apache.org by Andra Lungu <an...@apache.org> on 2015/06/10 10:45:00 UTC

The correct location for zipWithIndex and zipWithUniqueId

Hey everyone,

At some point we needed to assign unique labels as vertex values in Gelly.
We got a nice suggestion on how to do that in parallel (implemented in
https://github.com/apache/flink/pull/801#issuecomment-110654447).

Now the question is: where should these two functions go? Should they be
part of the API? Something like:

class DataSet<T> {
  public DataSet<Tuple2<Long, T>> zipWithID() {}
}

Or should they go into flink-contrib? Fabian, Robert and Till seem to be
in favour of the second option.
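The two functions differ in what they guarantee. Below is a minimal sketch of the common two-pass idea behind a parallel zipWithIndex, simulated on in-memory partitions instead of a Flink DataSet. Assumptions: each inner List stands in for the data of one parallel task, Map.Entry stands in for Tuple2, and the class and method names are hypothetical. This is not the code from the pull request linked above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: simulates parallel partitions with in-memory lists; not Flink code.
public class ZipWithIndexSketch {

    // Hypothetical helper: assigns consecutive indices 0..n-1 across all partitions.
    static <T> List<List<Map.Entry<Long, T>>> zipWithIndex(List<List<T>> partitions) {
        // Pass 1: count the elements of each partition. In a real job this would be a
        // per-partition count that every parallel task can read (e.g. a broadcast set).
        long[] counts = new long[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            counts[p] = partitions.get(p).size();
        }

        List<List<Map.Entry<Long, T>>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            // Offset = sum of the counts of all partitions before partition p.
            long offset = 0;
            for (int q = 0; q < p; q++) {
                offset += counts[q];
            }
            // Pass 2: number this partition's elements locally, shifted by the offset.
            List<Map.Entry<Long, T>> out = new ArrayList<>();
            for (T value : partitions.get(p)) {
                out.add(new SimpleEntry<>(offset++, value));
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a", "b"), List.of("c"), List.<String>of(), List.of("d", "e", "f"));
        System.out.println(zipWithIndex(partitions));
    }
}
```

Running main prints [[0=a, 1=b], [2=c], [], [3=d, 4=e, 5=f]]. The counting pass is what buys consecutive indices; it costs an extra pass over the data plus some coordination between the parallel tasks.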

Thanks!

Andra

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Till Rohrmann <tr...@apache.org>.
+1 for linking from the DataSet Transformations page.
On Jun 12, 2015 5:27 PM, "Fabian Hueske" <fh...@gmail.com> wrote:

> Linking from the DataSet Transformations page would be good, IMO.

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Fabian Hueske <fh...@gmail.com>.
Linking from the DataSet Transformations page would be good, IMO.

2015-06-12 17:11 GMT+02:00 Andra Lungu <lu...@gmail.com>:

> Thanks for the replies!
>
> I will add the two methods to a separate DataSetUtils class. Where would
> you put the documentation for this? I think users should be able to easily
> access it. This means that, IMO, it shouldn't go on a separate zip page,
> but rather in the programming guide. Or there could be a link on the
> DataSet Transformations page pointing to this...
>
> What do you think?

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Andra Lungu <lu...@gmail.com>.
Thanks for the replies!

I will add the two methods to a separate DataSetUtils class. Where would
you put the documentation for this? I think users should be able to easily
access it. This means that, IMO, it shouldn't go on a separate zip page,
but rather in the programming guide. Or there could be a link on the
DataSet Transformations page pointing to this...

What do you think?

On Wed, Jun 10, 2015 at 12:33 PM, Till Rohrmann <ti...@gmail.com>
wrote:

> I agree with Theo. I think it’s a nice feature to have as part of the
> standard API, because few users will be aware of something like
> DataSetUtils. However, as a first version we can make it part of
> DataSetUtils.
>
> Cheers,
> Till

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Till Rohrmann <ti...@gmail.com>.
I agree with Theo. I think it’s a nice feature to have as part of the
standard API, because few users will be aware of something like
DataSetUtils. However, as a first version we can make it part of
DataSetUtils.

Cheers,
Till

On Wed, Jun 10, 2015 at 11:52 AM Theodore Vasiloudis <
theodoros.vasiloudis@gmail.com> wrote:

> +1 for Fabian, but I would very much like to see this as part of the API in
> the future.
>
> This function would be very useful for FlinkML as well, as we noted in a
> recent discussion on the mailing list regarding time series datasets.

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Theodore Vasiloudis <th...@gmail.com>.
+1 for Fabian, but I would very much like to see this as part of the API in
the future.

This function would be very useful for FlinkML as well, as we noted in a
recent discussion on the mailing list regarding time series datasets.

On Wed, Jun 10, 2015 at 10:56 AM, Fabian Hueske <fh...@gmail.com> wrote:

> As Andra said, I would not add it to the API at this point.
> However, I don't think it should go into a separate Maven module
> (flink-contrib) that needs to be added as dependency but rather into some
> DataSetUtils class in flink-java.
>
> We can easily add it to the API later, if necessary. We should, however,
> extend the documentation so that users are aware of DataSetUtils.
>
> Cheers, Fabian

Re: The correct location for zipWithIndex and zipWithUniqueId

Posted by Fabian Hueske <fh...@gmail.com>.
As Andra said, I would not add it to the API at this point.
However, I don't think it should go into a separate Maven module
(flink-contrib) that needs to be added as dependency but rather into some
DataSetUtils class in flink-java.

We can easily add it to the API later, if necessary. We should, however,
extend the documentation so that users are aware of DataSetUtils.
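The utility-class option means static helpers that take the data set as an argument, rather than a new method on DataSet. Below is a hypothetical sketch of that shape (ZipUtilsSketch is an illustrative name, not Flink's DataSetUtils), using a single-pass zipWithUniqueId scheme on the same kind of in-memory partition simulation. Assumption: partition i of p assigns the IDs i, i + p, i + 2p, and so on. These IDs are unique but not consecutive, and Flink's own implementation may encode them differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical utility class (not Flink's DataSetUtils): static helpers take the
// data as an argument, so nothing has to be added to the data set type itself.
public final class ZipUtilsSketch {

    private ZipUtilsSketch() {} // no instances; static helpers only

    // Assigns unique (not necessarily consecutive) IDs in a single pass.
    // Partition i of p uses IDs i, i + p, i + 2p, ..., so two partitions never
    // collide and no partition needs to know the sizes of the others.
    public static <T> List<List<Map.Entry<Long, T>>> zipWithUniqueId(List<List<T>> partitions) {
        long parallelism = partitions.size();
        List<List<Map.Entry<Long, T>>> result = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            long localCounter = 0;
            List<Map.Entry<Long, T>> out = new ArrayList<>();
            for (T value : partitions.get(i)) {
                long id = localCounter * parallelism + i;
                out.add(new SimpleEntry<>(id, value));
                localCounter++;
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a", "b"), List.of("c"), List.<String>of(), List.of("d", "e", "f"));
        // The call site reads ZipUtilsSketch.zipWithUniqueId(data), not data.zipWithUniqueId().
        System.out.println(zipWithUniqueId(partitions));
    }
}
```

Here main prints [[0=a, 4=b], [1=c], [], [3=d, 7=e, 11=f]]: no ID repeats, but there are gaps. Compared with a zipWithIndex-style numbering, one pass suffices because no coordination between partitions is needed.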

Cheers, Fabian

2015-06-10 10:45 GMT+02:00 Andra Lungu <an...@apache.org>:

> Hey everyone,
>
> At some point we needed to assign unique labels as vertex values in Gelly.
> We got a nice suggestion on how to do that in parallel (implemented in
> https://github.com/apache/flink/pull/801#issuecomment-110654447).
>
> Now the question is: where should these two functions go? Should they be
> part of the API? Something like:
>
> class DataSet<T> {
>   public DataSet<Tuple2<Long, T>> zipWithID() {}
> }
>
> Or should they go into flink-contrib? Fabian, Robert and Till seem to be
> in favour of the second option.
>
> Thanks!
>
> Andra
>