Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2018/11/01 15:21:50 UTC

Re: Data Preprocessing in Beam

Since the extension (module) will probably take the form of new
PTransforms for Beam, it is worth taking a look at:
https://beam.apache.org/contribute/ptransform-style-guide/
and of course to:
https://beam.apache.org/contribute/

On Wed, Oct 31, 2018 at 6:57 PM Alex <al...@gmail.com> wrote:
>
> Great. Thank you.
>
> On Oct 31, 2018 6:32 PM, Kenneth Knowles <kl...@google.com> wrote:
>
> The word "extension" doesn't really mean anything in the case of Beam. It is just a library. You can use the build setup of other libraries as examples.
>
> Kenn
>
> On Wed, Oct 31, 2018 at 10:23 AM Alejandro <al...@gmail.com> wrote:
>
> Hello,
>
> I am going to familiarize myself with how to write a Beam extension,
> then, although right now I am a little busy searching for a new job :-/.  I
> hope that in a few weeks (let's hope it doesn't take much longer to find a
> job) I can get my hands on it and contribute this preprocessing
> extension to Beam.
>
> Cheers.
>
> On 10/31/2018 02:13 PM, Ismaël Mejía wrote:
> > Hello,
> >
> > I mentored Arnaud to contribute the sketching extension into Beam and
> > from a quick look at Alex's paper + implementation, I think this should
> > be an independent extension. Sketching is a collection of transforms
> > that rely on probabilistic data structures to give approximate results
> > and correspond clearly to the data sketching category.
> >
> > Alex's work is clearly in a different area; it is more about data
> > preprocessing and feature extraction, so I think it should be in a
> > different module.
> > Agree 100% that the best option is to do a rewrite in Java, this also
> > has the advantage of easier maintainability. It would be really nice
> > to have a new extension for this in Beam so don't hesitate to ask in
> > the mailing list / slack if you have questions.
> >
> > Regards,
> > Ismaël
> >
> > On Mon, Oct 29, 2018 at 10:38 AM Maximilian Michels <mx...@apache.org> wrote:
> >>
> >> Hey Alex,
> >>
> >> No need to reimplement. Java is the best option, since we don't
> >> currently have a Scala API in Beam.
> >>
> >> Cheers,
> >> Max
> >>
> >> On 25.10.18 21:50, Alex wrote:
> >>> Great! Right now there is a lot in that code I do not understand; I hope to read up on it over the next few days.
> >>>
> >>> Should I reimplement my algorithms in Scala? Or could I create a wrapper that interfaces with the sketching extension?
> >>>
> >>> Cheers.
> >>>
> >>> On Oct 24, 2018 15:00, Maximilian Michels <mx...@apache.org> wrote:
> >>>>
> >>>> Welcome Alejandro! Interesting work. The sketching extension looks like
> >>>> a good place for your algorithms.
> >>>>
> >>>> -Max
> >>>>
> >>>> On 23.10.18 19:05, Lukasz Cwik wrote:
> >>>>> Arnaud Fournier (afournier@talend.com <ma...@talend.com>)
> >>>>> started by adding a library to support sketching
> >>>>> (https://github.com/apache/beam/tree/master/sdks/java/extensions/sketching),
> >>>>> I feel as though some of these could be added there or possibly within
> >>>>> another extension.
> >>>>>
> >>>>> On Tue, Oct 23, 2018 at 9:54 AM Austin Bennett
> >>>>> <whatwouldaustindo@gmail.com <ma...@gmail.com>> wrote:
> >>>>>
> >>>>>       Hi Beam Devs,
> >>>>>
> >>>>>       Alejandro, copied, is an enthusiastic developer, who recently coded up:
> >>>>>       https://github.com/elbaulp/DPASF (associated paper found:
> >>>>>       https://arxiv.org/abs/1810.06021).
> >>>>>
> >>>>>       He had been looking to contribute that code to FlinkML, at which
> >>>>>       point I found him and alerted him to Beam.  He has been learning a
> >>>>>       bit about Beam recently.  Would this data preprocessing be a welcome
> >>>>>       contribution to the project?  If yes, perhaps others better versed
> >>>>>       in internals (I'm not there yet -- though could follow along!) would
> >>>>>       be willing to provide feedback to shape this to be a suitable Beam
> >>>>>       contribution.
> >>>>>
> >>>>>       Cheers,
> >>>>>       Austin
> >>>>>
> >>>>>
>
> --
> elbauldelprogramador.com