Posted to user@beam.apache.org by Jesse Anderson <je...@smokinghand.com> on 2017/02/09 01:43:18 UTC

Re: Pico WordCount

I updated my Pico WordCount example
<http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show the
new ToString class that was released in 0.5.0. You don't have to manually
convert objects to strings now if the object's toString is the format you
want to use.
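ToString is a Java SDK transform, so as a rough illustration only, here is a stdlib Python sketch (no Beam involved) of the distinction above: reusing each element's own string form versus writing a formatter by hand. Python's str() stands in for Java's toString(), and the helper names and (word, count) pairs are hypothetical.

```python
# Stdlib-only sketch; no Beam APIs are used.
# Assumption: Python's str() plays the role of Java's toString().
# The helper names below are hypothetical, not Beam APIs.


def to_string_elements(elements):
    """Convert each element using its own string form (ToString-style)."""
    return [str(e) for e in elements]


def format_counts(pairs):
    """Hand-written formatter for (word, count) pairs (the manual route)."""
    return ["%s: %d" % (word, count) for word, count in pairs]


# Hypothetical (word, count) pairs, as a counting step might emit them.
counts = [("ace", 4), ("king", 4)]

# Needs no extra code, but yields the tuple's default string form.
print(to_string_elements(counts))  # ["('ace', 4)", "('king', 4)"]
# Still needed when the default string form is not the format you want.
print(format_counts(counts))  # ['ace: 4', 'king: 4']
```

If the default string form is acceptable, the first route is all you need; otherwise a small formatter like the second one is still required.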

On Thu, Dec 8, 2016 at 11:18 AM Robert Bradshaw <ro...@google.com> wrote:

> No less typed than any other Python program :). To add our type checks,
> one would write
>
> import apache_beam as beam, re
> with beam.Pipeline() as p:
>   (p
>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>        .with_input_types(str).with_output_types(str)
>    | beam.combiners.Count.PerElement()
>    | beam.Map(lambda wc: "%s: %d" % wc)
>    | beam.io.textio.WriteToText("output/stringcounts"))
>
> and the rest is implicit.
>
>
> On Wed, Dec 7, 2016 at 4:13 PM, Dan Halperin <dh...@google.com> wrote:
> > Is the Python one actually fully type-checked, or could it fail at
> > runtime b/c of a typo?
> >
> > (If the latter, what would the minimal type-checked Python WordCount
> > look like?)
> >
> >
> > On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw <ro...@google.com> wrote:
> >>
> >> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com> wrote:
> >>
> >> > Only gets beaten on the KV to string conversion. JB is going to change
> >> > that.
> >>
> >> That and the imports/python creation boilerplate. But yes, very similar.
> >>
> >> > On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com> wrote:
> >> >>
> >> >> Nice. Of course for ultimate conciseness, you should have gone with
> >> >> Python :)
> >> >>
> >> >> import apache_beam as beam, re
> >> >> with beam.Pipeline() as p:
> >> >>   (p
> >> >>    | beam.io.textio.ReadFromText("playing_cards.tsv")
> >> >>    | beam.FlatMap(lambda s: re.split("\\W+", s))
> >> >>    | beam.combiners.Count.PerElement()
> >> >>    | beam.Map(lambda wc: "%s: %d" % wc)
> >> >>    | beam.io.textio.WriteToText("output/stringcounts"))
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> >> >> > Good idea, Neelesh!
> >> >> >
> >> >> > Definitely something we can add to beam-samples (a great complement
> >> >> > to what I have on my GitHub).
> >> >> >
> >> >> > Regards
> >> >> > JB
> >> >> >
> >> >> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> >> >> >>
> >> >> >> Perhaps we can add this to our examples.
> >> >> >> Thank you Jesse. :)
> >> >> >>
> >> >> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré
> >> >> >> <jb@nanthrax.net> wrote:
> >> >> >>
> >> >> >>     Awesome!
> >> >> >>
> >> >> >>     Thanks, Jesse!
> >> >> >>
> >> >> >>     Regards
> >> >> >>     JB
> >> >> >>
> >> >> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> >> >> >>
> >> >> >>         I wrote a post
> >> >> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>
> >> >> >>         on the smallest WordCount I could write. I go through
> >> >> >>         everything line by line and talk about some of the newest
> >> >> >>         DoFns that allow you to easily run regular expressions in a
> >> >> >>         distributed way.
> >> >> >>
> >> >> >>         Thanks,
> >> >> >>
> >> >> >>         Jesse
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>     --
> >> >> >>     Jean-Baptiste Onofré
> >> >> >>     jbonofre@apache.org
> >> >> >>     http://blog.nanthrax.net
> >> >> >>     Talend - http://www.talend.com
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Neelesh Srinivas Salian
> >> >> >> Customer Operations Engineer
> >> >> >>
> >> >> >
> >
> >
>
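For anyone wanting to check what each step of the quoted Python pipelines computes without installing Beam, here is a stdlib sketch of the same word-count logic on a few hypothetical tab-separated lines standing in for playing_cards.tsv. It shows why the split step needs flattening semantics (beam.FlatMap, one output per word) rather than beam.Map (one list per line), and it formats each pair without the Python 2-only tuple-parameter lambda `lambda (w, c): ...`, which is a syntax error in Python 3. Unlike the one-liners, it also drops the empty strings re.split returns when a line starts or ends with a non-word character; that filter is an addition, not part of the thread.

```python
import re
from collections import Counter

# Hypothetical sample lines standing in for playing_cards.tsv.
lines = ["Ace\tSpades", "King\tHearts", "Ace\tHearts"]


def split_words(line):
    # Same split as re.split("\\W+", s); the filter (an addition) drops the
    # empty strings re.split yields at a leading/trailing non-word character.
    return [w for w in re.split(r"\W+", line) if w]


# Map-like: exactly one output per input, so each element is a whole list.
mapped = [split_words(line) for line in lines]

# FlatMap-like: each returned list is flattened into individual words.
flat_mapped = [w for line in lines for w in split_words(line)]

# Count.PerElement-like: (element, count) pairs.
counts = Counter(flat_mapped)

# Python 3 formatting: take the pair as one argument instead of lambda (w, c).
output = sorted("%s: %d" % wc for wc in counts.items())
print(output)  # ['Ace: 2', 'Hearts: 2', 'King: 1', 'Spades: 1']
```

Counting `mapped` instead of `flat_mapped` would count whole lines' worth of words as single elements, which is why the split step must flatten.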

Re: Pico WordCount

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Great!

Thanks, Jesse.

By the way, when I'm back from vacation next week, I plan to resume the PoC on the data format extension. I will send an update to the mailing list then.

Regards
JB

On Feb 8, 2017, at 21:43, Jesse Anderson <je...@smokinghand.com> wrote:
>I updated my Pico WordCount example
><http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show the
>new ToString class that was released in 0.5.0. You don't have to manually
>convert objects to strings now if the object's toString is the format you
>want to use.