Posted to user@beam.apache.org by Jesse Anderson <je...@smokinghand.com> on 2016/12/07 17:22:21 UTC

Pico WordCount

I wrote a post on the smallest WordCount
<http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
write. I go through everything line by line and talk about some of the
newest DoFns that allow you to easily run regular expressions in a
distributed way.

Thanks,

Jesse

Re: Pico WordCount

Posted by James Malone <ja...@google.com>.
This is awesome. :)

On Wed, Dec 7, 2016 at 9:22 AM, Jesse Anderson <je...@smokinghand.com>
wrote:

> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
>
>
>

Re: Pico WordCount

Posted by Eric Anderson <er...@google.com>.
Looks great! Thanks, Jesse.

On Wed, Dec 7, 2016 at 9:22 AM Jesse Anderson <je...@smokinghand.com> wrote:

> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
>
>
>

Re: Pico WordCount

Posted by Neelesh Salian <ns...@cloudera.com>.
@Frances, agreed.
Opened this JIRA: https://issues.apache.org/jira/browse/BEAM-1105
We can perhaps make this an umbrella JIRA if we have other examples to add
as well.



On Wed, Dec 7, 2016 at 10:59 AM, Frances Perry <fj...@google.com> wrote:

> Instead of adding this as a new example, should we figure out how to unify
> it with the java 7 [1] and java 8 [2] versions of MinimalWordCount?
> Everyone loves lambdas, so we should get them into the WordCount
> walkthrough [3]!
>
> [1] https://github.com/apache/incubator-beam/blob/master/
> examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> [2] https://github.com/apache/incubator-beam/blob/master/
> examples/java8/src/main/java/org/apache/beam/examples/
> MinimalWordCountJava8.java
> [3] http://beam.incubator.apache.org/get-started/wordcount-example/
>
>
> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Good idea Neelesh !
>>
>> definitively something we can add to the beam-samples (great complement
>> to what I have on my github).
>>
>> Regards
>> JB
>>
>> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>>
>>> Perhaps we can add this to our examples.
>>> Thank you Jesse. :)
>>>
>>> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
>>> <ma...@nanthrax.net>> wrote:
>>>
>>>     Awesome !
>>>
>>>     Thanks Jesse !
>>>
>>>     Regards
>>>     JB
>>>
>>>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>>>
>>>         I wrote a post on the smallest WordCount
>>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>> I
>>>         could
>>>         write. I go through everything line by line and talk about some
>>>         of the
>>>         newest DoFNs that allow you to easily run regular expressions in
>>> a
>>>         distributed way.
>>>
>>>         Thanks,
>>>
>>>         Jesse
>>>
>>>
>>>
>>>     --
>>>     Jean-Baptiste Onofré
>>>     jbonofre@apache.org <ma...@apache.org>
>>>     http://blog.nanthrax.net
>>>     Talend - http://www.talend.com
>>>
>>>
>>>
>>>
>>> --
>>> Neelesh Srinivas Salian
>>> Customer Operations Engineer
>>>
>>> *
>>> *
>>> *
>>> *
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


-- 
Neelesh Srinivas Salian
Customer Operations Engineer

Re: Pico WordCount

Posted by Jesse Anderson <je...@smokinghand.com>.
I was trying to show that with Beam's built-in DoFns, you can actually do
something without writing a lambda. I think that's a big feature of Beam:
you can do something real without having to write every single piece yourself.

On Wed, Dec 7, 2016 at 10:59 AM Frances Perry <fj...@google.com> wrote:

> Instead of adding this as a new example, should we figure out how to unify
> it with the java 7 [1] and java 8 [2] versions of MinimalWordCount?
> Everyone loves lambdas, so we should get them into the WordCount
> walkthrough [3]!
>
> [1]
> https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> [2]
> https://github.com/apache/incubator-beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/MinimalWordCountJava8.java
> [3] http://beam.incubator.apache.org/get-started/wordcount-example/
>
>
> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> Good idea Neelesh !
>
> definitively something we can add to the beam-samples (great complement to
> what I have on my github).
>
> Regards
> JB
>
> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>
> Perhaps we can add this to our examples.
> Thank you Jesse. :)
>
> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> <ma...@nanthrax.net>> wrote:
>
>     Awesome !
>
>     Thanks Jesse !
>
>     Regards
>     JB
>
>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>
>         I wrote a post on the smallest WordCount
>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>> I
>         could
>         write. I go through everything line by line and talk about some
>         of the
>         newest DoFNs that allow you to easily run regular expressions in a
>         distributed way.
>
>         Thanks,
>
>         Jesse
>
>
>
>     --
>     Jean-Baptiste Onofré
>     jbonofre@apache.org <ma...@apache.org>
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
>
>
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>
> *
> *
> *
> *
>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>

Re: Pico WordCount

Posted by Frances Perry <fj...@google.com>.
Instead of adding this as a new example, should we figure out how to unify
it with the Java 7 [1] and Java 8 [2] versions of MinimalWordCount?
Everyone loves lambdas, so we should get them into the WordCount
walkthrough [3]!

[1]
https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
[2]
https://github.com/apache/incubator-beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/MinimalWordCountJava8.java
[3] http://beam.incubator.apache.org/get-started/wordcount-example/


On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Good idea Neelesh !
>
> definitively something we can add to the beam-samples (great complement to
> what I have on my github).
>
> Regards
> JB
>
> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>
>> Perhaps we can add this to our examples.
>> Thank you Jesse. :)
>>
>> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
>> <ma...@nanthrax.net>> wrote:
>>
>>     Awesome !
>>
>>     Thanks Jesse !
>>
>>     Regards
>>     JB
>>
>>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>>
>>         I wrote a post on the smallest WordCount
>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>> I
>>         could
>>         write. I go through everything line by line and talk about some
>>         of the
>>         newest DoFNs that allow you to easily run regular expressions in a
>>         distributed way.
>>
>>         Thanks,
>>
>>         Jesse
>>
>>
>>
>>     --
>>     Jean-Baptiste Onofré
>>     jbonofre@apache.org <ma...@apache.org>
>>     http://blog.nanthrax.net
>>     Talend - http://www.talend.com
>>
>>
>>
>>
>> --
>> Neelesh Srinivas Salian
>> Customer Operations Engineer
>>
>> *
>> *
>> *
>> *
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Pico WordCount

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Great!

Thanks, Jesse.

By the way, when I'm back from vacation next week, I plan to resume the PoC on the data format extension. I will send an update to the mailing list then.

Regards
JB

On Feb 8, 2017, at 21:43, Jesse Anderson <je...@smokinghand.com> wrote:
>I updated my Pico Wordcount example
><http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show
>the
>new ToString class that was released in 0.5.0. You don't have to
>manually
>convert objects to strings now if the object's toString is the format
>you
>want to use.
>
>On Thu, Dec 8, 2016 at 11:18 AM Robert Bradshaw <ro...@google.com>
>wrote:
>
>> No less typed than any other Python program :). To add our typechecks
>> one would write
>>
>> import apache_beam as beam, re
>> with beam.Pipeline() as p:
>>   (p
>>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>>        .with_input_types(str).with_output_types(str)
>>    | beam.combiners.Count.PerElement()
>>    | beam.Map(lambda kv: "%s: %d" % kv)
>>    | beam.io.textio.WriteToText("output/stringcounts"))
>>
>> and the rest is implicit.
>>
>>
>> On Wed, Dec 7, 2016 at 4:13 PM, Dan Halperin <dh...@google.com>
>wrote:
>> > Is the Python one actually fully type-checked, or could it fail at
>> runtime
>> > b/c of a typo?
>> >
>> > (If latter, what would the minimal type-checked Python WordCount
>look
>> like?)
>> >
>> >
>> > On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw
><ro...@google.com>
>> wrote:
>> >>
>> >> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson
><je...@smokinghand.com>
>> >> wrote:
>> >>
>> >> > Only gets beaten on the KV to string conversion. JB is going to
>change
>> >> > that.
>> >>
>> >> That and the imports/python creation boilerplate. But yes, very
>similar.
>> >>
>> >> > On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw
><ro...@google.com>
>> >> > wrote:
>> >> >>
>> >> >> Nice. Of course for ultimate conciseness, you should have gone
>with
>> >> >> Python
>> >> >> :)
>> >> >>
>> >> >> import apache_beam as beam, re
>> >> >> with beam.Pipeline() as p:
>> >> >>   (p
>> >> >>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>> >> >>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>> >> >>    | beam.combiners.Count.PerElement()
>> >> >>    | beam.Map(lambda kv: "%s: %d" % kv)
>> >> >>    | beam.io.textio.WriteToText("output/stringcounts"))
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <
>> jb@nanthrax.net>
>> >> >> wrote:
>> >> >> > Good idea Neelesh !
>> >> >> >
>> >> >> > definitively something we can add to the beam-samples (great
>> >> >> > complement
>> >> >> > to
>> >> >> > what I have on my github).
>> >> >> >
>> >> >> > Regards
>> >> >> > JB
>> >> >> >
>> >> >> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>> >> >> >>
>> >> >> >> Perhaps we can add this to our examples.
>> >> >> >> Thank you Jesse. :)
>> >> >> >>
>> >> >> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré
>> >> >> >> <jb@nanthrax.net
>> >> >> >> <ma...@nanthrax.net>> wrote:
>> >> >> >>
>> >> >> >>     Awesome !
>> >> >> >>
>> >> >> >>     Thanks Jesse !
>> >> >> >>
>> >> >> >>     Regards
>> >> >> >>     JB
>> >> >> >>
>> >> >> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>> >> >> >>
>> >> >> >>         I wrote a post on the smallest WordCount
>> >> >> >>         <
>> http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>> >> >> >>
>> >> >> >>
><http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>>
>> >> >> >> I
>> >> >> >>         could
>> >> >> >>         write. I go through everything line by line and talk
>about
>> >> >> >> some
>> >> >> >>         of the
>> >> >> >>         newest DoFNs that allow you to easily run regular
>> >> >> >> expressions
>> >> >> >> in a
>> >> >> >>         distributed way.
>> >> >> >>
>> >> >> >>         Thanks,
>> >> >> >>
>> >> >> >>         Jesse
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>     --
>> >> >> >>     Jean-Baptiste Onofré
>> >> >> >>     jbonofre@apache.org <ma...@apache.org>
>> >> >> >>     http://blog.nanthrax.net
>> >> >> >>     Talend - http://www.talend.com
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Neelesh Srinivas Salian
>> >> >> >> Customer Operations Engineer
>> >> >> >>
>> >> >> >> *
>> >> >> >> *
>> >> >> >> *
>> >> >> >> *
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Jean-Baptiste Onofré
>> >> >> > jbonofre@apache.org
>> >> >> > http://blog.nanthrax.net
>> >> >> > Talend - http://www.talend.com
>> >
>> >
>>

Re: Pico WordCount

Posted by Jesse Anderson <je...@smokinghand.com>.
I updated my Pico WordCount example
<http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show the
new ToString class that was released in 0.5.0. You don't have to manually
convert objects to strings now if the object's toString is the format you
want to use.
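
As a rough analogy in plain Python (not Beam's Java ToString API), the idea is that a record whose own string form is already the output format needs no separate formatting step. `CardCount` and `to_lines` below are hypothetical names used only for illustration.

```python
from collections import namedtuple


class CardCount(namedtuple("CardCount", ["word", "count"])):
    """Hypothetical record whose string form is already the output format."""

    __slots__ = ()

    def __str__(self):
        return "%s: %d" % (self.word, self.count)


def to_lines(records):
    """Convert records to output lines by relying on their own __str__."""
    return [str(record) for record in records]


if __name__ == "__main__":
    for line in to_lines([CardCount("Spades", 2), CardCount("Ace", 1)]):
        print(line)
```

If the default string form is not the format you want, you are back to writing the conversion by hand.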

On Thu, Dec 8, 2016 at 11:18 AM Robert Bradshaw <ro...@google.com> wrote:

> No less typed than any other Python program :). To add our typechecks
> one would write
>
> import apache_beam as beam, re
> with beam.Pipeline() as p:
>   (p
>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>        .with_input_types(str).with_output_types(str)
>    | beam.combiners.Count.PerElement()
>    | beam.Map(lambda kv: "%s: %d" % kv)
>    | beam.io.textio.WriteToText("output/stringcounts"))
>
> and the rest is implicit.
>
>
> On Wed, Dec 7, 2016 at 4:13 PM, Dan Halperin <dh...@google.com> wrote:
> > Is the Python one actually fully type-checked, or could it fail at
> runtime
> > b/c of a typo?
> >
> > (If latter, what would the minimal type-checked Python WordCount look
> like?)
> >
> >
> > On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw <ro...@google.com>
> wrote:
> >>
> >> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com>
> >> wrote:
> >>
> >> > Only gets beaten on the KV to string conversion. JB is going to change
> >> > that.
> >>
> >> That and the imports/python creation boilerplate. But yes, very similar.
> >>
> >> > On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com>
> >> > wrote:
> >> >>
> >> >> Nice. Of course for ultimate conciseness, you should have gone with
> >> >> Python
> >> >> :)
> >> >>
> >> >> import apache_beam as beam, re
> >> >> with beam.Pipeline() as p:
> >> >>   (p
> >> >>    | beam.io.textio.ReadFromText("playing_cards.tsv")
> >> >>    | beam.FlatMap(lambda s: re.split("\\W+", s))
> >> >>    | beam.combiners.Count.PerElement()
> >> >>    | beam.Map(lambda kv: "%s: %d" % kv)
> >> >>    | beam.io.textio.WriteToText("output/stringcounts"))
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <
> jb@nanthrax.net>
> >> >> wrote:
> >> >> > Good idea Neelesh !
> >> >> >
> >> >> > definitively something we can add to the beam-samples (great
> >> >> > complement
> >> >> > to
> >> >> > what I have on my github).
> >> >> >
> >> >> > Regards
> >> >> > JB
> >> >> >
> >> >> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> >> >> >>
> >> >> >> Perhaps we can add this to our examples.
> >> >> >> Thank you Jesse. :)
> >> >> >>
> >> >> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré
> >> >> >> <jb@nanthrax.net
> >> >> >> <ma...@nanthrax.net>> wrote:
> >> >> >>
> >> >> >>     Awesome !
> >> >> >>
> >> >> >>     Thanks Jesse !
> >> >> >>
> >> >> >>     Regards
> >> >> >>     JB
> >> >> >>
> >> >> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> >> >> >>
> >> >> >>         I wrote a post on the smallest WordCount
> >> >> >>         <
> http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> >> >> >>
> >> >> >> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>>
> >> >> >> I
> >> >> >>         could
> >> >> >>         write. I go through everything line by line and talk about
> >> >> >> some
> >> >> >>         of the
> >> >> >>         newest DoFNs that allow you to easily run regular
> >> >> >> expressions
> >> >> >> in a
> >> >> >>         distributed way.
> >> >> >>
> >> >> >>         Thanks,
> >> >> >>
> >> >> >>         Jesse
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>     --
> >> >> >>     Jean-Baptiste Onofré
> >> >> >>     jbonofre@apache.org <ma...@apache.org>
> >> >> >>     http://blog.nanthrax.net
> >> >> >>     Talend - http://www.talend.com
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Neelesh Srinivas Salian
> >> >> >> Customer Operations Engineer
> >> >> >>
> >> >> >> *
> >> >> >> *
> >> >> >> *
> >> >> >> *
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Jean-Baptiste Onofré
> >> >> > jbonofre@apache.org
> >> >> > http://blog.nanthrax.net
> >> >> > Talend - http://www.talend.com
> >
> >
>

Re: Pico WordCount

Posted by Robert Bradshaw <ro...@google.com>.
No less typed than any other Python program :). To add our type checks,
one would write:

import apache_beam as beam, re
with beam.Pipeline() as p:
  (p
   | beam.io.textio.ReadFromText("playing_cards.tsv")
   | beam.FlatMap(lambda s: re.split("\\W+", s))
       .with_input_types(str).with_output_types(str)
   | beam.combiners.Count.PerElement()
   | beam.Map(lambda kv: "%s: %d" % kv)
   | beam.io.textio.WriteToText("output/stringcounts"))

and the rest is implicit.
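
To make the idea of declaring input and output types on a single step concrete, here is a minimal plain-Python sketch. It is not Beam's implementation: `typed_flat_map` is a hypothetical helper that checks each element at run time, which is only a stand-in for whatever Beam does with its type hints.

```python
import re


def typed_flat_map(fn, input_type, output_type):
    """Wrap fn so that every input and every produced output is type-checked.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    def apply(elements):
        for element in elements:
            if not isinstance(element, input_type):
                raise TypeError(
                    "expected %s input, got %r" % (input_type.__name__, element))
            for result in fn(element):
                if not isinstance(result, output_type):
                    raise TypeError(
                        "expected %s output, got %r" % (output_type.__name__, result))
                yield result
    return apply


# The splitting step, declared as str in and str out.
split_words = typed_flat_map(lambda s: re.split(r"\W+", s), str, str)

if __name__ == "__main__":
    print(list(split_words(["Ace of Spades"])))
```

Feeding this step anything other than strings raises a TypeError naming the offending element, rather than failing somewhere further downstream.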


On Wed, Dec 7, 2016 at 4:13 PM, Dan Halperin <dh...@google.com> wrote:
> Is the Python one actually fully type-checked, or could it fail at runtime
> b/c of a typo?
>
> (If latter, what would the minimal type-checked Python WordCount look like?)
>
>
> On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw <ro...@google.com> wrote:
>>
>> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com>
>> wrote:
>>
>> > Only gets beaten on the KV to string conversion. JB is going to change
>> > that.
>>
>> That and the imports/python creation boilerplate. But yes, very similar.
>>
>> > On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com>
>> > wrote:
>> >>
>> >> Nice. Of course for ultimate conciseness, you should have gone with
>> >> Python
>> >> :)
>> >>
>> >> import apache_beam as beam, re
>> >> with beam.Pipeline() as p:
>> >>   (p
>> >>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>> >>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>> >>    | beam.combiners.Count.PerElement()
>> >>    | beam.Map(lambda kv: "%s: %d" % kv)
>> >>    | beam.io.textio.WriteToText("output/stringcounts"))
>> >>
>> >>
>> >>
>> >> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> >> wrote:
>> >> > Good idea Neelesh !
>> >> >
>> >> > definitively something we can add to the beam-samples (great
>> >> > complement
>> >> > to
>> >> > what I have on my github).
>> >> >
>> >> > Regards
>> >> > JB
>> >> >
>> >> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>> >> >>
>> >> >> Perhaps we can add this to our examples.
>> >> >> Thank you Jesse. :)
>> >> >>
>> >> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré
>> >> >> <jb@nanthrax.net
>> >> >> <ma...@nanthrax.net>> wrote:
>> >> >>
>> >> >>     Awesome !
>> >> >>
>> >> >>     Thanks Jesse !
>> >> >>
>> >> >>     Regards
>> >> >>     JB
>> >> >>
>> >> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>> >> >>
>> >> >>         I wrote a post on the smallest WordCount
>> >> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>> >> >>
>> >> >> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>>
>> >> >> I
>> >> >>         could
>> >> >>         write. I go through everything line by line and talk about
>> >> >> some
>> >> >>         of the
>> >> >>         newest DoFNs that allow you to easily run regular
>> >> >> expressions
>> >> >> in a
>> >> >>         distributed way.
>> >> >>
>> >> >>         Thanks,
>> >> >>
>> >> >>         Jesse
>> >> >>
>> >> >>
>> >> >>
>> >> >>     --
>> >> >>     Jean-Baptiste Onofré
>> >> >>     jbonofre@apache.org <ma...@apache.org>
>> >> >>     http://blog.nanthrax.net
>> >> >>     Talend - http://www.talend.com
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Neelesh Srinivas Salian
>> >> >> Customer Operations Engineer
>> >> >>
>> >> >> *
>> >> >> *
>> >> >> *
>> >> >> *
>> >> >
>> >> >
>> >> > --
>> >> > Jean-Baptiste Onofré
>> >> > jbonofre@apache.org
>> >> > http://blog.nanthrax.net
>> >> > Talend - http://www.talend.com
>
>

Re: Pico WordCount

Posted by Dan Halperin <dh...@google.com>.
Is the Python one actually fully type-checked, or could it fail at runtime
b/c of a typo?

(If the latter, what would the minimal type-checked Python WordCount look like?)
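
To illustrate the concern, here is a small plain-Python sketch with no Beam involved. The misspelled attribute `splt` is deliberate, and `try_split` is a hypothetical helper: defining the function succeeds, and the typo surfaces only when the function is actually called.

```python
import re


def split_words(s):
    # Deliberate typo: the re module has no "splt" attribute.
    # Python does not notice this until the line executes.
    return re.splt(r"\W+", s)


def try_split(line):
    """Call the misspelled function and report what happens."""
    try:
        return split_words(line)
    except AttributeError as err:
        return "failed at run time: %s" % type(err).__name__


if __name__ == "__main__":
    print(try_split("Ace of Spades"))
```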

On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw <ro...@google.com> wrote:

> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > Only gets beaten on the KV to string conversion. JB is going to change
> that.
>
> That and the imports/python creation boilerplate. But yes, very similar.
>
> > On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com>
> wrote:
> >>
> >> Nice. Of course for ultimate conciseness, you should have gone with
> Python
> >> :)
> >>
> >> import apache_beam as beam, re
> >> with beam.Pipeline() as p:
> >>   (p
> >>    | beam.io.textio.ReadFromText("playing_cards.tsv")
> >>    | beam.FlatMap(lambda s: re.split("\\W+", s))
> >>    | beam.combiners.Count.PerElement()
> >>    | beam.Map(lambda kv: "%s: %d" % kv)
> >>    | beam.io.textio.WriteToText("output/stringcounts"))
> >>
> >>
> >>
> >> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> >> wrote:
> >> > Good idea Neelesh !
> >> >
> >> > definitively something we can add to the beam-samples (great
> complement
> >> > to
> >> > what I have on my github).
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> >> >>
> >> >> Perhaps we can add this to our examples.
> >> >> Thank you Jesse. :)
> >> >>
> >> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> >> >> <ma...@nanthrax.net>> wrote:
> >> >>
> >> >>     Awesome !
> >> >>
> >> >>     Thanks Jesse !
> >> >>
> >> >>     Regards
> >> >>     JB
> >> >>
> >> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> >> >>
> >> >>         I wrote a post on the smallest WordCount
> >> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> >> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> >>
> >> >> I
> >> >>         could
> >> >>         write. I go through everything line by line and talk about
> some
> >> >>         of the
> >> >>         newest DoFNs that allow you to easily run regular expressions
> >> >> in a
> >> >>         distributed way.
> >> >>
> >> >>         Thanks,
> >> >>
> >> >>         Jesse
> >> >>
> >> >>
> >> >>
> >> >>     --
> >> >>     Jean-Baptiste Onofré
> >> >>     jbonofre@apache.org <ma...@apache.org>
> >> >>     http://blog.nanthrax.net
> >> >>     Talend - http://www.talend.com
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Neelesh Srinivas Salian
> >> >> Customer Operations Engineer
> >> >>
> >> >> *
> >> >> *
> >> >> *
> >> >> *
> >> >
> >> >
> >> > --
> >> > Jean-Baptiste Onofré
> >> > jbonofre@apache.org
> >> > http://blog.nanthrax.net
> >> > Talend - http://www.talend.com
>

Re: Pico WordCount

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com> wrote:

> Only gets beaten on the KV to string conversion. JB is going to change that.

That and the imports/python creation boilerplate. But yes, very similar.

> On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com> wrote:
>>
>> Nice. Of course for ultimate conciseness, you should have gone with Python
>> :)
>>
>> import apache_beam as beam, re
>> with beam.Pipeline() as p:
>>   (p
>>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>>    | beam.combiners.Count.PerElement()
>>    | beam.Map(lambda kv: "%s: %d" % kv)
>>    | beam.io.textio.WriteToText("output/stringcounts"))
>>
>>
>>
>> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>> > Good idea Neelesh !
>> >
>> > definitively something we can add to the beam-samples (great complement
>> > to
>> > what I have on my github).
>> >
>> > Regards
>> > JB
>> >
>> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>> >>
>> >> Perhaps we can add this to our examples.
>> >> Thank you Jesse. :)
>> >>
>> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
>> >> <ma...@nanthrax.net>> wrote:
>> >>
>> >>     Awesome !
>> >>
>> >>     Thanks Jesse !
>> >>
>> >>     Regards
>> >>     JB
>> >>
>> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>> >>
>> >>         I wrote a post on the smallest WordCount
>> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>>
>> >> I
>> >>         could
>> >>         write. I go through everything line by line and talk about some
>> >>         of the
>> >>         newest DoFNs that allow you to easily run regular expressions
>> >> in a
>> >>         distributed way.
>> >>
>> >>         Thanks,
>> >>
>> >>         Jesse
>> >>
>> >>
>> >>
>> >>     --
>> >>     Jean-Baptiste Onofré
>> >>     jbonofre@apache.org <ma...@apache.org>
>> >>     http://blog.nanthrax.net
>> >>     Talend - http://www.talend.com
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Neelesh Srinivas Salian
>> >> Customer Operations Engineer
>> >>
>> >> *
>> >> *
>> >> *
>> >> *
>> >
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbonofre@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com

Re: Pico WordCount

Posted by Jesse Anderson <je...@smokinghand.com>.
Only gets beaten on the KV to string conversion. JB is going to change that.

On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com> wrote:

> Nice. Of course for ultimate conciseness, you should have gone with Python
> :)
>
> import apache_beam as beam, re
> with beam.Pipeline() as p:
>   (p
>    | beam.io.textio.ReadFromText("playing_cards.tsv")
>    | beam.FlatMap(lambda s: re.split("\\W+", s))
>    | beam.combiners.Count.PerElement()
>    | beam.Map(lambda kv: "%s: %d" % kv)
>    | beam.io.textio.WriteToText("output/stringcounts"))
>
>
>
> On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
> > Good idea Neelesh !
> >
> > definitively something we can add to the beam-samples (great complement
> to
> > what I have on my github).
> >
> > Regards
> > JB
> >
> > On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> >>
> >> Perhaps we can add this to our examples.
> >> Thank you Jesse. :)
> >>
> >> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> >> <ma...@nanthrax.net>> wrote:
> >>
> >>     Awesome !
> >>
> >>     Thanks Jesse !
> >>
> >>     Regards
> >>     JB
> >>
> >>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> >>
> >>         I wrote a post on the smallest WordCount
> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> >>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>>
> I
> >>         could
> >>         write. I go through everything line by line and talk about some
> >>         of the
> >>         newest DoFNs that allow you to easily run regular expressions
> in a
> >>         distributed way.
> >>
> >>         Thanks,
> >>
> >>         Jesse
> >>
> >>
> >>
> >>     --
> >>     Jean-Baptiste Onofré
> >>     jbonofre@apache.org <ma...@apache.org>
> >>     http://blog.nanthrax.net
> >>     Talend - http://www.talend.com
> >>
> >>
> >>
> >>
> >> --
> >> Neelesh Srinivas Salian
> >> Customer Operations Engineer
> >>
> >> *
> >> *
> >> *
> >> *
> >
> >
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>

Re: Pico WordCount

Posted by Robert Bradshaw <ro...@google.com>.
Nice. Of course, for ultimate conciseness, you should have gone with Python :)

import apache_beam as beam, re
with beam.Pipeline() as p:
  (p
   | beam.io.textio.ReadFromText("playing_cards.tsv")
   | beam.FlatMap(lambda s: re.split("\\W+", s))
   | beam.combiners.Count.PerElement()
   | beam.Map(lambda kv: "%s: %d" % kv)
   | beam.io.textio.WriteToText("output/stringcounts"))
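
For anyone who wants to see what each stage of the pipeline above computes, here is a minimal plain-Python sketch of the same steps with no Beam involved. It assumes the input is an in-memory list of lines; `word_count` is a hypothetical helper, and empty strings produced by the split are dropped for readability.

```python
import re
from collections import Counter


def word_count(lines):
    """Mirror the pipeline steps on an in-memory list of lines."""
    # Splitting step: break every line on runs of non-word characters.
    words = [w for line in lines for w in re.split(r"\W+", line) if w]
    # Counting step: tally the occurrences of each distinct word.
    counts = Counter(words)
    # Formatting step: turn each (word, count) pair into a line of text.
    return sorted("%s: %d" % (word, count) for word, count in counts.items())


if __name__ == "__main__":
    for line in word_count(["Ace\tSpades", "King\tSpades"]):
        print(line)
```

The sketch runs on one machine and sorts its output; the pipeline version distributes the same three steps and makes no ordering promise.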



On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Good idea Neelesh !
>
> definitively something we can add to the beam-samples (great complement to
> what I have on my github).
>
> Regards
> JB
>
> On 12/07/2016 07:10 PM, Neelesh Salian wrote:
>>
>> Perhaps we can add this to our examples.
>> Thank you Jesse. :)
>>
>> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
>> <ma...@nanthrax.net>> wrote:
>>
>>     Awesome !
>>
>>     Thanks Jesse !
>>
>>     Regards
>>     JB
>>
>>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>>
>>         I wrote a post on the smallest WordCount
>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>> I
>>         could
>>         write. I go through everything line by line and talk about some
>>         of the
>>         newest DoFNs that allow you to easily run regular expressions in a
>>         distributed way.
>>
>>         Thanks,
>>
>>         Jesse
>>
>>
>>
>>     --
>>     Jean-Baptiste Onofré
>>     jbonofre@apache.org <ma...@apache.org>
>>     http://blog.nanthrax.net
>>     Talend - http://www.talend.com
>>
>>
>>
>>
>> --
>> Neelesh Srinivas Salian
>> Customer Operations Engineer
>>
>> *
>> *
>> *
>> *
>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: Pico WordCount

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Good idea, Neelesh!

Definitely something we can add to the beam-samples (a great complement
to what I have on my GitHub).

Regards
JB

On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> Perhaps we can add this to our examples.
> Thank you Jesse. :)
>
> On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> <ma...@nanthrax.net>> wrote:
>
>     Awesome !
>
>     Thanks Jesse !
>
>     Regards
>     JB
>
>     On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>
>         I wrote a post on the smallest WordCount
>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
>         <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>> I
>         could
>         write. I go through everything line by line and talk about some
>         of the
>         newest DoFNs that allow you to easily run regular expressions in a
>         distributed way.
>
>         Thanks,
>
>         Jesse
>
>
>
>     --
>     Jean-Baptiste Onofré
>     jbonofre@apache.org <ma...@apache.org>
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
>
>
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>
> *
> *
> *
> *

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Pico WordCount

Posted by Neelesh Salian <ns...@cloudera.com>.
Perhaps we can add this to our examples.
Thank you Jesse. :)

On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Awesome !
>
> Thanks Jesse !
>
> Regards
> JB
>
> On 12/07/2016 06:22 PM, Jesse Anderson wrote:
>
>> I wrote a post on the smallest WordCount
>> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
>> write. I go through everything line by line and talk about some of the
>> newest DoFNs that allow you to easily run regular expressions in a
>> distributed way.
>>
>> Thanks,
>>
>> Jesse
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
Neelesh Srinivas Salian
Customer Operations Engineer

Re: Pico WordCount

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Awesome!

Thanks, Jesse!

Regards
JB

On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com