Posted to user@beam.apache.org by Jesse Anderson <je...@smokinghand.com> on 2016/12/07 17:22:21 UTC
Pico WordCount
I wrote a post on the smallest WordCount I could write
<http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/>. I go through
everything line by line and talk about some of the newest DoFns that let you
easily run regular expressions in a distributed way.
Thanks,
Jesse
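The post does this with the regular-expression DoFns in Beam's Java SDK. As a local, non-Beam illustration of what such a pipeline computes, here is a plain-Python sketch: split each line on runs of non-word characters and count every resulting word. The function name `count_words`, the `\W+` pattern, and the sample rows are assumptions for illustration, not taken from the post.

```python
import re
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Hypothetical local stand-in for a regex-based WordCount pipeline."""
    counts: Counter = Counter()
    for line in lines:
        # Split on runs of non-word characters, as a regex split step would.
        words = re.split(r"\W+", line)
        # re.split yields "" when a line starts or ends with a separator; drop those.
        counts.update(w for w in words if w)
    return counts


if __name__ == "__main__":
    # Made-up tab-separated sample rows.
    sample = ["Ace\tSpades", "King\tSpades", "Ace\tHearts"]
    for word, count in sorted(count_words(sample).items()):
        print(f"{word}: {count}")
```

In a distributed runner the split and the count would run as separate, parallelizable steps; here they are collapsed into one loop only to show the result.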
Re: Pico WordCount
Posted by James Malone <ja...@google.com>.
This is awesome. :)
On Wed, Dec 7, 2016 at 9:22 AM, Jesse Anderson <je...@smokinghand.com>
wrote:
> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
Re: Pico WordCount
Posted by Eric Anderson <er...@google.com>.
Looks Great! Thanks Jesse.
On Wed, Dec 7, 2016 at 9:22 AM Jesse Anderson <je...@smokinghand.com> wrote:
> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
Re: Pico WordCount
Posted by Neelesh Salian <ns...@cloudera.com>.
@Frances, agreed.
Opened this JIRA: https://issues.apache.org/jira/browse/BEAM-1105
We could perhaps make it an umbrella JIRA if we have other examples to add
as well.
On Wed, Dec 7, 2016 at 10:59 AM, Frances Perry <fj...@google.com> wrote:
> Instead of adding this as a new example, should we figure out how to unify
> it with the java 7 [1] and java 8 [2] versions of MinimalWordCount?
> Everyone loves lambdas, so we should get them into the WordCount
> walkthrough [3]!
>
> [1] https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> [2] https://github.com/apache/incubator-beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/MinimalWordCountJava8.java
> [3] http://beam.incubator.apache.org/get-started/wordcount-example/
--
Neelesh Srinivas Salian
Customer Operations Engineer
Re: Pico WordCount
Posted by Jesse Anderson <je...@smokinghand.com>.
I was trying to show that with Beam's built-in DoFns, you can actually do
something without writing a lambda. I think that's a big feature of Beam:
you can do something real without having to write every single piece yourself.
On Wed, Dec 7, 2016 at 10:59 AM Frances Perry <fj...@google.com> wrote:
> Instead of adding this as a new example, should we figure out how to unify
> it with the java 7 [1] and java 8 [2] versions of MinimalWordCount?
> Everyone loves lambdas, so we should get them into the WordCount
> walkthrough [3]!
>
> [1]
> https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
> [2]
> https://github.com/apache/incubator-beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/MinimalWordCountJava8.java
> [3] http://beam.incubator.apache.org/get-started/wordcount-example/
Re: Pico WordCount
Posted by Frances Perry <fj...@google.com>.
Instead of adding this as a new example, should we figure out how to unify
it with the Java 7 [1] and Java 8 [2] versions of MinimalWordCount?
Everyone loves lambdas, so we should get them into the WordCount
walkthrough [3]!
[1] https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
[2] https://github.com/apache/incubator-beam/blob/master/examples/java8/src/main/java/org/apache/beam/examples/MinimalWordCountJava8.java
[3] http://beam.incubator.apache.org/get-started/wordcount-example/
On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:
> Good idea Neelesh !
>
> definitively something we can add to the beam-samples (great complement to
> what I have on my github).
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
Re: Pico WordCount
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Great!
Thanks, Jesse.
By the way, when I'm back from vacation next week, I plan to resume the PoC on the data format extension. I will send an update to the mailing list then.
Regards
JB
On Feb 8, 2017, at 21:43, Jesse Anderson <je...@smokinghand.com> wrote:
>I updated my Pico Wordcount example
><http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show the
>new ToString class that was released in 0.5.0. You don't have to manually
>convert objects to strings now if the object's toString is the format you
>want to use.
Re: Pico WordCount
Posted by Jesse Anderson <je...@smokinghand.com>.
I updated my Pico WordCount example
<http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> to show the
new ToString class that was released in 0.5.0. You no longer have to convert
objects to strings manually if the object's toString() output is the format
you want to use.
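The condition Jesse describes (the object's own string form is already the format you want) can be illustrated outside Beam. This plain-Python sketch does not reproduce the Java ToString API; it only contrasts falling back on an element's default string form with supplying an explicit formatter. The helper `to_lines` and its `fmt` parameter are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pair = Tuple[str, int]


def to_lines(pairs: Iterable[Pair],
             fmt: Optional[Callable[[Pair], str]] = None) -> List[str]:
    """Hypothetical helper: turn each element into one output line.

    With no formatter, fall back to the element's own string form (the
    analogue of relying on toString()); otherwise use the supplied formatter.
    """
    convert = fmt if fmt is not None else str
    return [convert(p) for p in pairs]


if __name__ == "__main__":
    counts = [("Ace", 2), ("King", 1)]
    print(to_lines(counts))                          # default string form of each tuple
    print(to_lines(counts, lambda p: "%s: %d" % p))  # explicit "word: count" format
```

The default path removes a step only when the type's built-in representation is acceptable; a custom layout such as `word: count` still needs an explicit formatting step.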
On Thu, Dec 8, 2016 at 11:18 AM Robert Bradshaw <ro...@google.com> wrote:
> No less typed than any other Python program :). To add our typechecks
> one would write
>
> import apache_beam as beam, re
> with beam.Pipeline() as p:
> (p
> | beam.io.textio.ReadFromText("playing_cards.tsv")
> | beam.Map(lamdba s: re.split("\\W+",
> s)).with_input_types(str).with_output_types(str)
> | beam.combiners.Count.PerElement()
> | beam.Map(lambda (w, c): "%s: %d" % (w, c))
> | beam.io.textio.WriteToText("output/stringcounts")
>
> and the rest is implicit.
Re: Pico WordCount
Posted by Robert Bradshaw <ro...@google.com>.
No less typed than any other Python program :). To add our type checks,
one would write:
import apache_beam as beam, re
with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("playing_cards.tsv")
     # FlatMap, not Map: each word (not each line's list of words) becomes an
     # element, so str is the correct declared output type.
     | beam.FlatMap(lambda s: [w for w in re.split(r"\W+", s) if w])
         .with_input_types(str).with_output_types(str)
     | beam.combiners.Count.PerElement()
     | beam.Map(lambda wc: "%s: %d" % wc)
     | beam.io.textio.WriteToText("output/stringcounts"))
and the rest is implicit.
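Dan's question (quoted below) is about when type errors surface. A toy, plain-Python model of what declared input and output types buy you: a mismatch between adjacent steps can be found by comparing declarations while the pipeline is being assembled, before any element is processed. This is an illustration under stated assumptions, not Beam's type-hint implementation; `Step` and `check_chain` are hypothetical, and real type-hint systems also handle subtypes, generics, and `Any`, which this exact-match check ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """A hypothetical pipeline step, described only by its declared types."""
    name: str
    input_type: type
    output_type: type


def check_chain(steps: List[Step]) -> None:
    """Raise TypeError if a step's declared output type does not match the
    next step's declared input type. Nothing is executed; only declarations
    are compared, which is why such errors surface at construction time."""
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_type is not downstream.input_type:
            raise TypeError(
                f"{upstream.name} outputs {upstream.output_type.__name__}, "
                f"but {downstream.name} expects {downstream.input_type.__name__}"
            )


if __name__ == "__main__":
    read = Step("Read", str, str)
    split = Step("SplitWords", str, str)   # one word per output element
    fmt = Step("FormatCounts", tuple, str)
    check_chain([read, split])             # consistent: passes silently
    try:
        check_chain([read, split, fmt])    # str fed into a step expecting tuple
    except TypeError as err:
        print("rejected:", err)
```

A typo inside a lambda body would still only show up at runtime in this model; declarations catch wiring mistakes between steps, not bugs within a step.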
On Wed, Dec 7, 2016 at 4:13 PM, Dan Halperin <dh...@google.com> wrote:
> Is the Python one actually fully type-checked, or could it fail at runtime
> b/c of a typo?
>
> (If latter, what would the minimal type-checked Python WordCount look like?)
Re: Pico WordCount
Posted by Dan Halperin <dh...@google.com>.
Is the Python one actually fully type-checked, or could it fail at runtime
because of a typo?
(If the latter, what would the minimal type-checked Python WordCount look like?)
On Thu, Dec 8, 2016 at 4:32 AM, Robert Bradshaw <ro...@google.com> wrote:
> On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > Only gets beaten on the KV to string conversion. JB is going to change
> > that.
>
> That and the imports/python creation boilerplate. But yes, very similar.
Re: Pico WordCount
Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Dec 7, 2016 at 12:19 PM, Jesse Anderson <je...@smokinghand.com> wrote:
> Only gets beaten on the KV to string conversion. JB is going to change that.
That and the imports/Python creation boilerplate. But yes, very similar.
Re: Pico WordCount
Posted by Jesse Anderson <je...@smokinghand.com>.
It only gets beaten on the KV-to-string conversion. JB is going to change that.
On Wed, Dec 7, 2016, 11:05 AM Robert Bradshaw <ro...@google.com> wrote:
> Nice. Of course for ultimate conciseness, you should have gone with Python
> :)
>
> import apache_beam as beam, re
> with beam.Pipeline() as p:
> (p
> | beam.io.textio.ReadFromText("playing_cards.tsv")
> | beam.Map(lamdba s: re.split("\\W+", s))
> | beam.combiners.Count.PerElement()
> | beam.Map(lambda (w, c): "%s: %d" % (w, c))
> | beam.io.textio.WriteToText("output/stringcounts")
Re: Pico WordCount
Posted by Robert Bradshaw <ro...@google.com>.
Nice. Of course for ultimate conciseness, you should have gone with Python :)
import apache_beam as beam, re
with beam.Pipeline() as p:
    (p
     | beam.io.textio.ReadFromText("playing_cards.tsv")
     # FlatMap (not Map) so each word, not each line's list, becomes an element.
     | beam.FlatMap(lambda s: [w for w in re.split(r"\W+", s) if w])
     | beam.combiners.Count.PerElement()
     # Elements are (word, count) tuples here.
     | beam.Map(lambda wc: "%s: %d" % wc)
     | beam.io.textio.WriteToText("output/stringcounts"))
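The split step in this snippet calls `re.split`, which returns a list, so whether that step is a Map or a FlatMap matters. A Map-style step emits exactly one output per input, so each line becomes a single list and a per-element count would count whole lists rather than words; a FlatMap-style step emits every item of the returned iterable as its own element. The helpers below are hypothetical local stand-ins written in plain Python, not Beam APIs; in this local model, counting lists even raises `TypeError` because Python lists are unhashable.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_step(fn: Callable[[A], B], items: Iterable[A]) -> List[B]:
    """One output element per input element (Map-like)."""
    return [fn(x) for x in items]


def flat_map_step(fn: Callable[[A], Iterable[B]], items: Iterable[A]) -> List[B]:
    """Every item of each returned iterable becomes its own element (FlatMap-like)."""
    return [y for x in items for y in fn(x)]


def count_per_element(items: Iterable) -> Counter:
    """Count occurrences of each element; elements must be hashable."""
    return Counter(items)


def split_words(line: str) -> List[str]:
    """Split on non-word runs and drop empty strings at the edges."""
    return [w for w in re.split(r"\W+", line) if w]


if __name__ == "__main__":
    lines = ["Ace Spades", "Ace Hearts"]
    print(map_step(split_words, lines))       # two elements, each a list
    print(flat_map_step(split_words, lines))  # four elements, each a word
    print(count_per_element(flat_map_step(split_words, lines)))
    try:
        count_per_element(map_step(split_words, lines))
    except TypeError as err:
        print("Map output cannot be counted per element here:", err)
```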
On Wed, Dec 7, 2016 at 10:14 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Good idea Neelesh !
>
> definitively something we can add to the beam-samples (great complement to
> what I have on my github).
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
Re: Pico WordCount
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Good idea, Neelesh!
This is definitely something we can add to the beam-samples (a great
complement to what I have on my GitHub).
Regards
JB
On 12/07/2016 07:10 PM, Neelesh Salian wrote:
> Perhaps we can add this to our examples.
> Thank you Jesse. :)
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Pico WordCount
Posted by Neelesh Salian <ns...@cloudera.com>.
Perhaps we can add this to our examples.
Thank you Jesse. :)
On Wed, Dec 7, 2016 at 10:07 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:
> Awesome !
>
> Thanks Jesse !
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
--
Neelesh Srinivas Salian
Customer Operations Engineer
Re: Pico WordCount
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Awesome!
Thanks, Jesse!
Regards
JB
On 12/07/2016 06:22 PM, Jesse Anderson wrote:
> I wrote a post on the smallest WordCount
> <http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/> I could
> write. I go through everything line by line and talk about some of the
> newest DoFNs that allow you to easily run regular expressions in a
> distributed way.
>
> Thanks,
>
> Jesse
>
>
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com