Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2011/11/02 05:35:39 UTC

Please review new APIs.

Hi all,

As you know, combiners and IO were recently added.

Please review them from a user's viewpoint.

http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java

I'm testing the multiple-task and IO features on a 100-node cluster with
10 tasks per node. If no issues come up, I'll close HAMA-258.

Thanks.

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Yes, I'm sorry. The problem was actually that I thought we were going to be
incompatible.
But that is not correct ;)

2011/11/2 Edward J. Yoon <ed...@apache.org>

> Just FYI, one reason is that there're a lot of KeyValue stores.



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by "Edward J. Yoon" <ed...@apache.org>.
Just FYI, one reason for having our own input/output formats is that there
are a lot of KeyValue stores.

On Wed, Nov 2, 2011 at 11:46 PM, Thomas Jungblut
<th...@googlemail.com> wrote:
> Ah okay I see why.
> But I don't see that this is very good. BTW the classes you've added from
> Hadoop are missing the Apache header.
>
> Sorry for spamming.



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Ah, okay, I see why.
But I don't think this is a very good approach. BTW, the classes you've added
from Hadoop are missing the Apache license header.

Sorry for spamming.

2011/11/2 Thomas Jungblut <th...@googlemail.com>

> And what is the reason to implement our own Input/output format if you
> stick with key/value pairs.
> Let's be compatible to Hadoop and use theirs.
>
> And we should really stop copying hadoop stuff arround. It is already
> there.



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
And what is the reason for implementing our own input/output formats if you
stick with key/value pairs?
Let's be compatible with Hadoop and use theirs.

And we should really stop copying Hadoop code around. It is already there.

2011/11/2 Thomas Jungblut <th...@googlemail.com>

> Great :)
>
> Do you have plans to integrate a partitioning? Currently this is just a
> block assignment partitioning, hardcoded in the client.
> This won't be useful for PageRank and SSSP.
> This would help us in Graph package as well for the next release.



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Great :)

Do you have plans to integrate partitioning? Currently there is just a
block-assignment partitioning, hardcoded in the client.
That won't be useful for PageRank and SSSP.
Proper partitioning would also help us in the Graph package for the next release.
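
To make the partitioning point concrete, here is a minimal sketch of the difference. It is not Hama code: the Partitioner interface, HashPartitioner and blockPartition below are hypothetical names invented for illustration. Block assignment places a record by its position in the input, so a vertex's location depends on file order; hash partitioning places it by key, so any peer can work out where a given vertex lives, which is what PageRank and SSSP need when sending messages.

```java
public class PartitionSketch {
  /** Hypothetical interface: maps a key to one of numTasks partitions. */
  interface Partitioner<K> {
    int getPartition(K key, int numTasks);
  }

  /** Hash partitioning: the same key always lands on the same task. */
  static class HashPartitioner<K> implements Partitioner<K> {
    @Override
    public int getPartition(K key, int numTasks) {
      // Mask the sign bit so the result is never negative.
      return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }
  }

  /** Block assignment: a record is placed purely by its position in the input. */
  static int blockPartition(long recordIndex, long numRecords, int numTasks) {
    long blockSize = (numRecords + numTasks - 1) / numTasks; // ceiling division
    return (int) (recordIndex / blockSize);
  }

  public static void main(String[] args) {
    Partitioner<String> byKey = new HashPartitioner<>();
    String[] vertices = {"a", "b", "c", "a"};
    for (int i = 0; i < vertices.length; i++) {
      System.out.println(vertices[i]
          + ": hash -> task " + byKey.getPartition(vertices[i], 3)
          + ", block -> task " + blockPartition(i, vertices.length, 3));
    }
  }
}
```

Note that the two occurrences of "a" get the same task under hashing but different tasks under block assignment.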

2011/11/2 Edward J. Yoon <ed...@apache.org>

> > For sure I agree we should allow the former programming model with no input
> > without explicitly instantiating dummy inputs/splits. What about providing
> > two basic (different) implementations?
>
> +1
>
> I was about to.
> On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
> <to...@gmail.com> wrote:
> > 2011/11/2 Thomas Jungblut <th...@googlemail.com>
> >
> >> Another point while fixing the local runner:
> >>
> >> Are we now input driven?
> >> I see in the code that the user-defined task number is overridden by the
> >> number of splits.
> >> Was this your intention? This will actually make realtime processing with
> >> no static input a real pain.
> >> For example, if you want similar behaviour in Hadoop M/R you'll need to
> >> create dummy splits, and this is not what we should aim at.
> >>
> >> We could simply check whether the user defined the NullInputFormat or
> >> nothing, and then use the number of tasks the user has configured.
> >>
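
The check proposed above could look roughly like the sketch below. It is only an illustration under assumptions: resolveNumTasks and its parameters are hypothetical, the input format is represented by a plain string rather than a real class, and "TextInputFormat" is just a placeholder value.

```java
public class TaskCountSketch {
  /**
   * Picks the number of tasks for a job. If the user configured no input
   * format, or the null input format, the configured task count wins;
   * otherwise the job is input driven and the number of splits decides.
   */
  static int resolveNumTasks(String inputFormat, int numSplits, int configuredTasks) {
    boolean noInput = inputFormat == null || inputFormat.equals("NullInputFormat");
    return noInput ? configuredTasks : numSplits;
  }

  public static void main(String[] args) {
    // No input: the user's setting is respected, no dummy splits needed.
    System.out.println(resolveNumTasks(null, 0, 10));
    // Real input: one task per split.
    System.out.println(resolveNumTasks("TextInputFormat", 4, 10));
  }
}
```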
> >
> > For sure I agree we should allow the former programming model with no input
> > without explicitly instantiating dummy inputs/splits. What about providing
> > two basic (different) implementations?
> > Tommaso
> >
> >
> >>
> >> 2011/11/2 Tommaso Teofili <to...@gmail.com>
> >>
> >> > 2011/11/2 Edward J. Yoon <ed...@apache.org>
> >> >
> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
> >> > >
> >> > > You're right. Almost all BSP applications should override the bsp()
> >> > > method, but that is not true of the setup() and cleaner() methods, as
> >> > > you said. Let's fix them.
> >> > >
> >> >
> >> > Agreed +1
> >> >
> >> >
> >> > >
> >> > > > Generally I would suggest to integrate the OutputCollector and the
> >> > > > RecordReader into the BSPPeerImpl.
> >> > > > So our peer is like the context in Hadoop.
> >> > >
> >> > > Good idea.
> >> > >
> >> >
> >> > +1 here too
> >> >
> >> > Tommaso
> >> >
> >> >
> >> > >
> >> > > On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
> >> > > <th...@googlemail.com> wrote:
> >> > > > Yes. When I reworked that API, I made a default implementation in our
> >> > > > abstract BSP class.
> >> > > > So the user has to override the methods himself only if he needs to.
> >> > > > I'm sure that not every job actually needs a cleanup or a setup.
> >> > > >
> >> > > > Generally I would suggest to integrate the OutputCollector and the
> >> > > > RecordReader into the BSPPeerImpl.
> >> > > > So our peer is like the context in Hadoop.
> >> > > > But that is just a minor thing. It is a great improvement ;)
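
As a rough illustration of the default-implementation idea: the sketch below is not the actual Hama BSP class. Peer is a hypothetical stand-in for the peer interface, and the method name cleanup() is an assumption (the thread also calls it cleaner()). Only bsp() is abstract; the two hooks are no-ops unless a job overrides them.

```java
import java.util.ArrayList;
import java.util.List;

public class BspDefaultsSketch {
  /** Hypothetical stand-in for the peer; not Hama's real interface. */
  interface Peer {
    void log(String event);
  }

  /** Simplified stand-in for the abstract BSP class discussed here. */
  abstract static class BSP {
    /** Every job must supply its computation. */
    public abstract void bsp(Peer peer);

    /** Optional hook: does nothing unless overridden. */
    public void setup(Peer peer) {}

    /** Optional hook: does nothing unless overridden. */
    public void cleanup(Peer peer) {}
  }

  /** A job that needs neither setup nor cleanup overrides only bsp(). */
  static class MinimalJob extends BSP {
    @Override
    public void bsp(Peer peer) {
      peer.log("bsp");
    }
  }

  /** Calls the three phases in order and collects what the job logged. */
  static List<String> run(BSP job) {
    List<String> events = new ArrayList<>();
    Peer peer = events::add;
    job.setup(peer);
    job.bsp(peer);
    job.cleanup(peer);
    return events;
  }

  public static void main(String[] args) {
    System.out.println(run(new MinimalJob()));
  }
}
```

With abstract setup() and cleanup(), MinimalJob would have to carry two empty method bodies just to compile.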
> >> > > >
> >> > > > 2011/11/2 Edward J. Yoon <ed...@apache.org>
> >> > > >
> >> > > >> There're bsp(), setup() and cleaner() methods.
> >> > > >>
> >> > > >> What is your suggestion?
> >> > > >>
> >> > > >> On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
> >> > > >> <th...@googlemail.com> wrote:
> >> > > >> > Have a look at the combiner class. I know that this is just a "test", but
> >> > > >> > it is really messy if the user does not use the methods, but is forced to
> >> > > >> > override them.
> >> > > >> >
> >> > > >> > 2011/11/2 Edward J. Yoon <ed...@apache.org>
> >> > > >> >
> >> > > >> >> Why?
> >> > > >> >>
> >> > > >> >> On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
> >> > > >> >> <th...@googlemail.com> wrote:
> >> > > >> >> > I totally dislike that BSP class now has abstract methods
> >> instead
> >> > > of
> >> > > >> >> > default implementations.
> >> > > >> >> >
> >> > > >> >> > 2011/11/2 Edward J. Yoon <ed...@apache.org>
> >> > > >> >> >
> >> > > >> >> >> Hi all,
> >> > > >> >> >>
> >> > > >> >> >> As you know, recently combiners and IO are added.
> >> > > >> >> >>
> >> > > >> >> >> Please review them from user viewpoint.
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >>
> >> > > >>
> >> > >
> >> >
> >>
> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
> >> > > >> >> >>
> >> > > >> >> >> I'm testing multiple tasks and IO features on 100 nodes
> >> cluster
> >> > > using
> >> > > >> >> >> 10 tasks per node. If there's no issue, I'll close
> HAMA-258.
> >> > > >> >> >>
> >> > > >> >> >> Thanks.
> >> > > >> >> >>
> >> > > >> >> >> --
> >> > > >> >> >> Best Regards, Edward J. Yoon
> >> > > >> >> >> @eddieyoon
> >> > > >> >> >>
> >> > > >> >> >
> >> > > >> >> >
> >> > > >> >> >
> >> > > >> >> > --
> >> > > >> >> > Thomas Jungblut
> >> > > >> >> > Berlin <th...@gmail.com>
> >> > > >> >> >
> >> > > >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >> --
> >> > > >> >> Best Regards, Edward J. Yoon
> >> > > >> >> @eddieyoon
> >> > > >> >>
> >> > > >> >
> >> > > >> >
> >> > > >> >
> >> > > >> > --
> >> > > >> > Thomas Jungblut
> >> > > >> > Berlin <th...@gmail.com>
> >> > > >> >
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> --
> >> > > >> Best Regards, Edward J. Yoon
> >> > > >> @eddieyoon
> >> > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Thomas Jungblut
> >> > > > Berlin <th...@gmail.com>
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards, Edward J. Yoon
> >> > > @eddieyoon
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Thomas Jungblut
> >> Berlin <th...@gmail.com>
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by "Edward J. Yoon" <ed...@apache.org>.
> For sure I agree we should allow the former programming model with no input
> without explicitly instantiating dummy inputs/splits. What about providing
> two basic (different) implementations?

+1

I was about to.
On Wed, Nov 2, 2011 at 9:23 PM, Tommaso Teofili
<to...@gmail.com> wrote:
> 2011/11/2 Thomas Jungblut <th...@googlemail.com>
>
>> Another point while fixing the local runner:
>>
>> Are we now input driven?
>> I see in the code that the user defined task number is overridden by the
>> number of splits.
>> Was this your intention? This will actually make realtime processing with
>> no static input a real pain.
>> For example if you want a similar behaviour in Hadoop M/R you'll need to
>> create dummy splits, and this is not what we should aim at.
>>
>> We could simply check if the user defined the NullInputFormat or nothing and
>> then use the number of tasks the user has configured.
>>
>
> For sure I agree we should allow the former programming model with no input
> without explicitly instantiating dummy inputs/splits. What about providing
> two basic (different) implementations?
> Tommaso



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Tommaso Teofili <to...@gmail.com>.
2011/11/2 Thomas Jungblut <th...@googlemail.com>

> Another point while fixing the local runner:
>
> Are we now input driven?
> I see in the code that the user defined task number is overridden by the
> number of splits.
> Was this your intention? This will actually make realtime processing with
> no static input a real pain.
> For example if you want a similar behaviour in Hadoop M/R you'll need to
> create dummy splits, and this is not what we should aim at.
>
> We could simply check if the user defined the NullInputFormat or nothing and
> then use the number of tasks the user has configured.
>

For sure I agree we should allow the former programming model with no input
without explicitly instantiating dummy inputs/splits. What about providing
two basic (different) implementations?
Tommaso



Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Another point that came up while fixing the local runner:

Are we input-driven now?
I see in the code that the user-defined task number is overridden by the
number of splits.
Was this your intention? It will make real-time processing with
no static input a real pain.
For example, if you want similar behaviour in Hadoop M/R, you need to
create dummy splits, and that is not what we should aim for.

We could simply check whether the user has set the NullInputFormat or no
input format at all, and then use the number of tasks the user has configured.
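
The check proposed above could look roughly like the following. This is only a minimal sketch under assumptions: TaskCountSketch, resolveNumTasks and the hasInput flag are hypothetical names made up for illustration (hasInput stands in for "an input format other than NullInputFormat was configured"), and none of this is taken from the actual Hama client code.

```java
// Hypothetical helper, not part of Hama: decides how many tasks a job gets.
final class TaskCountSketch {
  private TaskCountSketch() {
  }

  /**
   * @param hasInput        assumption: true when the job has a real input,
   *                        false for NullInputFormat or no input format at all
   * @param numSplits       number of splits computed from the input
   * @param configuredTasks number of tasks the user asked for
   */
  static int resolveNumTasks(boolean hasInput, int numSplits, int configuredTasks) {
    if (!hasInput) {
      // No static input: respect the user's setting, no dummy splits needed.
      return configuredTasks;
    }
    // Input-driven job: one task per split.
    return numSplits;
  }
}
```

With this shape, a job without input that asks for 10 tasks gets 10 tasks, while a job with 3 splits gets 3 tasks regardless of the configured number.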

2011/11/2 Tommaso Teofili <to...@gmail.com>

> 2011/11/2 Edward J. Yoon <ed...@apache.org>
>
> > > I'm sure that not every job actually needs a cleanup or a setup.
> >
> > You're right. Almost all BSP applications should override the bsp()
> > method, but that is not the case for setup() and cleaner(), as you
> > said. Let's fix them.
> >
>
> Agreed +1
>
>
> >
> > > Generally I would suggest to integrate the OutputCollector and the
> > > RecordReader into the BSPPeerImpl.
> > > So our peer is like the context in Hadoop.
> >
> > Good idea.
> >
>
> +1 here too
>
> Tommaso



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by Tommaso Teofili <to...@gmail.com>.
2011/11/2 Edward J. Yoon <ed...@apache.org>

> > I'm sure that not every job actually needs a cleanup or a setup.
>
> You're right. Almost all BSP applications should override the bsp()
> method, but that is not the case for setup() and cleaner(), as you
> said. Let's fix them.
>

Agreed +1


>
> > Generally I would suggest to integrate the OutputCollector and the
> > RecordReader into the BSPPeerImpl.
> > So our peer is like the context in Hadoop.
>
> Good idea.
>

+1 here too

Tommaso



Re: Please review new APIs.

Posted by "Edward J. Yoon" <ed...@apache.org>.
> I'm sure that not every job actually needs a cleanup or a setup.

You're right. Almost all BSP applications should override the bsp()
method, but that is not the case for setup() and cleaner(), as you
said. Let's fix them.

> Generally I would suggest to integrate the OutputCollector and the
> RecordReader into the BSPPeerImpl.
> So our peer is like the context in Hadoop.

Good idea.

On Wed, Nov 2, 2011 at 9:03 PM, Thomas Jungblut
<th...@googlemail.com> wrote:
> Yes. When I reworked that API, I made a default implementation in our
> abstract BSP class.
> So the user has to override the methods for himself, if he needs to.
> I'm sure that not every job actually needs a cleanup or a setup.
>
> Generally I would suggest to integrate the OutputCollector and the
> RecordReader into the BSPPeerImpl.
> So our peer is like the context in Hadoop.
> But that is just a minor thing. It is a great improvement ;)



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Yes. When I reworked that API, I made a default implementation in our
abstract BSP class.
So users only have to override the methods themselves if they need to.
I'm sure that not every job actually needs a cleanup or a setup.

Generally, I would suggest integrating the OutputCollector and the
RecordReader into the BSPPeerImpl.
That way our peer is like the context in Hadoop.
But that is just a minor thing. It is a great improvement ;)
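
To illustrate the "peer as context" suggestion, here is a minimal sketch, assuming a peer that exposes record reading and output writing directly. SketchPeer, InMemoryPeer, readNext() and write() are hypothetical names chosen for this example; they are not the real BSPPeerImpl, OutputCollector or RecordReader signatures.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical peer: one object for input and output, like a Hadoop context.
interface SketchPeer<K, V> {
  Map.Entry<K, V> readNext(); // returns null when the input is exhausted

  void write(K key, V value);
}

// In-memory stand-in so the sketch runs without a cluster.
class InMemoryPeer<K, V> implements SketchPeer<K, V> {
  private final Iterator<Map.Entry<K, V>> input;
  private final List<Map.Entry<K, V>> output = new ArrayList<>();

  InMemoryPeer(List<Map.Entry<K, V>> records) {
    this.input = records.iterator();
  }

  @Override
  public Map.Entry<K, V> readNext() {
    return input.hasNext() ? input.next() : null;
  }

  @Override
  public void write(K key, V value) {
    output.add(new SimpleEntry<>(key, value));
  }

  List<Map.Entry<K, V>> output() {
    return output;
  }
}

// The job only talks to the peer; no separate reader or collector arguments.
class WordLengthJob {
  static void bsp(SketchPeer<String, Integer> peer) {
    Map.Entry<String, Integer> record;
    while ((record = peer.readNext()) != null) {
      peer.write(record.getKey(), record.getKey().length());
    }
  }
}
```

The point of the design is that the bsp() signature stays stable when IO features are added, because new capabilities go onto the peer rather than into extra method parameters.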

2011/11/2 Edward J. Yoon <ed...@apache.org>

> There're bsp(), setup() and cleaner() methods.
>
> What is your suggestion?



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by "Edward J. Yoon" <ed...@apache.org>.
There are bsp(), setup() and cleaner() methods.

What is your suggestion?

On Wed, Nov 2, 2011 at 8:47 PM, Thomas Jungblut
<th...@googlemail.com> wrote:
> Have a look at the combiner class. I know that this is just a "test", but
> it is really messy if the user does not use the methods, but is forced to
> override them.



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
Have a look at the combiner class. I know that this is just a "test", but
it is really messy when the user does not use the methods yet is forced to
override them.
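
The difference is easier to see in code. The following is a minimal sketch, assuming a simplified base class: SketchBSP and its StringBuilder parameter are hypothetical and only stand in for the real BSP class, and the optional hooks are named setup() and cleanup() here purely for illustration. With no-op defaults, a job overrides only what it actually uses.

```java
// Hypothetical stand-in for the BSP base class; not the real Hama API.
abstract class SketchBSP {
  // Optional hook: does nothing unless a job overrides it.
  public void setup(StringBuilder log) {
  }

  // The one method every job has to provide.
  public abstract void bsp(StringBuilder log);

  // Optional hook: does nothing unless a job overrides it.
  public void cleanup(StringBuilder log) {
  }

  // Runs the phases in order, as a framework would.
  public final String run() {
    StringBuilder log = new StringBuilder();
    setup(log);
    bsp(log);
    cleanup(log);
    return log.toString();
  }
}

// A job without setup or cleanup needs a single override.
class MinimalJob extends SketchBSP {
  @Override
  public void bsp(StringBuilder log) {
    log.append("bsp");
  }
}

// A job that needs setup overrides it explicitly.
class JobWithSetup extends SketchBSP {
  @Override
  public void setup(StringBuilder log) {
    log.append("setup,");
  }

  @Override
  public void bsp(StringBuilder log) {
    log.append("bsp");
  }
}
```

If setup() and cleanup() were abstract instead, MinimalJob would have to carry two empty method bodies just to compile.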

2011/11/2 Edward J. Yoon <ed...@apache.org>

> Why?



-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Please review new APIs.

Posted by "Edward J. Yoon" <ed...@apache.org>.
Why?

On Wed, Nov 2, 2011 at 8:21 PM, Thomas Jungblut
<th...@googlemail.com> wrote:
> I totally dislike that BSP class now has abstract methods instead of
> default implementations.



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Please review new APIs.

Posted by Thomas Jungblut <th...@googlemail.com>.
I totally dislike that the BSP class now has abstract methods instead of
default implementations.

2011/11/2 Edward J. Yoon <ed...@apache.org>

> Hi all,
>
> As you know, recently combiners and IO are added.
>
> Please review them from user viewpoint.
>
>
> http://svn.apache.org/repos/asf/incubator/hama/trunk/examples/src/main/java/org/apache/hama/examples/PiEstimator.java
>
> I'm testing multiple tasks and IO features on 100 nodes cluster using
> 10 tasks per node. If there's no issue, I'll close HAMA-258.
>
> Thanks.
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>



-- 
Thomas Jungblut
Berlin <th...@gmail.com>