Posted to dev@airavata.apache.org by Danushka Menikkumbura <da...@gmail.com> on 2013/06/18 21:27:20 UTC

XBaya/Hadoop Integration - Concern

Hi All,

The current UI implementation does not take application/host descriptions
into account, simply because I believe they have little or no meaning in the
Hadoop world. Instead, it enables configuring each individual job through the
UI (please see the attached xbaya-hadoop.png).

The upside of this approach is that new jobs can be added and configured
dynamically, without adding application descriptions, generating code,
compiling, re-deploying, and so on. The downside is that it differs from
general GFac application invocation, where each application has an associated
application/host description. In short, we are trying to incorporate
something that does not quite fit into the application/host domain.
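
For concreteness, here is a minimal sketch of what that per-job configuration
amounts to on the Hadoop side, written against the standard Hadoop 2.x
MapReduce client API. The parameter names (jobName, jarPath, mapperClass and
so on) and the org.example.* classes are hypothetical stand-ins for the
fields shown in xbaya-hadoop.png:

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DynamicHadoopJob {

    // Builds a MapReduce job purely from values collected in the UI at
    // submission time; no application or host description is consulted.
    public static Job fromUiValues(Map<String, String> ui) throws Exception {
        Job job = Job.getInstance(new Configuration(), ui.get("jobName"));
        job.setJar(ui.get("jarPath"));                      // job jar picked in the UI
        job.setMapperClass(Class.forName(ui.get("mapperClass")).asSubclass(Mapper.class));
        job.setReducerClass(Class.forName(ui.get("reducerClass")).asSubclass(Reducer.class));
        job.setOutputKeyClass(Text.class);                  // would also come from the UI
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(ui.get("inputPath")));
        FileOutputFormat.setOutputPath(job, new Path(ui.get("outputPath")));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> ui = new HashMap<String, String>();
        ui.put("jobName", "word-count");
        ui.put("jarPath", "/tmp/wordcount.jar");
        ui.put("mapperClass", "org.example.WordCountMapper");
        ui.put("reducerClass", "org.example.WordCountReducer");
        ui.put("inputPath", "/data/in");
        ui.put("outputPath", "/data/out");
        System.exit(fromUiValues(ui).waitForCompletion(true) ? 0 : 1);
    }
}

Every value above comes from the workflow node at submission time, which is
exactly why no pre-registered application description is needed.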

Thoughts appreciated.

Thanks,
Danushka

Re: XBaya/Hadoop Integration - Concern

Posted by Chathura Herath <ch...@cs.indiana.edu>.
Sounds good. Pick whichever name you like for the config file, since it's your idea.

It would be great if we could have an abstraction that captures all the host
descriptions, as Amila pointed out, but I am not sure how much work that
is going to be.

On Sat, Jun 22, 2013 at 1:13 PM, Danushka Menikkumbura
<da...@gmail.com> wrote:
> Chathura,
>
> It is good to see that the snowball is gathering mass and I am totally +1
> for the point that you are trying to make.
>
> Lemme extend it little further.
>
> As far as I know, GFac was conceived with the intention of facilitating
> provisions for associating legacy (hosted) applications with workflows.
> Naturally these applications would have a host of some form, hence a host
> description and a deployed application, hence an application deployment
> description. Because of that it is taken for granted that a GFac invocation
> would have a host description and an application description associated
> with it.
>
> But it turns out to be little different when it comes to Hadoop, specially
> in terms of host description. We cannot actually treat Hadoop run
> environment (Single node, Local cluster, EMR, etc) as a host, but rather a
> runtime that could be described in a different file. (I would say
> TaskRuntimeDescritption.xml or even HadoopAccountDescription.xml as you
> have said).
>
> My initial approach was to completely ignore the notion of "host" when it
> comes to Hadoop but then decided to think of Hadoop runtime as a variation
> of host (see my previous reply on this thread). Therefore, I am +1 for
> having a separate description file for runtimes.
>
> Thanks,
> Danushka
>
> On Sat, Jun 22, 2013 at 10:46 AM, Chathura Herath <ch...@cs.indiana.edu>wrote:
>
>> Lets think about this a bit more.
>> Hadoop is a data driven model, which is very different from the MPI
>> model that we deal  in scientific computing. When you launch an MPI
>> job the number of nodes are decided by you(meaning Suresh or Sudhakar
>> who configure the app ), who knows how many nodes are available and
>> required for the job. Now if you think about Hadoop,  the number of
>> nodes/partitions are decides by the framework dynamically(may be using
>> the Formatters). I believe this is the reason why this idea of dynamic
>> node selection was left untouched for most art.
>>
>> I agree that we need a configuration file to store all the
>> configuration that are static for most part but required for hadoop
>> job launch. Current encapsulation of hadoop configuration is not the
>> best way and certainly be improved.
>>
>> Point i am trying to make is, may be what we need is a
>> HadoopAccountDescription.xml file instead of trying to push this to
>> the existing hostdescription. Their semantics are different as well as
>> the paremerters and model. The host description schema was defined
>> with super computing applications in mind may be this schema was
>> revisited since i last seen it and rethought. I wouldn't worry about
>> this if you are working on a conference paper due in three weeks. But
>> definitely something to think about.
>>
>>
>>
>> On Fri, Jun 21, 2013 at 11:40 PM, Lahiru Gunathilake <gl...@gmail.com>
>> wrote:
>> > Hi Danushka,
>> >
>> >
>> > I am +1 for this approach, but I am sure you need to patch gfac-core
>> > without breaking default gfac functionality.
>> >
>> > Lahiru
>> >
>> >
>> > On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
>> > danushka.menikkumbura@gmail.com> wrote:
>> >
>> >> Hadoop deployment model (single node, local cluster, EMR, etc) is not
>> >> exactly a host, as in Airavata, but is along the lines of host IMO.
>> >> Therefore we can still stick to a similar model but need to have a
>> >> different UI interface to configure them. Still Hadoop jobs would be
>> >> treated differently and have them configured in workflow itself (i.e.
>> the
>> >> current implementation), as opposed to having them predefined as in GFac
>> >> applications.
>> >>
>> >> Please kindly let me know if you think otherwise.
>> >>
>> >> Cheers,
>> >> Danushka
>> >>
>> >>
>> >> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
>> >> danushka.menikkumbura@gmail.com> wrote:
>> >>
>> >>> Hi All,
>> >>>
>> >>> The current UI implementation does not take application/host
>> description
>> >>> into account simply because they have little or no meaning in the
>> Hadoop
>> >>> world as I believe. The current implementation enables configuring each
>> >>> individual job using the UI (Please see the attached xbaya-hadoop.png).
>> >>>
>> >>> The upside of this approach is that new jobs could be added/configured
>> >>> dynamically, without adding application descriptions/generating
>> >>> code/compiling/re-deploying/etc. The downside is that it is different
>> from
>> >>> general GFac application invocation, where each application has an
>> >>> associated application/host/etc. Nevertheless we are trying to
>> incorporate
>> >>> something that does not quite fit into application/host domain.
>> >>>
>> >>> Thoughts appreciated.
>> >>>
>> >>> Thanks,
>> >>> Danushka
>> >>>
>> >>
>> >>
>> >
>> >
>> > --
>> > System Analyst Programmer
>> > PTI Lab
>> > Indiana University
>>
>>
>>
>> --
>> Chathura Herath Ph.D
>> http://people.apache.org/~chathura/
>> http://chathurah.blogspot.com/
>>



-- 
Chathura Herath Ph.D
http://people.apache.org/~chathura/
http://chathurah.blogspot.com/

Re: XBaya/Hadoop Integration - Concern

Posted by Danushka Menikkumbura <da...@gmail.com>.
Chathura,

It is good to see that the snowball is gathering mass, and I am totally +1
for the point that you are trying to make.

Let me extend it a little further.

As far as I know, GFac was conceived with the intention of associating
legacy (hosted) applications with workflows. Naturally, such an application
runs on a host of some form, hence a host description, and is a deployed
application, hence an application deployment description. Because of that,
it is taken for granted that a GFac invocation has a host description and an
application description associated with it.

But things turn out to be a little different when it comes to Hadoop,
especially in terms of the host description. We cannot really treat a Hadoop
runtime environment (single node, local cluster, EMR, etc.) as a host; it is
rather a runtime that could be described in a separate file (I would call it
TaskRuntimeDescription.xml, or even HadoopAccountDescription.xml as you have
suggested).

My initial approach was to ignore the notion of "host" completely when it
comes to Hadoop, but I then decided to think of the Hadoop runtime as a
variation of a host (see my previous reply on this thread). Therefore, I am
+1 for having a separate description file for runtimes.
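
To make that concrete, here is a minimal sketch, assuming the runtime
description simply reuses Hadoop's own <configuration>/<property> XML format
so the client can load it directly; the file name and the helper method are
hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RuntimeDescriptions {

    // Loads a runtime description (e.g. conf/TaskRuntimeDescription.xml) as an
    // ordinary Hadoop configuration resource and binds a job to that runtime.
    // The file would carry the filesystem and job-submission endpoints, plus
    // any EMR account settings, in place of a host description.
    public static Job jobForRuntime(String runtimeFile, String jobName) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path(runtimeFile));
        return Job.getInstance(conf, jobName);
    }
}

Calling jobForRuntime("conf/TaskRuntimeDescription.xml", "word-count") would
then target whichever single-node, cluster, or EMR runtime that file
describes.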

Thanks,
Danushka

On Sat, Jun 22, 2013 at 10:46 AM, Chathura Herath <ch...@cs.indiana.edu> wrote:

> Lets think about this a bit more.
> Hadoop is a data driven model, which is very different from the MPI
> model that we deal  in scientific computing. When you launch an MPI
> job the number of nodes are decided by you(meaning Suresh or Sudhakar
> who configure the app ), who knows how many nodes are available and
> required for the job. Now if you think about Hadoop,  the number of
> nodes/partitions are decides by the framework dynamically(may be using
> the Formatters). I believe this is the reason why this idea of dynamic
> node selection was left untouched for most art.
>
> I agree that we need a configuration file to store all the
> configuration that are static for most part but required for hadoop
> job launch. Current encapsulation of hadoop configuration is not the
> best way and certainly be improved.
>
> Point i am trying to make is, may be what we need is a
> HadoopAccountDescription.xml file instead of trying to push this to
> the existing hostdescription. Their semantics are different as well as
> the paremerters and model. The host description schema was defined
> with super computing applications in mind may be this schema was
> revisited since i last seen it and rethought. I wouldn't worry about
> this if you are working on a conference paper due in three weeks. But
> definitely something to think about.
>
>
>
> On Fri, Jun 21, 2013 at 11:40 PM, Lahiru Gunathilake <gl...@gmail.com>
> wrote:
> > Hi Danushka,
> >
> >
> > I am +1 for this approach, but I am sure you need to patch gfac-core
> > without breaking default gfac functionality.
> >
> > Lahiru
> >
> >
> > On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
> > danushka.menikkumbura@gmail.com> wrote:
> >
> >> Hadoop deployment model (single node, local cluster, EMR, etc) is not
> >> exactly a host, as in Airavata, but is along the lines of host IMO.
> >> Therefore we can still stick to a similar model but need to have a
> >> different UI interface to configure them. Still Hadoop jobs would be
> >> treated differently and have them configured in workflow itself (i.e.
> the
> >> current implementation), as opposed to having them predefined as in GFac
> >> applications.
> >>
> >> Please kindly let me know if you think otherwise.
> >>
> >> Cheers,
> >> Danushka
> >>
> >>
> >> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
> >> danushka.menikkumbura@gmail.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> The current UI implementation does not take application/host
> description
> >>> into account simply because they have little or no meaning in the
> Hadoop
> >>> world as I believe. The current implementation enables configuring each
> >>> individual job using the UI (Please see the attached xbaya-hadoop.png).
> >>>
> >>> The upside of this approach is that new jobs could be added/configured
> >>> dynamically, without adding application descriptions/generating
> >>> code/compiling/re-deploying/etc. The downside is that it is different
> from
> >>> general GFac application invocation, where each application has an
> >>> associated application/host/etc. Nevertheless we are trying to
> incorporate
> >>> something that does not quite fit into application/host domain.
> >>>
> >>> Thoughts appreciated.
> >>>
> >>> Thanks,
> >>> Danushka
> >>>
> >>
> >>
> >
> >
> > --
> > System Analyst Programmer
> > PTI Lab
> > Indiana University
>
>
>
> --
> Chathura Herath Ph.D
> http://people.apache.org/~chathura/
> http://chathurah.blogspot.com/
>

Re: XBaya/Hadoop Integration - Concern

Posted by Amila Jayasekara <th...@gmail.com>.
I didn't "quite" understand the issue, but as per Chathura's discussion, what
I realize is that we need an abstraction over the host description. The core
framework only needs to be aware of the abstract host descriptor, and it is
the responsibility of the appropriate provider (the Hadoop provider in this
case) to deal with the concrete host descriptor implementation.
In the UI we need to support the concrete host descriptor implementations,
and it is the responsibility of the API to correctly decode and instantiate
the appropriate host descriptor implementation.
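
A minimal sketch of that shape in plain Java; none of these types exist in
Airavata, and the names are invented purely to illustrate the split between
the abstract descriptor the core sees and the concrete one the provider
handles:

interface ResourceDescriptor {
    String getName();
}

class HadoopRuntimeDescriptor implements ResourceDescriptor {
    private final String name;
    private final String deploymentMode;    // e.g. "local", "cluster", "emr"
    private final String endpoint;           // job tracker / resource manager address

    HadoopRuntimeDescriptor(String name, String deploymentMode, String endpoint) {
        this.name = name;
        this.deploymentMode = deploymentMode;
        this.endpoint = endpoint;
    }

    public String getName() { return name; }
    public String getDeploymentMode() { return deploymentMode; }
    public String getEndpoint() { return endpoint; }
}

interface Provider {
    boolean canHandle(ResourceDescriptor descriptor);
    void execute(ResourceDescriptor descriptor) throws Exception;
}

class HadoopProvider implements Provider {
    public boolean canHandle(ResourceDescriptor descriptor) {
        return descriptor instanceof HadoopRuntimeDescriptor;
    }

    public void execute(ResourceDescriptor descriptor) throws Exception {
        // Only the provider ever sees the concrete type.
        HadoopRuntimeDescriptor runtime = (HadoopRuntimeDescriptor) descriptor;
        System.out.println("Submitting to " + runtime.getDeploymentMode()
                + " runtime at " + runtime.getEndpoint());
        // ...configure and submit the MapReduce job here...
    }
}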

Please disregard this email if I am talking complete nonsense.

Thanks
Amila


On Sat, Jun 22, 2013 at 1:16 AM, Chathura Herath <ch...@cs.indiana.edu> wrote:

> Lets think about this a bit more.
> Hadoop is a data driven model, which is very different from the MPI
> model that we deal  in scientific computing. When you launch an MPI
> job the number of nodes are decided by you(meaning Suresh or Sudhakar
> who configure the app ), who knows how many nodes are available and
> required for the job. Now if you think about Hadoop,  the number of
> nodes/partitions are decides by the framework dynamically(may be using
> the Formatters). I believe this is the reason why this idea of dynamic
> node selection was left untouched for most art.
>
> I agree that we need a configuration file to store all the
> configuration that are static for most part but required for hadoop
> job launch. Current encapsulation of hadoop configuration is not the
> best way and certainly be improved.
>
> Point i am trying to make is, may be what we need is a
> HadoopAccountDescription.xml file instead of trying to push this to
> the existing hostdescription. Their semantics are different as well as
> the paremerters and model. The host description schema was defined
> with super computing applications in mind may be this schema was
> revisited since i last seen it and rethought. I wouldn't worry about
> this if you are working on a conference paper due in three weeks. But
> definitely something to think about.
>
>
>
> On Fri, Jun 21, 2013 at 11:40 PM, Lahiru Gunathilake <gl...@gmail.com>
> wrote:
> > Hi Danushka,
> >
> >
> > I am +1 for this approach, but I am sure you need to patch gfac-core
> > without breaking default gfac functionality.
> >
> > Lahiru
> >
> >
> > On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
> > danushka.menikkumbura@gmail.com> wrote:
> >
> >> Hadoop deployment model (single node, local cluster, EMR, etc) is not
> >> exactly a host, as in Airavata, but is along the lines of host IMO.
> >> Therefore we can still stick to a similar model but need to have a
> >> different UI interface to configure them. Still Hadoop jobs would be
> >> treated differently and have them configured in workflow itself (i.e.
> the
> >> current implementation), as opposed to having them predefined as in GFac
> >> applications.
> >>
> >> Please kindly let me know if you think otherwise.
> >>
> >> Cheers,
> >> Danushka
> >>
> >>
> >> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
> >> danushka.menikkumbura@gmail.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>> The current UI implementation does not take application/host
> description
> >>> into account simply because they have little or no meaning in the
> Hadoop
> >>> world as I believe. The current implementation enables configuring each
> >>> individual job using the UI (Please see the attached xbaya-hadoop.png).
> >>>
> >>> The upside of this approach is that new jobs could be added/configured
> >>> dynamically, without adding application descriptions/generating
> >>> code/compiling/re-deploying/etc. The downside is that it is different
> from
> >>> general GFac application invocation, where each application has an
> >>> associated application/host/etc. Nevertheless we are trying to
> incorporate
> >>> something that does not quite fit into application/host domain.
> >>>
> >>> Thoughts appreciated.
> >>>
> >>> Thanks,
> >>> Danushka
> >>>
> >>
> >>
> >
> >
> > --
> > System Analyst Programmer
> > PTI Lab
> > Indiana University
>
>
>
> --
> Chathura Herath Ph.D
> http://people.apache.org/~chathura/
> http://chathurah.blogspot.com/
>

Re: XBaya/Hadoop Integration - Concern

Posted by Chathura Herath <ch...@cs.indiana.edu>.
Let's think about this a bit more.
Hadoop is a data-driven model, which is very different from the MPI model
that we deal with in scientific computing. When you launch an MPI job, the
number of nodes is decided by you (meaning Suresh or Sudhakar, who configure
the app), someone who knows how many nodes are available and required for
the job. Now if you think about Hadoop, the number of nodes/partitions is
decided by the framework dynamically (maybe using the input formatters). I
believe this is the reason why the idea of dynamic node selection was left
untouched for the most part.
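
For illustration, a small sketch against the Hadoop 2.x mapreduce API showing
where that decision actually happens: the number of map tasks falls out of
InputFormat.getSplits(), not out of anything the user declares about nodes
(the input path is just a command-line argument here):

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        // Unlike an MPI job, nobody tells Hadoop "use N nodes"; the framework
        // derives the number of map tasks from the input data itself.
        Job job = Job.getInstance(new Configuration(), "split-count");
        FileInputFormat.addInputPath(job, new Path(args[0]));

        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("Framework chose " + splits.size() + " map tasks");
    }
}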

I agree that we need a configuration file to store all the settings that are
static for the most part but required for a Hadoop job launch. The current
encapsulation of the Hadoop configuration is not the best way to do this and
can certainly be improved.

The point I am trying to make is that maybe what we need is a
HadoopAccountDescription.xml file instead of trying to push this into the
existing host description. Their semantics are different, as are the
parameters and the model. The host description schema was defined with
supercomputing applications in mind; maybe the schema has been revisited and
rethought since I last saw it. I wouldn't worry about this if you are working
on a conference paper due in three weeks, but it is definitely something to
think about.



On Fri, Jun 21, 2013 at 11:40 PM, Lahiru Gunathilake <gl...@gmail.com> wrote:
> Hi Danushka,
>
>
> I am +1 for this approach, but I am sure you need to patch gfac-core
> without breaking default gfac functionality.
>
> Lahiru
>
>
> On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
>> Hadoop deployment model (single node, local cluster, EMR, etc) is not
>> exactly a host, as in Airavata, but is along the lines of host IMO.
>> Therefore we can still stick to a similar model but need to have a
>> different UI interface to configure them. Still Hadoop jobs would be
>> treated differently and have them configured in workflow itself (i.e. the
>> current implementation), as opposed to having them predefined as in GFac
>> applications.
>>
>> Please kindly let me know if you think otherwise.
>>
>> Cheers,
>> Danushka
>>
>>
>> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
>> danushka.menikkumbura@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> The current UI implementation does not take application/host description
>>> into account simply because they have little or no meaning in the Hadoop
>>> world as I believe. The current implementation enables configuring each
>>> individual job using the UI (Please see the attached xbaya-hadoop.png).
>>>
>>> The upside of this approach is that new jobs could be added/configured
>>> dynamically, without adding application descriptions/generating
>>> code/compiling/re-deploying/etc. The downside is that it is different from
>>> general GFac application invocation, where each application has an
>>> associated application/host/etc. Nevertheless we are trying to incorporate
>>> something that does not quite fit into application/host domain.
>>>
>>> Thoughts appreciated.
>>>
>>> Thanks,
>>> Danushka
>>>
>>
>>
>
>
> --
> System Analyst Programmer
> PTI Lab
> Indiana University



-- 
Chathura Herath Ph.D
http://people.apache.org/~chathura/
http://chathurah.blogspot.com/

Re: XBaya/Hadoop Integration - Concern

Posted by Lahiru Gunathilake <gl...@gmail.com>.
Hi Danushka,


I am +1 for this approach, but I am sure you will need to patch gfac-core
without breaking the default GFac functionality.

Lahiru


On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
danushka.menikkumbura@gmail.com> wrote:

> Hadoop deployment model (single node, local cluster, EMR, etc) is not
> exactly a host, as in Airavata, but is along the lines of host IMO.
> Therefore we can still stick to a similar model but need to have a
> different UI interface to configure them. Still Hadoop jobs would be
> treated differently and have them configured in workflow itself (i.e. the
> current implementation), as opposed to having them predefined as in GFac
> applications.
>
> Please kindly let me know if you think otherwise.
>
> Cheers,
> Danushka
>
>
> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
>> Hi All,
>>
>> The current UI implementation does not take application/host description
>> into account simply because they have little or no meaning in the Hadoop
>> world as I believe. The current implementation enables configuring each
>> individual job using the UI (Please see the attached xbaya-hadoop.png).
>>
>> The upside of this approach is that new jobs could be added/configured
>> dynamically, without adding application descriptions/generating
>> code/compiling/re-deploying/etc. The downside is that it is different from
>> general GFac application invocation, where each application has an
>> associated application/host/etc. Nevertheless we are trying to incorporate
>> something that does not quite fit into application/host domain.
>>
>> Thoughts appreciated.
>>
>> Thanks,
>> Danushka
>>
>
>


-- 
System Analyst Programmer
PTI Lab
Indiana University

Re: XBaya/Hadoop Integration - Concern

Posted by Danushka Menikkumbura <da...@gmail.com>.
Hi Amila,

Nope. It does not.

We still need GFac to submit jobs.

Danushka


On Sat, Jun 22, 2013 at 8:56 AM, Amila Jayasekara
<th...@gmail.com> wrote:

> Hi Dhanushka,
>
> I am not an expert in Hadoop.
>
> But does this says that we dont go through GFac to submit jobs ?
>
> Regards
> Amila
>
>
> On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
> > Hadoop deployment model (single node, local cluster, EMR, etc) is not
> > exactly a host, as in Airavata, but is along the lines of host IMO.
> > Therefore we can still stick to a similar model but need to have a
> > different UI interface to configure them. Still Hadoop jobs would be
> > treated differently and have them configured in workflow itself (i.e. the
> > current implementation), as opposed to having them predefined as in GFac
> > applications.
> >
> > Please kindly let me know if you think otherwise.
> >
> > Cheers,
> > Danushka
> >
> >
> > On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
> > danushka.menikkumbura@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > The current UI implementation does not take application/host
> description
> > > into account simply because they have little or no meaning in the
> Hadoop
> > > world as I believe. The current implementation enables configuring each
> > > individual job using the UI (Please see the attached xbaya-hadoop.png).
> > >
> > > The upside of this approach is that new jobs could be added/configured
> > > dynamically, without adding application descriptions/generating
> > > code/compiling/re-deploying/etc. The downside is that it is different
> > from
> > > general GFac application invocation, where each application has an
> > > associated application/host/etc. Nevertheless we are trying to
> > incorporate
> > > something that does not quite fit into application/host domain.
> > >
> > > Thoughts appreciated.
> > >
> > > Thanks,
> > > Danushka
> > >
> >
>

Re: XBaya/Hadoop Integration - Concern

Posted by Amila Jayasekara <th...@gmail.com>.
Hi Danushka,

I am not an expert in Hadoop.

But does this mean that we don't go through GFac to submit jobs?

Regards
Amila


On Fri, Jun 21, 2013 at 7:44 PM, Danushka Menikkumbura <
danushka.menikkumbura@gmail.com> wrote:

> Hadoop deployment model (single node, local cluster, EMR, etc) is not
> exactly a host, as in Airavata, but is along the lines of host IMO.
> Therefore we can still stick to a similar model but need to have a
> different UI interface to configure them. Still Hadoop jobs would be
> treated differently and have them configured in workflow itself (i.e. the
> current implementation), as opposed to having them predefined as in GFac
> applications.
>
> Please kindly let me know if you think otherwise.
>
> Cheers,
> Danushka
>
>
> On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
> danushka.menikkumbura@gmail.com> wrote:
>
> > Hi All,
> >
> > The current UI implementation does not take application/host description
> > into account simply because they have little or no meaning in the Hadoop
> > world as I believe. The current implementation enables configuring each
> > individual job using the UI (Please see the attached xbaya-hadoop.png).
> >
> > The upside of this approach is that new jobs could be added/configured
> > dynamically, without adding application descriptions/generating
> > code/compiling/re-deploying/etc. The downside is that it is different
> from
> > general GFac application invocation, where each application has an
> > associated application/host/etc. Nevertheless we are trying to
> incorporate
> > something that does not quite fit into application/host domain.
> >
> > Thoughts appreciated.
> >
> > Thanks,
> > Danushka
> >
>

Re: XBaya/Hadoop Integration - Concern

Posted by Danushka Menikkumbura <da...@gmail.com>.
The Hadoop deployment model (single node, local cluster, EMR, etc.) is not
exactly a host in the Airavata sense, but it is along the lines of a host,
IMO. Therefore we can still stick to a similar model, but we need a different
UI to configure these deployments. Hadoop jobs would still be treated
differently and configured in the workflow itself (i.e. the current
implementation), as opposed to being predefined like GFac applications.
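
For what it is worth, selecting among those deployments is purely a
client-side configuration concern, which is why it feels more like a runtime
than a host. A rough sketch (the property names are Hadoop 1.x style and the
endpoints are made up; on YARN the keys would be mapreduce.framework.name and
fs.defaultFS):

import org.apache.hadoop.conf.Configuration;

public class RuntimeSelection {

    // The same driver code can target different Hadoop deployments purely by
    // swapping client-side configuration; nothing host-description-like is
    // needed on the Airavata side.
    public static Configuration forDeployment(String mode) {
        Configuration conf = new Configuration();
        if ("local".equals(mode)) {                 // single node, in-process
            conf.set("mapred.job.tracker", "local");
            conf.set("fs.default.name", "file:///");
        } else if ("cluster".equals(mode)) {        // local cluster
            conf.set("mapred.job.tracker", "jt.example.org:9001");
            conf.set("fs.default.name", "hdfs://nn.example.org:9000");
        } else if ("emr".equals(mode)) {            // remote EMR master
            conf.set("mapred.job.tracker", "emr-master.example.org:9001");
            conf.set("fs.default.name", "hdfs://emr-master.example.org:9000");
        }
        return conf;
    }
}

A workflow node would then just pick "local", "cluster", or "emr" and hand
the resulting Configuration to the job it builds.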

Please kindly let me know if you think otherwise.

Cheers,
Danushka


On Wed, Jun 19, 2013 at 12:57 AM, Danushka Menikkumbura <
danushka.menikkumbura@gmail.com> wrote:

> Hi All,
>
> The current UI implementation does not take application/host description
> into account simply because they have little or no meaning in the Hadoop
> world as I believe. The current implementation enables configuring each
> individual job using the UI (Please see the attached xbaya-hadoop.png).
>
> The upside of this approach is that new jobs could be added/configured
> dynamically, without adding application descriptions/generating
> code/compiling/re-deploying/etc. The downside is that it is different from
> general GFac application invocation, where each application has an
> associated application/host/etc. Nevertheless we are trying to incorporate
> something that does not quite fit into application/host domain.
>
> Thoughts appreciated.
>
> Thanks,
> Danushka
>