
Posted to architecture@airavata.apache.org by Suresh Marru <sm...@apache.org> on 2014/02/23 23:20:25 UTC

Object Database Suggestions for Airavata Registry

Hi All,

Airavata is actively migrating to a Thrift API, both for the RESTless design and to facilitate various language bindings for client gateways. Thrift's programming-language support has so far been very encouraging. The current architecture looks like Figure 1 at [1].

Language-specific clients will be released as Thrift SDKs (similar to the Evernote SDKs [2]). These clients will be integrated into gateway portals, which connect to the API Server. The API operations broker these simple calls into one or more backend CPI calls (Airavata internal component interfaces). An example set of mappings is illustrated in Figure 2 at [1]. The current draft of the Thrift API for version 0.12 is at [3]; please pay attention to the experiment model at [4].

For the persistent store, the Airavata Registry has gone through a few iterations, shifting from a legacy XRegistry to JackRabbit and now to an OpenJPA-based registry. To allow the API and the associated data models to evolve, it will be useful to explore object databases so we can store the serialized versions of Thrift objects directly. But it would be nice to have all (or most) of the fields queryable. This calls for a column-family design among the NoSQL approaches.
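A minimal sketch of the "serialized object plus queryable fields" idea in a column-family layout, under loudly stated assumptions: a plain Python dict stands in for a Thrift-generated struct, `json` stands in for Thrift serialization, and an in-memory dict of rows stands in for a Cassandra column family. The names `flatten` and `ColumnFamilyRegistry` are hypothetical and are not ZombieDB, Astyanax, or Airavata APIs.

```python
import json
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted column names such as 'inputs.0.name'."""
    columns: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            columns.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            columns.update(flatten(value, f"{prefix}{index}."))
    else:
        columns[prefix[:-1]] = obj  # drop the trailing dot
    return columns


class ColumnFamilyRegistry:
    """Hypothetical in-memory stand-in for one column family: row key -> {column -> value}."""

    BLOB_COLUMN = "_serialized"

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def put(self, row_key: str, model: Dict[str, Any]) -> None:
        # One column per leaf field, so any field can be filtered on without a schema change...
        row = flatten(model)
        # ...plus the whole serialized object, so it can be read back in one piece.
        row[self.BLOB_COLUMN] = json.dumps(model, sort_keys=True)
        self.rows[row_key] = row

    def get(self, row_key: str) -> Dict[str, Any]:
        return json.loads(self.rows[row_key][self.BLOB_COLUMN])

    def find(self, column: str, value: Any) -> List[str]:
        # A real store would use a secondary index; a full scan keeps the sketch short.
        return sorted(key for key, row in self.rows.items() if row.get(column) == value)


registry = ColumnFamilyRegistry()
registry.put("exp-1", {"name": "test", "owner": "alice",
                       "inputs": [{"name": "x", "value": "1"}]})
# A model revision that adds a field needs no schema migration.
registry.put("exp-2", {"name": "demo", "owner": "bob", "status": "CREATED"})
print(registry.find("owner", "alice"))      # ['exp-1']
print(registry.find("inputs.0.name", "x"))  # ['exp-1']
print(registry.find("status", "CREATED"))   # ['exp-2']
```

Keeping both the blob and the flattened columns is what lets the model change freely while every leaf field stays filterable; the cost is extra writes and, in a real store, secondary indexes to avoid full scans.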

Any recommendations for a registry architecture? 

After some quick hacking, I find the following approach viable: ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can benefit immediately from Cassandra's replication and reliability, and from its scalability in the near future. Some model objects, like experiment creation, will need strong consistency, while most of the monitoring data can live with eventual consistency.
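The strong-versus-eventual split maps onto Cassandra's tunable consistency: with replication factor N, reads that must hear from R replicas are guaranteed to overlap writes acknowledged by W replicas when R + W > N. The sketch below only does that arithmetic. The object kinds and the levels chosen for them are hypothetical illustrations, not settings taken from Airavata, ZombieDB, or Astyanax.

```python
from typing import Dict, Tuple


def replicas_for(level: str, replication_factor: int) -> int:
    """How many replicas must acknowledge a request at a Cassandra-style consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    """Reads see the latest acknowledged write when read and write replica sets must overlap."""
    reads = replicas_for(read_level, replication_factor)
    writes = replicas_for(write_level, replication_factor)
    return reads + writes > replication_factor


# Hypothetical per-object policy: (read level, write level).
POLICY: Dict[str, Tuple[str, str]] = {
    "experiment_creation": ("QUORUM", "QUORUM"),
    "monitoring_event": ("ONE", "ONE"),
}

for kind, (read_level, write_level) in POLICY.items():
    print(kind, is_strongly_consistent(read_level, write_level, replication_factor=3))
# experiment_creation True
# monitoring_event False
```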

Critical comments, please?

Thanks for your time,
Suresh

[1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
[2] - https://dev.evernote.com/doc/
[3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
[4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
[5] - https://github.com/MisterTea/ZombieDB
[6] - https://github.com/Netflix/astyanax

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

On Feb 23, 2014, at 8:30 PM, Sachith Withana <sw...@gmail.com> wrote:

> First of all, thanks a lot for the detailed explanation, Suresh.
> I agree that, with the current implementations and the data
> models getting complex, we will need a more efficient way to handle the data
> at the back end.
> 
> But can you please elaborate on why you chose ZombieDB over Astyanax?
> Hector and DataStax are also used widely in production.

ZombieDB is a wrapper that takes care of mapping the Thrift models into Cassandra. The key decision is whether or not to lock into ZombieDB; the drivers themselves can easily be swapped for one another (with some trade-offs and a small effort).
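One way to keep the ZombieDB decision reversible, sketched here as an assumption rather than a proposal from the thread, is to have registry code depend only on a small storage interface, with ZombieDB/Astyanax, Hector, the DataStax driver, or a relational store hidden behind it. `ExperimentStore`, `InMemoryStore`, and `create_experiment` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ExperimentStore(ABC):
    """Hypothetical seam between registry logic and whichever storage driver is chosen."""

    @abstractmethod
    def save(self, experiment_id: str, payload: bytes) -> None: ...

    @abstractmethod
    def load(self, experiment_id: str) -> Optional[bytes]: ...


class InMemoryStore(ExperimentStore):
    """Stand-in backend; a Cassandra- or JPA-backed class would implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def save(self, experiment_id: str, payload: bytes) -> None:
        self._data[experiment_id] = payload

    def load(self, experiment_id: str) -> Optional[bytes]:
        return self._data.get(experiment_id)


def create_experiment(store: ExperimentStore, experiment_id: str, payload: bytes) -> None:
    # Registry logic sees only the interface, so swapping drivers does not touch this code.
    # Note: check-then-save is not atomic; see the note after this block.
    if store.load(experiment_id) is not None:
        raise ValueError(f"experiment {experiment_id} already exists")
    store.save(experiment_id, payload)


store = InMemoryStore()
create_experiment(store, "exp-1", b"<serialized experiment>")
print(store.load("exp-1"))  # b'<serialized experiment>'
```

The existence check followed by a save is not atomic; a Cassandra-backed implementation would need a conditional (compare-and-set) write to give experiment creation the strong guarantee mentioned in the original post.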

> 
> One more thing: we were planning to have real-time processing of data
> stored in the registry using Storm (or a better alternative). If
> that's the case, I suggest we use the Storm Cassandra project [1] as well.
> 
> [1] https://github.com/hmsonline/storm-cassandra

Storm is targeted at stream processing, and current Airavata use cases do not fit it well, at least not as of now.

Suresh

Re: Object Database Suggestions for Airavata Registry

Posted by Sachith Withana <sw...@gmail.com>.

First of all, thanks a lot for the detailed explanation, Suresh.
I agree that, with the current implementations and the data
models getting complex, we will need a more efficient way to handle the data
at the back end.

But can you please elaborate on why you chose ZombieDB over Astyanax?
Hector and DataStax are also used widely in production.

One more thing: we were planning to have real-time processing of data
stored in the registry using Storm (or a better alternative). If
that's the case, I suggest we use the Storm Cassandra project [1] as well.

[1] https://github.com/hmsonline/storm-cassandra


-- 
Thanks,
Sachith Withana

Re: Object Database Suggestions for Airavata Registry

Posted by "Miller, Mark" <mm...@sdsc.edu>.

Our experience is that NoSQL is in its infancy, and that you lose a lot by giving up SQL. The usual driver for accepting this loss is rapid capture of data, and I don't see that as a driver here. Would someone explain what the driver is in our use case?

Thanks
Mark 


> On Feb 24, 2014, at 10:58 AM, "Supun Kamburugamuva" <su...@gmail.com> wrote:
> 
> Hi all,
> 
> I'm not trying to discourage your exploration of NoSQL databases, but I
> have the following concern.
> 
> Your database schema is moderately complex (even for an RDBMS it seems
> complex), and the data size is relatively small. I'm not sure about the
> currently available tools, but I think you will need to write more code to
> support all your requirements in a NoSQL database. So writing more code and
> allowing redundancy to support *relatively small* and *structured
> data* doesn't seem right to me. Maybe I'm wrong and there are better
> tools in NoSQL than in RDBMSs, which I doubt.
> 
> Thanks,
> Supun..
> 
> -- 
> Supun Kamburugamuva
> Member, Apache Software Foundation; http://www.apache.org
> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> Blog: http://supunk.blogspot.com

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

On Feb 24, 2014, at 1:21 PM, Chathuri Wimalasena <ka...@gmail.com> wrote:

> Hi Suresh,
> 
> I think it took time to do changes to the registry because we did lot of
> changes. We deleted all the previous data models and came up with new set
> of data models (more than 20 of them) with iterative changes. On top of
> that, we introduce new CPIs and registry CPI implementation needs to
> support all those new data models plus previous data models. Even if we go
> with general relational model (without openJPA) to incorporate all these
> changes, I'm pretty sure it will take more than 2 weeks. IMO, openJPA gave
> us lot of time saving with respect to developer effort. With openJPA,
> slight changes to the data models will not take 2 weeks of time.
> 
> Thanks..
> Chathuri

Hi Chathuri,

Thanks for your first-hand insights on these changes. I am not against the use of OpenJPA and am not blaming it for the current complexity. I am fully sympathetic to the plethora of current refactoring. But this is not the first time we have had to do this, right? From 0.5 to 0.6 we had to make the same massive effort. I do not think we can avoid situations where we have to change the data models drastically; rather, we should find a design that can absorb these changes seamlessly. We are just about to settle on single job executions. We still need to evolve them to encompass all the different patterns of Airavata use cases, find good ways to capture workflows (simple and hierarchical), and support a single experiment forking multiple workflows. All of these will require data model iterations similar to what we had to go through now. The ongoing debate about keeping the API simple yet flexible will keep the data models iterating as well. So if there is a simpler relational database solution that can absorb the changes and work directly with the Thrift-generated data models, I am all for it.
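One relational pattern that fits this description keeps the serialized object in one table and a generic (id, field path, value) index in a second table, so adding or renaming model fields needs no DDL change. The sketch below is a minimal illustration using Python's built-in `sqlite3`; JSON stands in for Thrift serialization, and the table and function names (`model_blob`, `model_field`, `save`, `find`) are hypothetical, not part of the existing OpenJPA registry.

```python
import json
import sqlite3
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into dotted field paths such as 'inputs.0.name'."""
    fields: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            fields.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            fields.update(flatten(value, f"{prefix}{index}."))
    else:
        fields[prefix[:-1]] = obj
    return fields


SCHEMA = """
CREATE TABLE IF NOT EXISTS model_blob (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL,   -- e.g. 'experiment'
    body TEXT NOT NULL    -- serialized object (JSON here, Thrift bytes would need a BLOB column)
);
CREATE TABLE IF NOT EXISTS model_field (
    id    TEXT NOT NULL,
    path  TEXT NOT NULL,  -- dotted field path
    value TEXT,
    PRIMARY KEY (id, path)
);
CREATE INDEX IF NOT EXISTS model_field_lookup ON model_field (path, value);
"""


def save(conn: sqlite3.Connection, kind: str, obj_id: str, obj: Dict[str, Any]) -> None:
    with conn:  # blob and index rows commit (or roll back) together
        conn.execute("INSERT OR REPLACE INTO model_blob (id, kind, body) VALUES (?, ?, ?)",
                     (obj_id, kind, json.dumps(obj, sort_keys=True)))
        conn.execute("DELETE FROM model_field WHERE id = ?", (obj_id,))
        conn.executemany("INSERT INTO model_field (id, path, value) VALUES (?, ?, ?)",
                         [(obj_id, path, str(value)) for path, value in flatten(obj).items()])


def find(conn: sqlite3.Connection, kind: str, path: str, value: Any) -> List[str]:
    rows = conn.execute(
        "SELECT b.id FROM model_blob AS b JOIN model_field AS f ON f.id = b.id "
        "WHERE b.kind = ? AND f.path = ? AND f.value = ? ORDER BY b.id",
        (kind, path, str(value)))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save(conn, "experiment", "exp-1", {"name": "test", "owner": "alice"})
# A later model revision adds 'status' without any ALTER TABLE.
save(conn, "experiment", "exp-2", {"name": "demo", "owner": "alice", "status": "CREATED"})
print(find(conn, "experiment", "owner", "alice"))     # ['exp-1', 'exp-2']
print(find(conn, "experiment", "status", "CREATED"))  # ['exp-2']
```

The trade-off is that typed columns, foreign keys, and field-level constraints move out of the database and into application code.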

Suresh

> 
> 
> On Mon, Feb 24, 2014 at 12:32 PM, Suresh Marru <sm...@apache.org> wrote:
> 
>> I could respond to each thread in detail, but I see the general thrust is a
>> question about the use case, so let me try to explain it and see if it
>> comes across. I am fully on board with the perceptions of relational vs. NoSQL
>> and also agree that current Airavata needs are not a direct map for NoSQL
>> migration. I will summarize the driving motivation:
>> 
>> Background: The key problem Airavata needs to solve is getting the API and
>> associated data model right. The problem is that the current relational database
>> (with an OpenJPA overlay) is severely limiting the API's evolution. Science
>> gateways are by nature very domain- and use-case-specific. But
>> Airavata is tackling the challenging problem of providing a generic API
>> that will meet and enable these use-case-centric integrations. The issue
>> here is that we are designing an API to handle a wide range of known (and some
>> foreseen) use cases, while at the same time trying to keep it simple yet
>> flexible. The only way we can arrive at a reasonable, normalized version
>> of the API is by hands-on programming against it. Within the Airavata PMC
>> itself, we can solicit half a dozen different ways to visualize
>> the data model. And we need a few hackathons with real end users of Airavata
>> until we find common ground. All of this needs rapid prototyping.
>> Currently a slight change in the data model takes close to two weeks of
>> re-architecting the OpenJPA-based registry. There are many known problems
>> with the current draft of the data model that have had to be set aside in
>> the interest of making overall system progress.
>> 
>> So the driving motivation is certainly not any of the classic NoSQL needs,
>> but a simple one: can we have a registry that is schema-agnostic and yet
>> queryable for most of the fields in the model? Can we try 10 different
>> variants of the data model (hence API) within the next 3 months with focused
>> hackathons and arrive at a stable 1.0 version of the API?
>> 
>> Part one of the discussion has succeeded in that it raised everyone's
>> eyebrows. Now that we have everyone's attention, what would be a good data
>> store for Airavata that meets these needs?
>> 
>> P.S.: Additional background: The API has been in development for close to 3
>> years and is falling short of pleasing a majority. Many academic
>> standardization efforts fail terribly by pretending to understand all
>> use cases and proposing a standard way (which ends up unnecessarily complex
>> and unusable). Science is by nature evolutionary, and restricting the
>> capabilities to a known set of use cases prevents the use of middleware for
>> real scientific research (and limits it to proof-of-concept
>> demonstrations, papers, and educational use). The only way to meet the
>> challenges of these evolving needs is to have a framework that can
>> evolve with minimal disruption.
>> 
>> Great thoughts so far; please keep 'em coming until we find a solution
>> driven not by technical fancies but by the real need.
>> 
>> Cheers,
>> Suresh
>> 
>> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <gl...@gmail.com>
>> wrote:
>> 
>>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
>>> milinda.pathirage@gmail.com> wrote:
>>> 
>>>> I also think that moving to Cassandra or any other NoSQL store will add
>>>> unnecessary complexity to your solution. Also, designing proper (easy to
>>>> change, easy to query) NoSQL data models is hard (AFAIK, it requires
>>>> a lot of experience and understanding of data structures and queries).
>>>> Also, migrating from one NoSQL technology to another can require a complete
>>>> rewrite. And current relational databases can handle heavy loads, short of
>>>> Google-, Twitter-, Amazon- and Facebook-scale loads. I don't think Airavata
>>>> will see Google- or Amazon-like loads.
>>>> 
>>> +1
>>> 
>>>> 
>>>> If constant changes to the data model are the problem, I think the best
>>>> option is to abstract the registry implementation into something like the
>>>> collections and resources used in WSO2 Registry [1], or something suited
>>>> to the Airavata context. That will make it easy to handle changes in the
>>>> data model.
>>>> 
>>>> Also, don't let technologies drive design decisions. It's always better
>>>> to let use cases drive them.
>>>> 
>>> +1
>>> 
>>> Regards
>>> Lahiru
>>> 
>>>> 
>>>> Thanks
>>>> Milinda
>>>> 
>>>> [1] http://wso2.com/products/governance-registry/
>>>> 
>>>> --
>>>> Milinda Pathirage
>>>> PhD Student Indiana University, Bloomington;
>>>> E-mail: milinda.pathirage@gmail.com
>>>> Web: http://mpathirage.com
>>>> Blog: http://blog.mpathirage.com
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> System Analyst Programmer
>>> PTI Lab
>>> Indiana University
>> 
>>
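Milinda's suggestion, quoted above, of abstracting the registry into collections and resources can be pictured as a path-keyed store where collections are path prefixes and each resource carries opaque content plus free-form properties. The sketch below is a hypothetical illustration of that idea only; `PathRegistry` and `Resource` are made-up names and do not reproduce the WSO2 Governance Registry API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """A stored item: opaque content (e.g. a serialized model) plus queryable properties."""
    content: bytes
    properties: Dict[str, str] = field(default_factory=dict)


class PathRegistry:
    """Hypothetical collection/resource registry keyed by slash-separated paths."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def put(self, path: str, resource: Resource) -> None:
        self._resources[path.rstrip("/")] = resource

    def get(self, path: str) -> Resource:
        return self._resources[path.rstrip("/")]

    def children(self, collection: str) -> List[str]:
        # A collection is simply a path prefix; return its direct children only.
        prefix = collection.rstrip("/") + "/"
        names = {path[len(prefix):].split("/", 1)[0]
                 for path in self._resources if path.startswith(prefix)}
        return sorted(names)

    def find_by_property(self, key: str, value: str) -> List[str]:
        return sorted(path for path, res in self._resources.items()
                      if res.properties.get(key) == value)


registry = PathRegistry()
registry.put("/experiments/exp-1", Resource(b"<serialized>", {"owner": "alice", "status": "CREATED"}))
registry.put("/experiments/exp-1/outputs/stdout", Resource(b"done"))
registry.put("/experiments/exp-2", Resource(b"<serialized>", {"owner": "bob"}))
print(registry.children("/experiments"))            # ['exp-1', 'exp-2']
print(registry.find_by_property("owner", "alice"))  # ['/experiments/exp-1']
```

Because new model fields become new properties rather than new columns, data model changes in this style touch application code but not the storage schema.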

Re: Object Database Suggestions for Airavata Registry

Posted by Chathuri Wimalasena <ka...@gmail.com>.

Hi Suresh,

I think it took time to change the registry because we made a lot of
changes. We deleted all the previous data models and came up with a new set
of data models (more than 20 of them) through iterative changes. On top of
that, we introduced new CPIs, and the registry CPI implementation needs to
support all those new data models plus the previous ones. Even if we went
with a general relational model (without OpenJPA) to incorporate all these
changes, I'm pretty sure it would take more than 2 weeks. IMO, OpenJPA saved
us a lot of developer effort. With OpenJPA, slight changes to the data
models will not take 2 weeks.

Thanks..
Chathuri


On Mon, Feb 24, 2014 at 12:32 PM, Suresh Marru <sm...@apache.org> wrote:

> I could respond to each thread in detail, but I see the general sense is
> inquiring on the use case, so let me try and explain this and see if it
> comes across. I am fully onboard with perceptions of relational vs nosql
> and also agree current Airavata needs are not a direct map for NoSQL
> migration. I will summarize the driving motivation:
>
> Background: The key problem Airavata needs to solve is getting the API and
> associated data model right. The problem is current relational database
> (with OpenJPA overlay) is severely limiting the API evolution. Science
> Gateways by nature are very science domain and use-case specific. But
> Airavata is tackling this challenging problem of providing a generic API
> which will meet and enable these use case centric integration. The issue
> here is, we are designing an API to handle a wide range of known (and some
> foreseen) use cases. But at the same time trying to keep it simple and yet
> flexible. The only way we can get through a reasonable, normalized version
> of API is by hands-on programming against the API. Within the Airavata PMC
> itself, we can solicit a half-a-dozen different ways on how to visualize
> the data model. And we need few hackethon's with real-end users of Airavata
> until we find a common ground. All of this needs rapid prototyping.
> Currently a slight change in the data model is taking close to two weeks of
> re-arcitecting the Open-JPA based registry. There are many known problems
> with current draft of data model which have to be put-down in the interest
> of making over all system progress.
>
> So the driving motivation is not certainly any of the classic NoSQL needs.
> But a simple one, can we have registry which is schema-agnostic and yet is
> queriable for most of the fields in the model? Can we try 10 different
> variants of data model (hence API) within the next 3 months with focused
> hackethon's and arrive at a stable 1.0 version of API?
>
> Part one is the discussion is successful that it raised every one's eye
> brows. Now that we have every one's attention, what will be a good data
> store for Airavata which will meet these needs?
>
> P.S: Additional background: The API has been in development for close to 3
> years and is falling short of pleasing a majority. Many academic
> standardization efforts fail terribly trying to pretend to understand all
> use cases and proposing a standard way (which ends up unnecessarily complex
> and not usable). Science by nature is evolutionary, and restricting the
> capabilities by a known set of use cases prevents the use of middleware for
> real-scientific research (and gets limited to proof of concept
> demonstrations, papers, educational use). The only way meeting the
> challenges of these evolving needs is to have the framework which can
> evolve with minimal disruption.
>
> Great thoughts so far, please keep 'em coming until we can find a solution
> not by the technical fancies but to address the real need.
>
> Cheers,
> Suresh
>
> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <gl...@gmail.com>
> wrote:
>
> > On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
> > milinda.pathirage@gmail.com> wrote:
> >
> >> I also think that moving to Cassandra or any other NoSQL will add
> >> unneccessary complexity to your solution. Also designing proper (easy to
> >> manage changes, easy to query) NoSQL data models are hard (AFAIK,
> require
> >> lots of experience and understanding about data structures and queries).
> >> Also migrating from one NoSQL technology to other can require complete
> >> re-write. And current relational databases can handle heavy loads except
> >> Google, Twitter, Amazon and Facebook like loads. I don't think Airavata
> >> will see Google and Amazon like loads.
> >>
> > +1
> >
> >>
> >> If the constant changes to the data model is the problem , I think best
> >> option is to abstract registry implementation to something like
> collections
> >> and resources used in WSO2 Registry [1] or something suitable for
> Airavata
> >> context. That will make it easy to handle changes in data model.
> >>
> >> Also don't let the technologies drive design decision. Its always
> better to
> >> let use cases drive the design decision.
> >>
> > +1
> >
> > Regards
> > Lahiru
> >
> >>
> >> Thanks
> >> Milinda
> >>
> >> [1] http://wso2.com/products/governance-registry/
> >>
> >>
> >> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <
> supun06@gmail.com
> >>> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm not trying to discourage you on your exploration to NoSQL
> databases.
> >> I
> >>> have the following concern.
> >>>
> >>> Your database schema is moderately complex - even for a RDBMS it seems
> >>> complex and the data size is relatively small. I'm not sure about the
> >>> current tools available but I think you will need to write more code to
> >>> support all your requirements in a NoSQL database. So writing more code
> >> and
> >>> allow redundancy to support *relatively small* and *structured
> >>> data*doesn't seem right to me. May be I'm wrong and there are better
> >>> tools in
> >>> NoSQL than RDBMS, which I doubt.
> >>>
> >>> Thanks,
> >>> Supun..
> >>>
> >>>
> >>>
> >>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <sm...@apache.org>
> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> Airavata is actively migrating to use Thrift API for the RESTless
> >> design
> >>>> and to facilitate various language bindings from client gateways. The
> >>>> programming language support in thrift has been so far very
> >> encouraging.
> >>>> The current architecture looks like Figure 1 at [1].
> >>>>
> >>>> Language-specific clients will be released as Thrift SDKs (similar to
> >>>> the Evernote SDKs [2]). These clients will be integrated into gateway
> >>>> portals that connect to the API Server. The API operations broker the
> >>>> simple calls into one or more backend CPI calls (Airavata internal
> >>>> component interfaces). An example set of mappings is illustrated in
> >>>> Figure 2 at [1]. The current draft of the Thrift API for version 0.12 is
> >>>> at [3]; please pay attention to the experiment model at [4].
> >>>>
> >>>> For the persistent store, we have had a few iterations of the Airavata
> >>>> Registry, shifting from a legacy XRegistry to JackRabbit to the current
> >>>> OpenJPA-based registry. To allow the API and the associated data models
> >>>> to evolve, it will be useful to explore object databases so we can store
> >>>> the serialized version of Thrift objects directly. But it will be nice to
> >>>> have all (or most) of the fields queryable. This calls for a
> >>>> column-family design among the NoSQL approaches.
> >>>>
> >>>> Any recommendations for a registry architecture?
> >>>>
> >>>> Quickly hacking through, I find the following approach a viable one:
> >>>> ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can
> >>>> benefit immediately from the replication and reliability of Cassandra,
> >>>> and from its scalability in the near future. Some of the model objects,
> >>>> like experiment creation, will need strong consistency, while most of the
> >>>> monitoring can live with eventual consistency.
> >>>>
> >>>> Critical comments please?
> >>>>
> >>>> Thanks for your time,
> >>>> Suresh
> >>>>
> >>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> >>>> [2] - https://dev.evernote.com/doc/
> >>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> >>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> >>>> [5] - https://github.com/MisterTea/ZombieDB
> >>>> [6] - https://github.com/Netflix/astyanax
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Supun Kamburugamuva
> >>> Member, Apache Software Foundation; http://www.apache.org
> >>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> >>> Blog: http://supunk.blogspot.com
> >>>
> >>
> >>
> >>
> >> --
> >> Milinda Pathirage
> >> PhD Student Indiana University, Bloomington;
> >> E-mail: milinda.pathirage@gmail.com
> >> Web: http://mpathirage.com
> >> Blog: http://blog.mpathirage.com
> >>
> >
> >
> >
> > --
> > System Analyst Programmer
> > PTI Lab
> > Indiana University
>
>
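The registry idea in the proposal quoted above is to store each serialized Thrift object as-is while keeping all (or most) of its fields queryable through a column-family layout. It can be sketched in a few lines. This is a minimal in-memory model under stated assumptions, not ZombieDB, Astyanax, or Cassandra code. The `Experiment` dataclass is a hypothetical stand-in for a Thrift-generated struct, and JSON stands in for Thrift's binary serialization. `ColumnFamilyRegistry`, `flatten`, and the `__blob__` column are illustrative names. Each row keeps the whole serialized object in one column plus one column per flattened field path. A secondary index maps (column, value) pairs to row keys.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:
    # Hypothetical stand-in for a Thrift-generated struct; these field names
    # are illustrative and are not taken from experimentModel.thrift.
    experiment_id: str
    user_name: str
    name: str
    inputs: dict = field(default_factory=dict)


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": leaf} column names."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    columns = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        columns.update(flatten(child, path))
    return columns


class ColumnFamilyRegistry:
    """In-memory model of one column family: row key -> {column: value}."""

    BLOB = "__blob__"  # hypothetical column holding the whole serialized object

    def __init__(self):
        self.rows = {}
        # Secondary index: (column, leaf value) -> set of row keys.
        self.index = defaultdict(set)

    def put(self, row_key, obj):
        self.delete(row_key)  # drop stale index entries when overwriting a row
        doc = asdict(obj)
        columns = flatten(doc)
        columns[self.BLOB] = json.dumps(doc)  # JSON stands in for Thrift binary
        self.rows[row_key] = columns
        for column, value in columns.items():
            if column != self.BLOB:
                self.index[(column, value)].add(row_key)

    def delete(self, row_key):
        for column, value in self.rows.pop(row_key, {}).items():
            if column != self.BLOB:
                self.index[(column, value)].discard(row_key)

    def get(self, row_key, cls):
        # Rebuild the object from the blob column only.
        return cls(**json.loads(self.rows[row_key][self.BLOB]))

    def find(self, column, value):
        # Query any flattened field path without declaring a schema first.
        return sorted(self.index.get((column, value), set()))


if __name__ == "__main__":
    registry = ColumnFamilyRegistry()
    registry.put("exp-1", Experiment("exp-1", "alice", "md-run",
                                     {"cores": 4, "files": ["a.pdb"]}))
    registry.put("exp-2", Experiment("exp-2", "bob", "qm-run", {"cores": 16}))
    print(registry.find("user_name", "alice"))        # ['exp-1']
    print(registry.find("inputs.files.0", "a.pdb"))   # ['exp-1']
    print(registry.get("exp-2", Experiment).inputs)   # {'cores': 16}
```

Fields become columns only at write time. As a result, adding an optional field with a default value to the struct needs no registry migration, and older rows still deserialize. Removing or renaming a field would still require a data migration step. A real column-family store would also have to bound index growth and handle mixed value types, which this sketch ignores.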

Re: Object Database Suggestions for Airavata Registry

Posted by Samir Faci <sa...@esamir.com>.

I wasn't able to attend either. I think I got mixed up between the email
conversation and the event invitations I was getting from G+. I did watch
the meeting later on.

(The archive is very convenient.)

It feels like the general purpose of the project is still a bit ad hoc, or
I'm not seeing the light just yet. But I *believe* the general idea is to
provide a framework/infrastructure to be used by various scientific fields,
exposing APIs that trigger tasks which can in turn launch large jobs, with
both a specialized API per domain and a generic one. I'm probably
oversimplifying the problem, but is that the general idea?

As I mentioned before, I'd be more than happy to contribute my
assistance/thoughts based on my experience at Wize, though lately my
'free' time has been dwindling.

I pushed a pre-alpha version of medusa out. I'm still not happy with the
state of the testing and such, but since you are looking at possibly using
it, it will give you an idea of the tool and what it does, unblock you, and
give you an opportunity to evaluate whether it even does what you need it
to do.


https://github.com/WizeCommerce/medusa/tree/feature/floss






On Mon, Mar 3, 2014 at 10:27 AM, Jijoe Vurghese <ji...@gmail.com> wrote:

> Sorry, everyone. Sunday turned out to be busier than I expected...next time...
>
> --
> Jijoe
>


-- 
Samir Faci
*insert title*
fortune | cowsay -f /usr/share/cows/tux.cow

Sent from my non-iphone laptop.
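The original proposal in this thread notes that experiment creation needs strong consistency, while most monitoring data can live with eventual consistency. In Cassandra that choice is made per request through tunable consistency levels. The usual rule is that a read is guaranteed to overlap the latest acknowledged write when the read replica count R plus the write replica count W exceeds the replication factor N. The sketch below models only that arithmetic. `Level`, `replicas`, `reads_see_latest_write`, and the per-object `POLICY` table are hypothetical names, not the Cassandra driver or Astyanax API.

```python
from enum import Enum


class Level(Enum):
    # Mirrors the names of three Cassandra consistency levels; illustrative only.
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas(level: Level, replication_factor: int) -> int:
    """Number of replicas that must respond at the given level."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        return replication_factor // 2 + 1
    return replication_factor


def reads_see_latest_write(read: Level, write: Level, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > N)."""
    r = replicas(read, replication_factor)
    w = replicas(write, replication_factor)
    return r + w > replication_factor


# Hypothetical per-object policy: (read level, write level).
POLICY = {
    "experiment_creation": (Level.QUORUM, Level.QUORUM),
    "job_monitoring": (Level.ONE, Level.ONE),
}

if __name__ == "__main__":
    rf = 3
    for kind, (read, write) in POLICY.items():
        print(kind, reads_see_latest_write(read, write, rf))
    # experiment_creation True
    # job_monitoring False
```

The R + W > N overlap only guarantees that a read observes the latest acknowledged write for a key. It does not provide create-if-absent or uniqueness checks, which Cassandra handles separately through lightweight transactions.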

Re: Object Database Suggestions for Airavata Registry

Posted by Jijoe Vurghese <ji...@gmail.com>.

Sorry, everyone. Sunday turned out to be busier than I expected…next time…

-- 
Jijoe

On March 3, 2014 at 6:01:54, Marlon Pierce (marpierc@iu.edu) wrote:

My regrets for missing the meeting but I was babysitting.  

Marlon  

On 3/2/14 11:16 PM, Suresh Marru wrote:  
> Thank you all for taking a couple of hours on a Sunday evening to participate. I think these discussions help Airavata very significantly.
>
> Here is the YouTube link if anyone would like to follow - http://www.youtube.com/watch?v=EY6oPwqi1g4
>
> Key Summary: Sachith is interested in doing a GSoC project on this topic, and he will start by summarizing the challenges in the current registry. Once the problem statement is clearer, we can take the next steps.
>
> I appreciate everyone's input on this key topic.
>
> Suresh
>
> P.S. I will be traveling for the next 4 days, so I will be slow in my responses.
>  
> On Mar 2, 2014, at 7:56 PM, Suresh Marru <sm...@apache.org> wrote:  
>  
>> Let's use this - https://plus.google.com/hangouts/_/hoaevent/AP36tYdve-71oizx25DGUbTZjSX4PtLxmDsddtqnfuDYlE9SXDSB9Q?authuser=0&hl=en
>>
>> I will compile a set of instructions for the website so any one of us can preschedule it in the future.
>>
>> Suresh
>>  
>> On Mar 2, 2014, at 7:36 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:  
>>  
>>> Oops, in that case, Suresh, can you please create one?  
>>>  
>>> Thanks,  
>>> Eran Chinthaka Withana  
>>>  
>>>  
>>> On Sun, Mar 2, 2014 at 4:28 PM, Suresh Marru <sm...@apache.org> wrote:  
>>>  
>>>> Hi Eran,  
>>>>  
>>>> Is this an On-Air event? Previously I had trouble changing a previously
>>>> scheduled event to On-Air.
>>>>
>>>> If you are creating a new hangout, can you first create it on the G+
>>>> Airavata Community (all PMC Members are moderators on this community)? This
>>>> will be easier for archival reference -
>>>> https://plus.google.com/communities/100700433662281905708  
>>>>  
>>>> Suresh  
>>>>  
>>>> On Mar 2, 2014, at 7:21 PM, Eran Chinthaka Withana <  
>>>> eran.chinthaka@gmail.com> wrote:  
>>>>  
>>>>> Here is the link to the hangout:
>>>>> https://plus.google.com/hangouts/_/event/c1sgvk7dha37rkr0adktb195lgc?authuser=0&hl=en
>>>>> Thanks,  
>>>>> Eran Chinthaka Withana  
>>>>>  
>>>>>  
>>>>> On Sun, Mar 2, 2014 at 12:46 PM, Suresh Marru <sm...@apache.org> wrote:  
>>>>>  
>>>>>> Hi All,  
>>>>>>  
>>>>>> Since Eran has been the one who first proposed the hangout and has
>>>>>> specific suggestions on this thread, I prefer to postpone to 8pm (EST).
>>>>>> But if others planned for 4pm, let's go ahead with the plan.
>>>>>>
>>>>>> Is there anyone who planned to attend who now cannot make it at 8pm (EST)?
>>>>>> If I do not hear any objections, let's shoot for 8pm. Otherwise, let's go
>>>>>> as planned.
>>>>>>  
>>>>>> Cheers,  
>>>>>> Suresh  
>>>>>>  
>>>>>> On Mar 2, 2014, at 3:31 PM, Eran Chinthaka Withana <  
>>>>>> eran.chinthaka@gmail.com> wrote:  
>>>>>>  
>>>>>>> Hi Suresh,  
>>>>>>>  
>>>>>>> Sorry for the late reply. I don't think I can make it at 1pm PST today.
>>>>>>> Can we please reschedule this to 5pm PST (8pm EST) or later?
>>>>>>>  
>>>>>>> Thanks,  
>>>>>>> Eran Chinthaka Withana  
>>>>>>>  
>>>>>>>  
>>>>>>> On Sun, Mar 2, 2014 at 6:38 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Great to see we have a good quorum. So how about 4pm EST (1pm PST) today
>>>>>>>> with a hangout on air? It works best if we start a hangout then
>>>>>>>> (previous attempts to pre-schedule on-air events did not work well), so
>>>>>>>> please check this mailing list around 4pm EST for the hangout on air link.
>>>>>>>>
>>>>>>>> Meanwhile, please join the Airavata Google Plus community; that might be
>>>>>>>> an easier way to share the link -
>>>>>>>> https://plus.google.com/communities/100700433662281905708
>>>>>>>>
>>>>>>>> Thanks, all, for being willing to take time on a Sunday,
>>>>>>>> Suresh  
>>>>>>>>  
>>>>>>>> On Feb 28, 2014, at 9:15 PM, Supun Kamburugamuva <su...@gmail.com>  
>>>>>>>> wrote:  
>>>>>>>>  
>>>>>>>>> +1 for Sunday afternoon. I can make it after 4 pm EST.  
>>>>>>>>>  
>>>>>>>>> Thanks,  
>>>>>>>>> Supun..  
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> On Fri, Feb 28, 2014 at 5:04 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
>>>>>>>>>> +1  
>>>>>>>>>>  
>>>>>>>>>> Thanks,  
>>>>>>>>>> Shameera.  
>>>>>>>>>>  
>>>>>>>>>>  
>>>>>>>>>> On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <  
>>>>>>>>>> eran.chinthaka@gmail.com> wrote:  
>>>>>>>>>>  
>>>>>>>>>>> +1 for Sunday afternoon  
>>>>>>>>>>>  
>>>>>>>>>>> Thanks,  
>>>>>>>>>>> Eran Chinthaka Withana  
>>>>>>>>>>>  
>>>>>>>>>>>  
>>>>>>>>>>> On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>>>>>>>>> Hi Eran,
>>>>>>>>>>>>
>>>>>>>>>>>> This is a great idea. I myself owe a few replies on this thread and have
>>>>>>>>>>>> been unable to take the time to collect my thoughts (and realized I should
>>>>>>>>>>>> take time to properly articulate the challenges, otherwise we will be
>>>>>>>>>>>> discussing orthogonal issues).
>>>>>>>>>>>>
>>>>>>>>>>>> A hangout will help us brainstorm more comprehensively. We can have it
>>>>>>>>>>>> on air so we can refer back to it for archival purposes. How is Sunday
>>>>>>>>>>>> afternoon for everyone willing to join and contribute?
>>>>>>>>>>>>  
>>>>>>>>>>>> Thanks,  
>>>>>>>>>>>> Suresh  
>>>>>>>>>>>>  
>>>>>>>>>>>> On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <  
>>>>>>>>>>>> eran.chinthaka@gmail.com> wrote:  
>>>>>>>>>>>>  
>>>>>>>>>>>>> Hi,  
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Is there any chance of hosting a Google Hangout to talk about this? I
>>>>>>>>>>>>> think with long emails and multiple directions, things are getting a
>>>>>>>>>>>>> little bit confusing in this thread (I'm partly responsible for this :) ).
>>>>>>>>>>>>> I can join a video chat during a weekend, but let's make sure it's
>>>>>>>>>>>>> convenient for both the East and West Coasts :)
>>>>>>>>>>>>>  
>>>>>>>>>>>>> WDYT?  
>>>>>>>>>>>>>  
>>>>>>>>>>>>> Thanks,  
>>>>>>>>>>>>> Eran Chinthaka Withana  
>>>>>>>>>>>>>  
>>>>>>>>>>>>>  
>>>>>>>>>>>>> On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <smarru@apache.org> wrote:
>>>>>>>>>>>>>> I could respond to each thread in detail, but I sense the general
>>>>>>>>>>>>>> inquiry is about the use case, so let me try to explain it and see if it
>>>>>>>>>>>>>> comes across. I am fully on board with the perceptions of relational vs.
>>>>>>>>>>>>>> NoSQL and also agree that current Airavata needs are not a direct map
>>>>>>>>>>>>>> for NoSQL migration. I will summarize the driving motivation:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Background: The key problem Airavata needs to solve is getting the API
>>>>>>>>>>>>>> and associated data model right. The problem is that the current
>>>>>>>>>>>>>> relational database (with an OpenJPA overlay) is severely limiting API
>>>>>>>>>>>>>> evolution. Science gateways by nature are very science-domain and
>>>>>>>>>>>>>> use-case specific, but Airavata is tackling the challenging problem of
>>>>>>>>>>>>>> providing a generic API which will meet and enable these use-case-centric
>>>>>>>>>>>>>> integrations. The issue here is that we are designing an API to handle a
>>>>>>>>>>>>>> wide range of known (and some foreseen) use cases, while at the same time
>>>>>>>>>>>>>> trying to keep it simple and yet flexible. The only way we can get to a
>>>>>>>>>>>>>> reasonable, normalized version of the API is by hands-on programming
>>>>>>>>>>>>>> against it. Within the Airavata PMC itself, we can solicit half a dozen
>>>>>>>>>>>>>> different ways to visualize the data model. And we need a few hackathons
>>>>>>>>>>>>>> with real end users of Airavata until we find common ground. All of this
>>>>>>>>>>>>>> needs rapid prototyping. Currently, a slight change in the data model
>>>>>>>>>>>>>> takes close to two weeks of re-architecting the OpenJPA-based registry.
>>>>>>>>>>>>>> There are many known problems with the current draft of the data model
>>>>>>>>>>>>>> which have had to be set aside in the interest of making overall system
>>>>>>>>>>>>>> progress.
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> So the driving motivation is certainly not any of the classic NoSQL
>>>>>>>>>>>>>> needs, but a simple one: can we have a registry which is schema-agnostic
>>>>>>>>>>>>>> and yet queryable for most of the fields in the model? Can we try 10
>>>>>>>>>>>>>> different variants of the data model (and hence the API) within the next
>>>>>>>>>>>>>> 3 months with focused hackathons and arrive at a stable 1.0 version of
>>>>>>>>>>>>>> the API?
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Part one of the discussion succeeded in that it raised everyone's
>>>>>>>>>>>>>> eyebrows. Now that we have everyone's attention, what would be a good
>>>>>>>>>>>>>> data store for Airavata that meets these needs?
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> P.S: Additional background: The API has been in development for  
>>>>>>>>>> close  
>>>>>>>>>>>> to 3  
>>>>>>>>>>>>>> years and is falling short of pleasing a majority. Many academic  
>>>>>>>>>>>>>> standardization efforts fail terribly trying to pretend to  
>>>>>>>>>> understand  
>>>>>>>>>>>> all  
>>>>>>>>>>>>>> use cases and proposing a standard way (which ends up  
>>>>>> unnecessarily  
>>>>>>>>>>>> complex  
>>>>>>>>>>>>>> and not usable). Science by nature is evolutionary, and  
>>>>>> restricting  
>>>>>>>>>>> the  
>>>>>>>>>>>>>> capabilities by a known set of use cases prevents the use of  
>>>>>>>>>>> middleware  
>>>>>>>>>>>> for  
>>>>>>>>>>>>>> real-scientific research (and gets limited to proof of concept  
>>>>>>>>>>>>>> demonstrations, papers, educational use). The only way meeting  
>>>> the  
>>>>>>>>>>>>>> challenges of these evolving needs is to have the framework  
>>>> which  
>>>>>>>>>> can  
>>>>>>>>>>>>>> evolve with minimal disruption.  
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Great thoughts so far, please keep 'em coming until we can find  
>>>> a  
>>>>>>>>>>>> solution  
>>>>>>>>>>>>>> not by the technical fancies but to address the real need.  
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Cheers,  
>>>>>>>>>>>>>> Suresh  
>>>>>>>>>>>>>>  
>>>>>>>>>>>>  
>>>>>>>>>>  
>>>>>>>>>>  
>>>>>>>>>> --  
>>>>>>>>>> Best Regards,  
>>>>>>>>>> Shameera Rathnayaka.  
>>>>>>>>>>  
>>>>>>>>>> email: shameera AT apache.org , shameerainfo AT gmail.com  
>>>>>>>>>> Blog : http://shameerarathnayaka.blogspot.com/  
>>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>>  
>>>>>>>>> --  
>>>>>>>>> Supun Kamburugamuva  
>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org  
>>>>>>>>> E-mail: supun06@gmail.com; Mobile: +1 812 369 6762  
>>>>>>>>> Blog: http://supunk.blogspot.com  
>>>>>>>>  
>>>>>>  
>>>>
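The requirement quoted above, a registry that is "schema-agnostic and yet queryable for most of the fields in the model", can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not a design proposed in this thread: JSON stands in for Thrift serialization, in-memory dictionaries stand in for a store such as Cassandra, and `SchemaAgnosticRegistry`, `flatten`, `put`, `get`, `delete`, and `find` are hypothetical names rather than Airavata, Thrift, or Cassandra APIs. Each object is kept whole as a serialized blob, and every leaf field is also written to an inverted index keyed by (dotted field path, value), so a field added in a later model revision is queryable without a schema migration.

```python
import json
from collections import defaultdict


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every scalar leaf of nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        # List elements share their parent's path, so matching any element counts.
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix.rstrip("."), obj


class SchemaAgnosticRegistry:
    """Hypothetical registry: whole objects stored as blobs, every leaf field indexed."""

    def __init__(self):
        self._blobs = {}                # object id -> serialized (JSON) object
        self._index = defaultdict(set)  # (field path, value) -> ids of matching objects

    def put(self, object_id, obj):
        """Store or replace an object; no schema has to be declared up front."""
        if object_id in self._blobs:
            self.delete(object_id)      # remove stale index entries first
        self._blobs[object_id] = json.dumps(obj)
        for path, value in flatten(obj):
            self._index[(path, value)].add(object_id)

    def delete(self, object_id):
        obj = json.loads(self._blobs.pop(object_id))
        for path, value in flatten(obj):
            self._index[(path, value)].discard(object_id)

    def get(self, object_id):
        return json.loads(self._blobs[object_id])

    def find(self, criteria):
        """Return sorted ids of objects whose fields equal every path/value in criteria."""
        matches = None
        for path, value in criteria.items():
            ids = self._index.get((path, value), set())
            matches = set(ids) if matches is None else matches & ids
        return sorted(matches or [])


if __name__ == "__main__":
    registry = SchemaAgnosticRegistry()
    registry.put("exp-1", {"name": "run-a", "owner": "alice",
                           "status": {"state": "CREATED"}})
    # A later model revision adds "tags"; it is stored and queryable without a migration.
    registry.put("exp-2", {"name": "run-b", "owner": "alice",
                           "status": {"state": "LAUNCHED"}, "tags": ["md", "test"]})
    print(registry.find({"owner": "alice"}))           # ['exp-1', 'exp-2']
    print(registry.find({"status.state": "CREATED"}))  # ['exp-1']
    print(registry.find({"tags": "md"}))               # ['exp-2']
```

The trade-off in this sketch: each write touches one index entry per leaf field, so writes cost more as objects grow, and only exact-match lookups are supported; range queries or sorting would need further index structures. In a column-family store one would likely keep the blobs and the index in separate column families, but that mapping is an assumption here, not something the thread settled.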

Re: Object Database Suggestions for Airavata Registry

Posted by Marlon Pierce <ma...@iu.edu>.

My regrets for missing the meeting but I was babysitting.

Marlon

On 3/2/14 11:16 PM, Suresh Marru wrote:
> Thank you all for taking a couple of hours on a Sunday evening to participate. I think these discussions help Airavata very significantly.
>
> Here is the YouTube link if anyone would like to follow - http://www.youtube.com/watch?v=EY6oPwqi1g4
>
> Key Summary: Sachith is interested in doing a GSoC project on this topic, and he will start by summarizing the challenges in the current registry. Once the problem statement is clearer, we can take the next steps.
>
> I appreciate everyone's input on this key topic.
>
> Suresh
>
> P.S. I will be traveling for the next 4 days, so I will be slow in my responses.

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Thank you all for taking a couple of hours on a Sunday evening to participate. I think these discussions help Airavata very significantly.

Here is the YouTube link if anyone would like to follow - http://www.youtube.com/watch?v=EY6oPwqi1g4

Key Summary: Sachith is interested in doing a GSoC project on this topic, and he will start by summarizing the challenges in the current registry. Once the problem statement is clearer, we can take the next steps.

I appreciate everyone's input on this key topic.

Suresh

P.S. I will be traveling for the next 4 days, so I will be slow in my responses.

On Mar 2, 2014, at 7:56 PM, Suresh Marru <sm...@apache.org> wrote:

> Let's use this - https://plus.google.com/hangouts/_/hoaevent/AP36tYdve-71oizx25DGUbTZjSX4PtLxmDsddtqnfuDYlE9SXDSB9Q?authuser=0&hl=en
> 
> I will compile a set of instructions for the website so any one of us can pre-schedule it in the future.
> 
> Suresh
> 
> On Mar 2, 2014, at 7:36 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:
> 
>> Oops, in that case, Suresh, can you please create one?
>> 
>> Thanks,
>> Eran Chinthaka Withana
>> 
>> 
>> On Sun, Mar 2, 2014 at 4:28 PM, Suresh Marru <sm...@apache.org> wrote:
>> 
>>> Hi Eran,
>>> 
>>> Is this an On-Air event? I previously had trouble changing an already
>>> scheduled event to On-Air.
>>> 
>>> If you are creating a new hangout, can you first create it on the G+ Airavata
>>> Community (all PMC members are moderators on this community)? This will be
>>> easier for archival reference -
>>> https://plus.google.com/communities/100700433662281905708
>>> 
>>> Suresh
>>> 
>>> On Mar 2, 2014, at 7:21 PM, Eran Chinthaka Withana <
>>> eran.chinthaka@gmail.com> wrote:
>>> 
>>>> Here is the link to hangout:
>>>> 
>>>> https://plus.google.com/hangouts/_/event/c1sgvk7dha37rkr0adktb195lgc?authuser=0&hl=en
>>>> 
>>>> Thanks,
>>>> Eran Chinthaka Withana
>>>> 
>>>> 
>>>> On Sun, Mar 2, 2014 at 12:46 PM, Suresh Marru <sm...@apache.org> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Since Eran was the one who first proposed the hangout and has
>>>>> specific suggestions on this thread, I prefer to postpone to 8pm (EST). But
>>>>> if others planned for 4pm, let's go ahead with the plan.
>>>>> 
>>>>> Is there anyone who planned to attend but cannot make it at 8pm (EST)? If I do not
>>>>> hear any objections, let's shoot for 8pm. Otherwise, let's go as planned.
>>>>> 
>>>>> Cheers,
>>>>> Suresh
>>>>> 
>>>>> On Mar 2, 2014, at 3:31 PM, Eran Chinthaka Withana <
>>>>> eran.chinthaka@gmail.com> wrote:
>>>>> 
>>>>>> Hi Suresh,
>>>>>> 
>>>>>> Sorry for the late reply. I don't think I can make it at 1pm PST today. Can
>>>>>> we please reschedule this to 5pm PST (8pm EST) or later?
>>>>>> 
>>>>>> Thanks,
>>>>>> Eran Chinthaka Withana
>>>>>> 
>>>>>> 
>>>>>> On Sun, Mar 2, 2014 at 6:38 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> Great to see we have a good quorum. So how about 4pm EST (1pm PST) today
>>>>>>> with a hangout on air? It works best if we start a hangout then (previous
>>>>>>> attempts to pre-schedule on-air events did not work well), so please check
>>>>>>> this mailing list around 4pm EST for the hangout on air link.
>>>>>>> 
>>>>>>> Meanwhile, please join the Airavata Google Plus community; that might be
>>>>>>> an easier place to share the link -
>>>>>>> https://plus.google.com/communities/100700433662281905708
>>>>>>> 
>>>>>>> Thanks all for willing to take time on a sunday,
>>>>>>> Suresh
>>>>>>> 
>>>>>>> On Feb 28, 2014, at 9:15 PM, Supun Kamburugamuva <su...@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 for Sunday afternoon. I can make it after 4 pm EST.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Supun..
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 28, 2014 at 5:04 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shameera.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for Sunday afternoon
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Eran Chinthaka Withana
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Eran,
>>>>>>>>>>>
>>>>>>>>>>> This is a great idea. I myself owe a few replies on this thread and
>>>>>>>>>>> have been unable to take the time to organize my thoughts (and
>>>>>>>>>>> realized I should take time to properly articulate the challenges,
>>>>>>>>>>> otherwise we will be discussing orthogonal issues).
>>>>>>>>>>>
>>>>>>>>>>> A hangout will help us brainstorm more comprehensively. We can have
>>>>>>>>>>> it on air so we can refer back to it for archival purposes. How is
>>>>>>>>>>> Sunday afternoon for everyone willing to join and contribute?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Suresh
>>>>>>>>>>>
>>>>>>>>>>> On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any chance of hosting a Google hangout to talk about this?
>>>>>>>>>>>> I think with long emails and multiple directions, things are
>>>>>>>>>>>> getting a little bit confusing in this thread (I'm partly
>>>>>>>>>>>> responsible for this :) ). I can join a video chat during a
>>>>>>>>>>>> weekend, but let's make sure it's convenient for both east and
>>>>>>>>>>>> west coasts :)
>>>>>>>>>>>>
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Eran Chinthaka Withana
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <smarru@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I could respond to each thread in detail, but I see the general
>>>>>>>>>>>>> sense is to ask about the use case, so let me try to explain it
>>>>>>>>>>>>> and see if it comes across. I am fully on board with the
>>>>>>>>>>>>> perceptions of relational vs NoSQL, and I also agree that current
>>>>>>>>>>>>> Airavata needs are not a direct map for a NoSQL migration. I will
>>>>>>>>>>>>> summarize the driving motivation:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Background: The key problem Airavata needs to solve is getting the
>>>>>>>>>>>>> API and associated data model right. The problem is that the
>>>>>>>>>>>>> current relational database (with an OpenJPA overlay) is severely
>>>>>>>>>>>>> limiting the API evolution. Science gateways by nature are very
>>>>>>>>>>>>> specific to their science domain and use case. But Airavata is
>>>>>>>>>>>>> tackling the challenging problem of providing a generic API which
>>>>>>>>>>>>> will meet and enable these use-case-centric integrations. The
>>>>>>>>>>>>> issue here is that we are designing an API to handle a wide range
>>>>>>>>>>>>> of known (and some foreseen) use cases, while at the same time
>>>>>>>>>>>>> trying to keep it simple and yet flexible. The only way we can get
>>>>>>>>>>>>> to a reasonable, normalized version of the API is by hands-on
>>>>>>>>>>>>> programming against it. Within the Airavata PMC itself, we can
>>>>>>>>>>>>> solicit half a dozen different ways to visualize the data model.
>>>>>>>>>>>>> And we need a few hackathons with real end users of Airavata until
>>>>>>>>>>>>> we find common ground. All of this needs rapid prototyping.
>>>>>>>>>>>>> Currently a slight change in the data model takes close to two
>>>>>>>>>>>>> weeks of re-architecting the OpenJPA-based registry. There are
>>>>>>>>>>>>> many known problems with the current draft of the data model which
>>>>>>>>>>>>> have had to be put aside in the interest of making overall system
>>>>>>>>>>>>> progress.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the driving motivation is certainly not any of the classic
>>>>>>>>>>>>> NoSQL needs, but a simple one: can we have a registry which is
>>>>>>>>>>>>> schema-agnostic and yet queryable for most of the fields in the
>>>>>>>>>>>>> model? Can we try 10 different variants of the data model (hence
>>>>>>>>>>>>> the API) within the next 3 months with focused hackathons and
>>>>>>>>>>>>> arrive at a stable 1.0 version of the API?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Part one of the discussion has succeeded in that it raised
>>>>>>>>>>>>> everyone's eyebrows. Now that we have everyone's attention, what
>>>>>>>>>>>>> would be a good data store for Airavata that meets these needs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.: Additional background: The API has been in development for
>>>>>>>>>>>>> close to 3 years and is falling short of pleasing a majority. Many
>>>>>>>>>>>>> academic standardization efforts fail terribly by pretending to
>>>>>>>>>>>>> understand all use cases and proposing a standard way (which ends
>>>>>>>>>>>>> up unnecessarily complex and not usable). Science by nature is
>>>>>>>>>>>>> evolutionary, and restricting the capabilities to a known set of
>>>>>>>>>>>>> use cases prevents the use of middleware for real scientific
>>>>>>>>>>>>> research (limiting it to proof-of-concept demonstrations, papers
>>>>>>>>>>>>> and educational use). The only way to meet the challenges of these
>>>>>>>>>>>>> evolving needs is to have a framework which can evolve with
>>>>>>>>>>>>> minimal disruption.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Great thoughts so far; please keep 'em coming until we find a
>>>>>>>>>>>>> solution driven not by technical fancies but by the real need.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Suresh
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <glahiru@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also think that moving to Cassandra or any other NoSQL store
>>>>>>>>>>>>>>> will add unnecessary complexity to your solution. Also,
>>>>>>>>>>>>>>> designing proper (easy to change, easy to query) NoSQL data
>>>>>>>>>>>>>>> models is hard (AFAIK, it requires lots of experience and
>>>>>>>>>>>>>>> understanding of data structures and queries). Migrating from
>>>>>>>>>>>>>>> one NoSQL technology to another can also require a complete
>>>>>>>>>>>>>>> rewrite. And current relational databases can handle heavy
>>>>>>>>>>>>>>> loads, short of Google-, Twitter-, Amazon- and Facebook-scale
>>>>>>>>>>>>>>> loads. I don't think Airavata will see Google- or Amazon-scale
>>>>>>>>>>>>>>> loads.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If constant changes to the data model are the problem, I think
>>>>>>>>>>>>>>> the best option is to abstract the registry implementation into
>>>>>>>>>>>>>>> something like the collections and resources used in WSO2
>>>>>>>>>>>>>>> Registry [1], or something suitable for the Airavata context.
>>>>>>>>>>>>>>> That will make it easy to handle changes in the data model.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, don't let technologies drive design decisions. It's always
>>>>>>>>>>>>>>> better to let use cases drive them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Lahiru
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Milinda
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1] http://wso2.com/products/governance-registry/
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <supun06@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not trying to discourage your exploration of NoSQL
>>>>>>>>>>>>>>>> databases. I have the following concern.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Your database schema is moderately complex (even for an RDBMS
>>>>>>>>>>>>>>>> it seems complex) and the data size is relatively small. I'm not
>>>>>>>>>>>>>>>> sure about the tools currently available, but I think you will
>>>>>>>>>>>>>>>> need to write more code to support all your requirements in a
>>>>>>>>>>>>>>>> NoSQL database. So writing more code and allowing redundancy to
>>>>>>>>>>>>>>>> support *relatively small* and *structured data* doesn't seem
>>>>>>>>>>>>>>>> right to me. Maybe I'm wrong and there are better tools in NoSQL
>>>>>>>>>>>>>>>> than in RDBMS, which I doubt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Supun..
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <smarru@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Airavata is actively migrating to a Thrift API for the RESTless
>>>>>>>>>>>>>>>>> design and to facilitate various language bindings from client
>>>>>>>>>>>>>>>>> gateways. The programming language support in Thrift has so far
>>>>>>>>>>>>>>>>> been very encouraging. The current architecture looks like
>>>>>>>>>>>>>>>>> Figure 1 at [1].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Language-specific clients will be released as Thrift SDKs
>>>>>>>>>>>>>>>>> (similar to the Evernote SDKs [2]). These clients will be
>>>>>>>>>>>>>>>>> integrated into gateway portals which connect to the API Server.
>>>>>>>>>>>>>>>>> The API operations broker the simple calls into one or more
>>>>>>>>>>>>>>>>> backend CPI calls (Airavata internal component interfaces). An
>>>>>>>>>>>>>>>>> example set of mappings is illustrated in Figure 2 at [1]. The
>>>>>>>>>>>>>>>>> current draft of the Thrift API for version 0.12 is at [3];
>>>>>>>>>>>>>>>>> please pay attention to the experiment model at [4].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the persistent store, we had a few iterations of the Airavata
>>>>>>>>>>>>>>>>> Registry, shifting from a legacy XRegistry to JackRabbit to the
>>>>>>>>>>>>>>>>> current OpenJPA-based registry. To allow the API and the
>>>>>>>>>>>>>>>>> associated data models to evolve, it will be useful to explore
>>>>>>>>>>>>>>>>> object databases so we can store the serialized version of
>>>>>>>>>>>>>>>>> Thrift objects directly. But it will be nice to have all (or
>>>>>>>>>>>>>>>>> most) of the fields queryable. Among the NoSQL approaches, this
>>>>>>>>>>>>>>>>> calls for more of a column-family design.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any recommendations for a registry architecture?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Quickly hacking through, I find the following approach a viable
>>>>>>>>>>>>>>>>> one: ZombieDB [5] over Astyanax [6], which talks to Cassandra.
>>>>>>>>>>>>>>>>> Airavata can benefit immediately from the replication and
>>>>>>>>>>>>>>>>> reliability of Cassandra, and from its scalability in the near
>>>>>>>>>>>>>>>>> future. Some of the model objects, like experiment creation,
>>>>>>>>>>>>>>>>> will need strong consistency, while most of the monitoring can
>>>>>>>>>>>>>>>>> live with eventual consistency.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Critical comments please?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your time,
>>>>>>>>>>>>>>>>> Suresh
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>>>>>>>>>>>>>>> [2] - https://dev.evernote.com/doc/
>>>>>>>>>>>>>>>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>>>>>>>>>>>>>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>>>>>>>>>>>>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>>>>>>>>>>>>>>> [6] - https://github.com/Netflix/astyanax
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Supun Kamburugamuva
>>>>>>>>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>>>>>>>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>>>>>>>>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Milinda Pathirage
>>>>>>>>>>>>>>> PhD Student Indiana University, Bloomington;
>>>>>>>>>>>>>>> E-mail: milinda.pathirage@gmail.com
>>>>>>>>>>>>>>> Web: http://mpathirage.com
>>>>>>>>>>>>>>> Blog: http://blog.mpathirage.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> System Analyst Programmer
>>>>>>>>>>>>>> PTI Lab
>>>>>>>>>>>>>> Indiana University
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Shameera Rathnayaka.
>>>>>>>>> 
>>>>>>>>> email: shameera AT apache.org , shameerainfo AT gmail.com
>>>>>>>>> Blog : http://shameerarathnayaka.blogspot.com/
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Supun Kamburugamuva
>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>
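
The original proposal quoted above asks for a registry that stores the serialized object whole but keeps most fields queryable through a column-family layout. Below is a minimal in-memory sketch of that idea, under stated assumptions. It is not the ZombieDB or Astyanax API. JSON stands in for Thrift serialization. The `__blob__` column name and the experiment field names are hypothetical. Only scalar top-level fields get index columns. What the sketch shows is that a new model field simply becomes a new column, so no schema migration is needed.

```python
import json
from collections import defaultdict


class ColumnFamilyRegistry:
    """In-memory stand-in for a column-family registry (hypothetical, not ZombieDB/Astyanax)."""

    BLOB = "__blob__"  # hypothetical column holding the whole serialized object

    def __init__(self):
        # row key -> {column name -> value}, like one wide row per model object
        self._rows = {}
        # (column name, value) -> row keys: a hand-maintained secondary index
        self._index = defaultdict(set)

    def put(self, row_key, obj):
        """Store the serialized object plus one index column per scalar top-level field."""
        self.delete(row_key)  # drop stale index entries if the row is being overwritten
        columns = {self.BLOB: json.dumps(obj, sort_keys=True)}  # JSON stands in for Thrift
        for field, value in obj.items():
            if isinstance(value, (str, int, float)):  # nested structs stay blob-only
                columns[field] = value
                self._index[(field, value)].add(row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        """Deserialize the stored object, or return None if the row does not exist."""
        columns = self._rows.get(row_key)
        return None if columns is None else json.loads(columns[self.BLOB])

    def find(self, field, value):
        """Row keys whose indexed column `field` equals `value`, sorted for stable output."""
        return sorted(self._index.get((field, value), ()))

    def delete(self, row_key):
        """Remove a row and its index entries; a no-op for unknown keys."""
        columns = self._rows.pop(row_key, None) or {}
        for field, value in columns.items():
            if field != self.BLOB:
                self._index[(field, value)].discard(row_key)


if __name__ == "__main__":
    registry = ColumnFamilyRegistry()
    # Hypothetical experiment fields; adding "priority" later needs no migration.
    registry.put("exp-1", {"userName": "alice", "status": "CREATED", "inputs": {"x": 1}})
    registry.put("exp-2", {"userName": "bob", "status": "CREATED", "priority": 5})
    print(registry.find("status", "CREATED"))
    print(registry.get("exp-1")["inputs"])
```

In a real column-family store, the index would itself be rows in another column family. Keeping it in sync on every write is the kind of extra code that Supun's reply in this thread anticipates.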

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Let's use this: https://plus.google.com/hangouts/_/hoaevent/AP36tYdve-71oizx25DGUbTZjSX4PtLxmDsddtqnfuDYlE9SXDSB9Q?authuser=0&hl=en

I will compile a set of instructions for the website so any of us can pre-schedule it in the future.

Suresh

On Mar 2, 2014, at 7:36 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Oops, in that case, Suresh, can you please create one?
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Sun, Mar 2, 2014 at 4:28 PM, Suresh Marru <sm...@apache.org> wrote:
> 
>> Hi Eran,
>> 
>> Is this an On-Air event? Previously I had trouble changing an already
>> scheduled event to On-Air.
>>
>> If you are creating a new hangout, can you first create it on the G+
>> Airavata Community (all PMC members are moderators on this community)?
>> That will make it easier to find for archival reference -
>> https://plus.google.com/communities/100700433662281905708
>> 
>> Suresh
>>
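The requirement quoted above (strong consistency for experiment creation, eventual consistency for most monitoring data) lines up with the tunable, per-request consistency levels that Cassandra offers. The Python sketch below is only an illustration under stated assumptions, not Airavata or Astyanax code: the record kinds `experiment` and `monitoring`, the `POLICY` table, and a replication factor of 3 are all hypothetical. It applies the usual quorum-overlap rule of thumb: when the number of replicas consulted on a read (R) plus the number that acknowledged the write (W) exceeds the replication factor (N), i.e. R + W > N, every read touches at least one replica holding the latest acknowledged write.

```python
"""Sketch: pick a per-record consistency level for a replicated store.

Assumptions (hypothetical, not Airavata or Astyanax code): the record
kinds "experiment" and "monitoring" and a replication factor of 3.
"""

REPLICATION_FACTOR = 3  # assumed number of replicas per row


def replicas_for(level: str, rf: int = REPLICATION_FACTOR) -> int:
    """Number of replicas that must respond to a request at `level`."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


# Hypothetical policy: experiment records need read-your-writes behaviour,
# while monitoring updates can tolerate briefly stale reads.
POLICY = {
    "experiment": {"write": "QUORUM", "read": "QUORUM"},
    "monitoring": {"write": "ONE", "read": "ONE"},
}


def is_strongly_consistent(kind: str, rf: int = REPLICATION_FACTOR) -> bool:
    """True when every read replica set must overlap every write set (R + W > N)."""
    levels = POLICY[kind]
    r = replicas_for(levels["read"], rf)
    w = replicas_for(levels["write"], rf)
    return r + w > rf


if __name__ == "__main__":
    for kind in POLICY:
        print(kind, is_strongly_consistent(kind))
```

With N = 3, QUORUM needs 2 replicas, so QUORUM reads and writes overlap (2 + 2 > 3) while ONE/ONE does not (1 + 1 = 2); running the script prints `experiment True` and `monitoring False`. That is the trade-off the proposal describes: cheaper, lower-latency monitoring traffic in exchange for possibly stale reads.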

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Oops, in that case, Suresh, can you please create one?

Thanks,
Eran Chinthaka Withana


On Sun, Mar 2, 2014 at 4:28 PM, Suresh Marru <sm...@apache.org> wrote:

> Hi Eran,
>
> Is this an On-Air event? I had trouble converting the previously
> scheduled event to On-Air.
>
> If you are creating a new hangout, can you first create it in the Airavata
> G+ Community (all PMC members are moderators of this community)? That will
> make archival reference easier:
> https://plus.google.com/communities/100700433662281905708
>
> Suresh

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Hi Eran,

Is this an On-Air event? I had trouble converting the previously scheduled event to On-Air.

If you are creating a new hangout, can you first create it in the Airavata G+ Community (all PMC members are moderators of this community)? That will make archival reference easier: https://plus.google.com/communities/100700433662281905708

Suresh

On Mar 2, 2014, at 7:21 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Here is the link to hangout:
> https://plus.google.com/hangouts/_/event/c1sgvk7dha37rkr0adktb195lgc?authuser=0&hl=en
> 
> Thanks,
> Eran Chinthaka Withana

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Here is the link to the hangout:
https://plus.google.com/hangouts/_/event/c1sgvk7dha37rkr0adktb195lgc?authuser=0&hl=en

Thanks,
Eran Chinthaka Withana


On Sun, Mar 2, 2014 at 12:46 PM, Suresh Marru <sm...@apache.org> wrote:

> Hi All,
>
> Since Eran has been the one who first proposed the hangout and has
> specific suggestion on this thread I prefer to postpone to 8pm (EST). But
> if others planned for 4pm, lets goahead with the plan.
>
> Any one who planned to attend now cannot make it at 8pm (EST)? If do not
> hear any objections lets shoot for 8pm. Otherwise, lets go as planned.
>
> Cheers,
> Suresh

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Hi All,

Since Eran was the one who first proposed the hangout and has specific suggestions on this thread, I prefer to postpone to 8pm (EST). But if others planned for 4pm, let's go ahead with that plan.

Is there anyone who planned to attend who cannot make it at 8pm (EST)? If I do not hear any objections, let's shoot for 8pm. Otherwise, let's go as planned.

Cheers,
Suresh

On Mar 2, 2014, at 3:31 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Hi Suresh,
> 
> Sorry for the late reply. I don't think I can make it at 1pm PST today. Can
> we please re-schedule this to 5pm PST (8pm EST) or later?
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Sun, Mar 2, 2014 at 6:38 AM, Suresh Marru <sm...@apache.org> wrote:
> 
>> Hi All,
>> 
>> Great to see we have a good quorum. So how about 4pm EST (1pm PST) today
>> with a hangout on air. It works best if we start a a hangout then (previous
>> attempts to pre-schedules on-air events did not work well. So please check
>> this mailing list around 4pm EST for the hangout on air link.
>> 
>> Meanwhile, please join the Airavata Google Plus community, that might be
>> easier to share the link -
>> https://plus.google.com/communities/100700433662281905708
>> 
>> Thanks all for willing to take time on a sunday,
>> Suresh
>> 
>> On Feb 28, 2014, at 9:15 PM, Supun Kamburugamuva <su...@gmail.com>
>> wrote:
>> 
>>> +1 for Sunday afternoon. I can make it after 4 pm EST.
>>> 
>>> Thanks,
>>> Supun..
>>> 
>>> 
>>> On Fri, Feb 28, 2014 at 5:04 PM, Shameera Rathnayaka <
>>> shameerainfo@gmail.com> wrote:
>>> 
>>>> +1
>>>> 
>>>> Thanks,
>>>> Shameera.
>>>> 
>>>> 
>>>> On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <
>>>> eran.chinthaka@gmail.com> wrote:
>>>> 
>>>>> +1 for Sunday afternoon
>>>>> 
>>>>> Thanks,
>>>>> Eran Chinthaka Withana
>>>>> 
>>>>> 
>>>>> On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>> 
>>>>>> Hi Eran,
>>>>>> 
>>>>>> This is a great idea. I myself owe few replies on this thread and
>>>>>> unable to take time to comprehend my thoughts (and realized I should
>>>>>> take time to properly articulate the challenges otherwise we will be
>>>>>> discussing orthogonal issues).
>>>>>> 
>>>>>> A hangout will help us brainstorm more comprehensively. We can have it
>>>>>> on air so we can refer back for archival purposes. How is Sunday
>>>>>> afternoon for everyone willing to join and contribute?
>>>>>> 
>>>>>> Thanks,
>>>>>> Suresh
>>>>>> 
>>>>>> On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <
>>>>>> eran.chinthaka@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Is there any chance of hosting a google hangout to talk about this. I
>>>>>>> think with long emails and multiple directions things are getting
>>>>>>> little bit confusing in thread (I'm partly responsible for this :) ). I
>>>>>>> can join a video chat during a weekend but lets make sure its
>>>>>>> convenient for both east and west coasts :)
>>>>>>> 
>>>>>>> WDYT?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Eran Chinthaka Withana
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org> wrote:
>>>>>>> 
>>>>>>>> I could respond to each thread in detail, but I see the general sense is
>>>>>>>> inquiring on the use case, so let me try and explain this and see if it
>>>>>>>> comes across. I am fully onboard with perceptions of relational vs nosql
>>>>>>>> and also agree current Airavata needs are not a direct map for NoSQL
>>>>>>>> migration. I will summarize the driving motivation:
>>>>>>>> 
>>>>>>>> Background: The key problem Airavata needs to solve is getting the API and
>>>>>>>> associated data model right. The problem is current relational database
>>>>>>>> (with OpenJPA overlay) is severely limiting the API evolution. Science
>>>>>>>> Gateways by nature are very science domain and use-case specific. But
>>>>>>>> Airavata is tackling this challenging problem of providing a generic API
>>>>>>>> which will meet and enable these use case centric integration. The issue
>>>>>>>> here is, we are designing an API to handle a wide range of known (and some
>>>>>>>> foreseen) use cases. But at the same time trying to keep it simple and yet
>>>>>>>> flexible. The only way we can get through a reasonable, normalized version
>>>>>>>> of API is by hands-on programming against the API. Within the Airavata PMC
>>>>>>>> itself, we can solicit a half-a-dozen different ways on how to visualize
>>>>>>>> the data model. And we need few hackethon's with real-end users of Airavata
>>>>>>>> until we find a common ground. All of this needs rapid prototyping.
>>>>>>>> Currently a slight change in the data model is taking close to two weeks of
>>>>>>>> re-arcitecting the Open-JPA based registry. There are many known problems
>>>>>>>> with current draft of data model which have to be put-down in the interest
>>>>>>>> of making over all system progress.
>>>>>>>> 
>>>>>>>> So the driving motivation is not certainly any of the classic NoSQL needs.
>>>>>>>> But a simple one, can we have registry which is schema-agnostic and yet is
>>>>>>>> queriable for most of the fields in the model? Can we try 10 different
>>>>>>>> variants of data model (hence API) within the next 3 months with focused
>>>>>>>> hackethon's and arrive at a stable 1.0 version of API?
>>>>>>>> 
>>>>>>>> Part one is the discussion is successful that it raised every one's eye
>>>>>>>> brows. Now that we have every one's attention, what will be a good data
>>>>>>>> store for Airavata which will meet these needs?
>>>>>>>> 
>>>>>>>> P.S: Additional background: The API has been in development for close to 3
>>>>>>>> years and is falling short of pleasing a majority. Many academic
>>>>>>>> standardization efforts fail terribly trying to pretend to understand all
>>>>>>>> use cases and proposing a standard way (which ends up unnecessarily complex
>>>>>>>> and not usable). Science by nature is evolutionary, and restricting the
>>>>>>>> capabilities by a known set of use cases prevents the use of middleware for
>>>>>>>> real-scientific research (and gets limited to proof of concept
>>>>>>>> demonstrations, papers, educational use). The only way meeting the
>>>>>>>> challenges of these evolving needs is to have the framework which can
>>>>>>>> evolve with minimal disruption.
>>>>>>>> 
>>>>>>>> Great thoughts so far, please keep 'em coming until we can find a solution
>>>>>>>> not by the technical fancies but to address the real need.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Suresh
>>>>>>>> 
>>>>>>>> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <glahiru@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
>>>>>>>>> milinda.pathirage@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I also think that moving to Cassandra or any other NoSQL will add
>>>>>>>>>> unneccessary complexity to your solution. Also designing proper (easy to
>>>>>>>>>> manage changes, easy to query) NoSQL data models are hard (AFAIK, require
>>>>>>>>>> lots of experience and understanding about data structures and queries).
>>>>>>>>>> Also migrating from one NoSQL technology to other can require complete
>>>>>>>>>> re-write. And current relational databases can handle heavy loads except
>>>>>>>>>> Google, Twitter, Amazon and Facebook like loads. I don't think Airavata
>>>>>>>>>> will see Google and Amazon like loads.
>>>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If the constant changes to the data model is the problem , I think best
>>>>>>>>>> option is to abstract registry implementation to something like collections
>>>>>>>>>> and resources used in WSO2 Registry [1] or something suitable for Airavata
>>>>>>>>>> context. That will make it easy to handle changes in data model.
>>>>>>>>>>
>>>>>>>>>> Also don't let the technologies drive design decision. Its always better to
>>>>>>>>>> let use cases drive the design decision.
>>>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Lahiru
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Milinda
>>>>>>>>>> 
>>>>>>>>>> [1] http://wso2.com/products/governance-registry/
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <supun06@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> I'm not trying to discourage you on your exploration to NoSQL databases. I
>>>>>>>>>>> have the following concern.
>>>>>>>>>>> 
>>>>>>>>>>> Your database schema is moderately complex - even for a RDBMS it seems
>>>>>>>>>>> complex and the data size is relatively small. I'm not sure about the
>>>>>>>>>>> current tools available but I think you will need to write more code to
>>>>>>>>>>> support all your requirements in a NoSQL database. So writing more code and
>>>>>>>>>>> allow redundancy to support *relatively small* and *structured data*
>>>>>>>>>>> doesn't seem right to me. May be I'm wrong and there are better tools in
>>>>>>>>>>> NoSQL than RDBMS, which I doubt.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun..
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <smarru@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>> 
>>>>>>>>>>>> Airavata is actively migrating to use Thrift API for the RESTless design
>>>>>>>>>>>> and to facilitate various language bindings from client gateways. The
>>>>>>>>>>>> programming language support in thrift has been so far very encouraging.
>>>>>>>>>>>> The current architecture is looking like Figure 1 at [1].
>>>>>>>>>>>> 
>>>>>>>>>>>> Language specific clients will be released as thrift SDK's (similar to
>>>>>>>>>>>> evernote sdk's [1]). These clients will be integrated into gateway portals
>>>>>>>>>>>> which connect to the API Server. The API operations brokers he simple calls
>>>>>>>>>>>> into one or more backend CPI calls (Airavata internal component
>>>>>>>>>>>> interfaces).  An example set of mappings are illustrated in Figure 2 at
>>>>>>>>>>>> [1]. The current draft of thrift API for version 0.12 is at [3], please pay
>>>>>>>>>>>> attention to experiment model at [4].
>>>>>>>>>>>> 
>>>>>>>>>>>> For the persistent store, we had few iterations of Airavata Registry
>>>>>>>>>>>> shifting from a legacy XRegistry to JackRabbit to now a OpenJPA based
>>>>>>>>>>>> registry. To allow the API and the associated data models to evolve, it
>>>>>>>>>>>> will be useful to explore object databases so we can store the serialized
>>>>>>>>>>>> version of thrift objects directly. But it will be nice to have all (or
>>>>>>>>>>>> most) of the fields queriable. This calls for a more column-family design
>>>>>>>>>>>> of any NoSQL approaches.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any recommendations for a registry architecture?
>>>>>>>>>>>> 
>>>>>>>>>>>> Quickly hacking through I find the following approach a viable one:
>>>>>>>>>>>> ZombieDB[5] over astyanax[6] which talks to Cassandra. Airavata can benefit
>>>>>>>>>>>> immediately from the replication and reliability of cassandra and
>>>>>>>>>>>> scalability in near future. Some of the model objects like experiment
>>>>>>>>>>>> creation will need to have strong consistency and most of the monitoring
>>>>>>>>>>>> can live with eventual consistency.
>>>>>>>>>>>> 
>>>>>>>>>>>> Critical comments please?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for your time,
>>>>>>>>>>>> Suresh
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>>>>>>>>>> [2] - https://dev.evernote.com/doc/
>>>>>>>>>>>> [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>>>>>>>>>> [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>>>>>>>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>>>>>>>>>> [6] - https://github.com/Netflix/astyanax
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Supun Kamburugamuva
>>>>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>>>>>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>>>>>>>>>>> Blog: http://supunk.blogspot.com
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Milinda Pathirage
>>>>>>>>>> PhD Student Indiana University, Bloomington;
>>>>>>>>>> E-mail: milinda.pathirage@gmail.com
>>>>>>>>>> Web: http://mpathirage.com
>>>>>>>>>> Blog: http://blog.mpathirage.com
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> System Analyst Programmer
>>>>>>>>> PTI Lab
>>>>>>>>> Indiana University
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Regards,
>>>> Shameera Rathnayaka.
>>>> 
>>>> email: shameera AT apache.org , shameerainfo AT gmail.com
>>>> Blog : http://shameerarathnayaka.blogspot.com/
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Supun Kamburugamuva
>>> Member, Apache Software Foundation; http://www.apache.org
>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>>> Blog: http://supunk.blogspot.com
>> 
>>

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Hi Chris,

That's a good reminder. There were a few master's research projects exploring OODT integration with Airavata. Let me find their emails and ask them to share their insights on this thread.

Suresh

On Mar 2, 2014, at 3:39 PM, Mattmann, Chris A (3980) <ch...@jpl.nasa.gov> wrote:

> Guys,
> 
> Has there been any thought to using the Apache OODT file manager
> as the Airavata registry? Would seem to fit the use cases..
> 
> Cheers,
> Chris
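As a rough, hypothetical illustration of the "schema-agnostic yet queryable" registry this thread is after, the Java sketch below stores each model object as an opaque serialized payload. It also copies selected fields into a flat property map, grouped under path-style collections, so they can be filtered without table or column definitions. The class and method names are invented for this example and are not the Airavata, WSO2 Registry, OODT, or Cassandra APIs. An in-memory `TreeMap` stands in for a real persistent store, and the payload bytes stand in for a serialized Thrift object.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not an Airavata, WSO2, OODT, or Cassandra API): each model
 * object is kept as an opaque serialized blob, and selected fields are copied into
 * a flat property map so they stay queryable without a fixed schema.
 */
public class SchemaAgnosticRegistry {

    /** One stored resource: its path, its queryable properties, and the raw payload. */
    static final class Resource {
        final String path;
        final Map<String, String> properties;
        final byte[] payload;

        Resource(String path, Map<String, String> properties, byte[] payload) {
            this.path = path;
            this.properties = new HashMap<>(properties);
            this.payload = payload.clone();
        }
    }

    // Resources keyed by path; sorted keys keep each collection prefix contiguous.
    private final TreeMap<String, Resource> resources = new TreeMap<>();

    /** Stores or replaces a resource; no table or column definitions are required. */
    public void put(String path, Map<String, String> properties, byte[] payload) {
        resources.put(path, new Resource(path, properties, payload));
    }

    /** Returns a copy of the stored payload, or null if the path is unknown. */
    public byte[] getPayload(String path) {
        Resource r = resources.get(path);
        return r == null ? null : r.payload.clone();
    }

    /** Returns paths under a collection prefix whose property equals the given value. */
    public List<String> find(String collection, String key, String value) {
        List<String> matches = new ArrayList<>();
        String prefix = collection.endsWith("/") ? collection : collection + "/";
        for (Resource r : resources.tailMap(prefix, true).values()) {
            if (!r.path.startsWith(prefix)) {
                break; // sorted order: we have moved past this collection
            }
            if (value.equals(r.properties.get(key))) {
                matches.add(r.path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SchemaAgnosticRegistry registry = new SchemaAgnosticRegistry();
        // Stand-in bytes; in practice these would be a serialized Thrift object.
        byte[] blob = "serialized-experiment".getBytes(StandardCharsets.UTF_8);

        registry.put("/experiments/exp-1", Map.of("userName", "alice", "status", "CREATED"), blob);
        registry.put("/experiments/exp-2", Map.of("userName", "bob", "status", "LAUNCHED"), blob);
        // A new queryable field can appear later without migrating existing entries.
        registry.put("/experiments/exp-3",
                Map.of("userName", "alice", "status", "LAUNCHED", "project", "p1"), blob);

        System.out.println(registry.find("/experiments", "userName", "alice"));
    }
}
```

Running `main` prints `[/experiments/exp-1, /experiments/exp-3]`. Adding a new queryable field only means writing another property, which is the kind of flexibility being asked for. The trade-off is that `find` is a linear scan within a collection, so a real store would still need secondary indexes on the fields that are queried often.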

Re: Object Database Suggestions for Airavata Registry

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.

Guys,

Has there been any thought to using the Apache OODT file manager
as the Airavata registry? Would seem to fit the use cases..

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-283, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Eran Chinthaka Withana <er...@gmail.com>
Reply-To: "architecture@airavata.apache.org"
<ar...@airavata.apache.org>
Date: Sunday, March 2, 2014 12:31 PM
To: "architecture@airavata.apache.org" <ar...@airavata.apache.org>
Subject: Re: Object Database Suggestions for Airavata Registry

>Hi Suresh,
>
>Sorry for the late reply. I don't think I can make it at 1pm PST today. Can
>we please re-schedule this to 5pm PST (8pm EST) or later?
>
>Thanks,
>Eran Chinthaka Withana

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Hi Suresh,

Sorry for the late reply. I don't think I can make it at 1pm PST today. Can
we please re-schedule this to 5pm PST (8pm EST) or later?

Thanks,
Eran Chinthaka Withana


On Sun, Mar 2, 2014 at 6:38 AM, Suresh Marru <sm...@apache.org> wrote:

> Hi All,
>
> Great to see we have a good quorum. So how about 4pm EST (1pm PST) today
> with a hangout on air? It works best if we start a hangout at that time
> (previous attempts to pre-schedule on-air events did not work well), so
> please check this mailing list around 4pm EST for the hangout on air link.
>
> Meanwhile, please join the Airavata Google Plus community; that might be an
> easier way to share the link:
> https://plus.google.com/communities/100700433662281905708
>
> Thanks, all, for being willing to take time on a Sunday,
> Suresh
>
> On Feb 28, 2014, at 9:15 PM, Supun Kamburugamuva <su...@gmail.com> wrote:
>
> > +1 for Sunday afternoon. I can make it after 4 pm EST.
> >
> > Thanks,
> > Supun..
> >
> >
> > On Fri, Feb 28, 2014 at 5:04 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
> >
> >> +1
> >>
> >> Thanks,
> >> Shameera.
> >>
> >>
> >> On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:
> >>
> >>> +1 for Sunday afternoon
> >>>
> >>> Thanks,
> >>> Eran Chinthaka Withana
> >>>
> >>>
> >>> On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
> >>>
> >>>> Hi Eran,
> >>>>
> >>>> This is a great idea. I myself owe a few replies on this thread and have
> >>>> been unable to take the time to organize my thoughts (and I realized I
> >>>> should take time to properly articulate the challenges; otherwise we will
> >>>> be discussing orthogonal issues).
> >>>>
> >>>> A hangout will help us brainstorm more comprehensively. We can have it on
> >>>> air so we can refer back to it for archival purposes. How is Sunday
> >>>> afternoon for everyone willing to join and contribute?
> >>>>
> >>>> Thanks,
> >>>> Suresh
> >>>>
> >>>> On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Is there any chance of hosting a Google Hangout to talk about this? I
> >>>>> think with long emails and multiple directions, things are getting a
> >>>>> little bit confusing in this thread (I'm partly responsible for this :) ).
> >>>>> I can join a video chat during a weekend, but let's make sure it's
> >>>>> convenient for both the East and West coasts :)
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> Thanks,
> >>>>> Eran Chinthaka Withana
> >>>>>
> >>>>>
> >>>>> On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org>
> >>> wrote:
> >>>>>
> >>>>>> I could respond to each thread in detail, but I see the general
> >> sense
> >>> is
> >>>>>> inquiring on the use case, so let me try and explain this and see if
> >>> it
> >>>>>> comes across. I am fully onboard with perceptions of relational vs
> >>> nosql
> >>>>>> and also agree current Airavata needs are not a direct map for NoSQL
> >>>>>> migration. I will summarize the driving motivation:
> >>>>>>
> >>>>>> Background: The key problem Airavata needs to solve is getting the
> >> API
> >>>> and
> >>>>>> associated data model right. The problem is current relational
> >>> database
> >>>>>> (with OpenJPA overlay) is severely limiting the API evolution.
> >> Science
> >>>>>> Gateways by nature are very science domain and use-case specific.
> >> But
> >>>>>> Airavata is tackling this challenging problem of providing a generic
> >>> API
> >>>>>> which will meet and enable these use case centric integration. The
> >>> issue
> >>>>>> here is, we are designing an API to handle a wide range of known
> >> (and
> >>>> some
> >>>>>> foreseen) use cases. But at the same time trying to keep it simple
> >> and
> >>>> yet
> >>>>>> flexible. The only way we can get through a reasonable, normalized
> >>>> version
> >>>>>> of API is by hands-on programming against the API. Within the
> >> Airavata
> >>>> PMC
> >>>>>> itself, we can solicit a half-a-dozen different ways on how to
> >>> visualize
> >>>>>> the data model. And we need few hackethon's with real-end users of
> >>>> Airavata
> >>>>>> until we find a common ground. All of this needs rapid prototyping.
> >>>>>> Currently a slight change in the data model is taking close to two
> >>>> weeks of
> >>>>>> re-arcitecting the Open-JPA based registry. There are many known
> >>>> problems
> >>>>>> with current draft of data model which have to be put-down in the
> >>>> interest
> >>>>>> of making over all system progress.
> >>>>>>
> >>>>>> So the driving motivation is not certainly any of the classic NoSQL
> >>>> needs.
> >>>>>> But a simple one, can we have registry which is schema-agnostic and
> >>> yet
> >>>> is
> >>>>>> queriable for most of the fields in the model? Can we try 10
> >> different
> >>>>>> variants of data model (hence API) within the next 3 months with
> >>> focused
> >>>>>> hackethon's and arrive at a stable 1.0 version of API?
> >>>>>>
> >>>>>> Part one is the discussion is successful that it raised every one's
> >>> eye
> >>>>>> brows. Now that we have every one's attention, what will be a good
> >>> data
> >>>>>> store for Airavata which will meet these needs?
> >>>>>>
> >>>>>> P.S: Additional background: The API has been in development for
> >> close
> >>>> to 3
> >>>>>> years and is falling short of pleasing a majority. Many academic
> >>>>>> standardization efforts fail terribly trying to pretend to
> >> understand
> >>>> all
> >>>>>> use cases and proposing a standard way (which ends up unnecessarily
> >>>> complex
> >>>>>> and not usable). Science by nature is evolutionary, and restricting
> >>> the
> >>>>>> capabilities by a known set of use cases prevents the use of
> >>> middleware
> >>>> for
> >>>>>> real-scientific research (and gets limited to proof of concept
> >>>>>> demonstrations, papers, educational use). The only way meeting the
> >>>>>> challenges of these evolving needs is to have the framework which
> >> can
> >>>>>> evolve with minimal disruption.
> >>>>>>
> >>>>>> Great thoughts so far, please keep 'em coming until we can find a
> >>>> solution
> >>>>>> not by the technical fancies but to address the real need.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Suresh
> >>>>>>
> >>>>>> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <glahiru@gmail.com
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
> >>>>>>> milinda.pathirage@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I also think that moving to Cassandra or any other NoSQL will add
> >>>>>>>> unneccessary complexity to your solution. Also designing proper
> >>> (easy
> >>>> to
> >>>>>>>> manage changes, easy to query) NoSQL data models are hard (AFAIK,
> >>>>>> require
> >>>>>>>> lots of experience and understanding about data structures and
> >>>> queries).
> >>>>>>>> Also migrating from one NoSQL technology to other can require
> >>> complete
> >>>>>>>> re-write. And current relational databases can handle heavy loads
> >>>> except
> >>>>>>>> Google, Twitter, Amazon and Facebook like loads. I don't think
> >>>> Airavata
> >>>>>>>> will see Google and Amazon like loads.
> >>>>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>>>
> >>>>>>>> If the constant changes to the data model is the problem , I think
> >>>> best
> >>>>>>>> option is to abstract registry implementation to something like
> >>>>>> collections
> >>>>>>>> and resources used in WSO2 Registry [1] or something suitable for
> >>>>>> Airavata
> >>>>>>>> context. That will make it easy to handle changes in data model.
> >>>>>>>>
> >>>>>>>> Also don't let the technologies drive design decision. Its always
> >>>>>> better to
> >>>>>>>> let use cases drive the design decision.
> >>>>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Lahiru
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Milinda
> >>>>>>>>
> >>>>>>>> [1] http://wso2.com/products/governance-registry/
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <
> >>>>>> supun06@gmail.com
> >>>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I'm not trying to discourage you on your exploration to NoSQL
> >>>>>> databases.
> >>>>>>>> I
> >>>>>>>>> have the following concern.
> >>>>>>>>>
> >>>>>>>>> Your database schema is moderately complex - even for a RDBMS it
> >>>> seems
> >>>>>>>>> complex and the data size is relatively small. I'm not sure about
> >>> the
> >>>>>>>>> current tools available but I think you will need to write more
> >>> code
> >>>> to
> >>>>>>>>> support all your requirements in a NoSQL database. So writing
> >> more
> >>>> code
> >>>>>>>> and
> >>>>>>>>> allow redundancy to support *relatively small* and *structured
> >>>>>>>>> data*doesn't seem right to me. May be I'm wrong and there are
> >>> better
> >>>>>>>>> tools in
> >>>>>>>>> NoSQL than RDBMS, which I doubt.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Supun..
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <smarru@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>> Airavata is actively migrating to use Thrift API for the
> >> RESTless
> >>>>>>>> design
> >>>>>>>>>> and to facilitate various language bindings from client
> >> gateways.
> >>>> The
> >>>>>>>>>> programming language support in thrift has been so far very
> >>>>>>>> encouraging.
> >>>>>>>>>> The current architecture is looking like Figure 1 at [1].
> >>>>>>>>>>
> >>>>>>>>>> Language specific clients will be released as thrift SDK's
> >>> (similar
> >>>> to
> >>>>>>>>>> evernote sdk's [1]). These clients will be integrated into
> >> gateway
> >>>>>>>>> portals
> >>>>>>>>>> which connect to the API Server. The API operations brokers he
> >>>> simple
> >>>>>>>>> calls
> >>>>>>>>>> into one or more backend CPI calls (Airavata internal component
> >>>>>>>>>> interfaces).  An example set of mappings are illustrated in
> >>> Figure 2
> >>>>>> at
> >>>>>>>>>> [1]. The current draft of thrift API for version 0.12 is at [3],
> >>>>>> please
> >>>>>>>>> pay
> >>>>>>>>>> attention to experiment model at [4].
> >>>>>>>>>>
> >>>>>>>>>> For the persistent store, we had few iterations of Airavata
> >>> Registry
> >>>>>>>>>> shifting from a legacy XRegistry to JackRabbit to now a OpenJPA
> >>>> based
> >>>>>>>>>> registry. To allow the API and the associated data models to
> >>> evolve,
> >>>>>> it
> >>>>>>>>>> will be useful to explore object databases so we can store the
> >>>>>>>> serialized
> >>>>>>>>>> version of thrift objects directly. But it will be nice to have
> >>> all
> >>>>>> (or
> >>>>>>>>>> most) of the fields queriable. This calls for a more
> >> column-family
> >>>>>>>> design
> >>>>>>>>>> of any NoSQL approaches.
> >>>>>>>>>>
> >>>>>>>>>> Any recommendations for a registry architecture?
> >>>>>>>>>>
> >>>>>>>>>> Quickly hacking through I find the following approach a viable
> >>> one:
> >>>>>>>>>> ZombieDB[5] over astyanax[6] which talks to Cassandra. Airavata
> >>> can
> >>>>>>>>> benefit
> >>>>>>>>>> immediately from the replication and reliability of cassandra
> >> and
> >>>>>>>>>> scalability in near future. Some of the model objects like
> >>>> experiment
> >>>>>>>>>> creation will need to have strong consistency and most of the
> >>>>>>>> monitoring
> >>>>>>>>>> can live with eventual consistency.
> >>>>>>>>>>
> >>>>>>>>>> Critical comments please?
> >>>>>>>>>>
> >>>>>>>>>> Thanks for your time,
> >>>>>>>>>> Suresh
> >>>>>>>>>>
> >>>>>>>>>> [1] -
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> >>>>>>>>>> [2] - https://dev.evernote.com/doc/
> >>>>>>>>>> [3] -
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> >>>>>>>>>> [4] -
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> >>>>>>>>>> [5] - https://github.com/MisterTea/ZombieDB
> >>>>>>>>>> [6] - https://github.com/Netflix/astyanax
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Supun Kamburugamuva
> >>>>>>>>> Member, Apache Software Foundation; http://www.apache.org
> >>>>>>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> >>>>>>>>> Blog: http://supunk.blogspot.com
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Milinda Pathirage
> >>>>>>>> PhD Student Indiana University, Bloomington;
> >>>>>>>> E-mail: milinda.pathirage@gmail.com
> >>>>>>>> Web: http://mpathirage.com
> >>>>>>>> Blog: http://blog.mpathirage.com
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> System Analyst Programmer
> >>>>>>> PTI Lab
> >>>>>>> Indiana University
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >> Shameera Rathnayaka.
> >>
> >> email: shameera AT apache.org , shameerainfo AT gmail.com
> >> Blog : http://shameerarathnayaka.blogspot.com/
> >>
> >
> >
> >
> > --
> > Supun Kamburugamuva
> > Member, Apache Software Foundation; http://www.apache.org
> > E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> > Blog: http://supunk.blogspot.com
>
>
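
Suresh's original proposal in this thread (store serialized Thrift objects directly, yet keep most fields queryable) amounts to one opaque blob per object plus secondary indexes over its fields. The following is a minimal sketch under stated assumptions, not Airavata code and not the ZombieDB or Astyanax API: JSON stands in for Thrift serialization, everything is held in memory, only top-level scalar fields are indexed, and the class, method, and field names are hypothetical. In a column-family store such as Cassandra, the blob map and each field index would roughly correspond to separate column families.

```python
import json
from collections import defaultdict

# Hypothetical schema-agnostic registry: one opaque blob per object plus
# inverted indexes (field -> value -> ids). JSON stands in for Thrift
# serialization; a real deployment would persist both maps, e.g. as
# separate Cassandra column families.
SCALARS = (str, int, float, bool)


class SchemaAgnosticRegistry:
    def __init__(self):
        self._blobs = {}  # object id -> serialized bytes
        self._index = defaultdict(lambda: defaultdict(set))

    def put(self, obj_id, obj):
        """Store obj under obj_id and index its top-level scalar fields."""
        self.delete(obj_id)  # drop stale index entries when overwriting
        self._blobs[obj_id] = json.dumps(obj).encode("utf-8")
        for field, value in obj.items():
            if isinstance(value, SCALARS):
                self._index[field][value].add(obj_id)

    def get(self, obj_id):
        """Deserialize and return the object, or None if absent."""
        blob = self._blobs.get(obj_id)
        return None if blob is None else json.loads(blob)

    def delete(self, obj_id):
        """Remove the object and its index entries (no-op if absent)."""
        old = self.get(obj_id)
        if old is None:
            return
        for field, value in old.items():
            if isinstance(value, SCALARS):
                self._index[field][value].discard(obj_id)
        del self._blobs[obj_id]

    def find(self, **criteria):
        """Return sorted ids whose indexed fields equal every criterion."""
        matches = None
        for field, value in criteria.items():
            ids = self._index.get(field, {}).get(value, set())
            matches = set(ids) if matches is None else matches & ids
        return sorted(matches or [])


if __name__ == "__main__":
    reg = SchemaAgnosticRegistry()
    reg.put("exp-1", {"name": "md-run", "state": "CREATED", "user": "alice"})
    # "gateway" was never declared anywhere, yet it is queryable at once.
    reg.put("exp-2", {"name": "qm-run", "state": "CREATED", "gateway": "gw-1"})
    print(reg.find(state="CREATED"))                  # ['exp-1', 'exp-2']
    print(reg.find(state="CREATED", gateway="gw-1"))  # ['exp-2']
```

Because the indexes are derived from whatever fields an object carries when it is written, adding a field to the model needs no schema migration, which is the property the thread is after. The cost, as Supun points out, is that work an RDBMS does for you (joins, constraints, multi-object transactions) moves into application code.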

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Hi All,

Great to see we have a good quorum. So how about 4pm EST (1pm PST) today, with a Hangout On Air? It works best if we start a hangout at that time (previous attempts to pre-schedule on-air events did not work well), so please check this mailing list around 4pm EST for the Hangout On Air link.

Meanwhile, please join the Airavata Google Plus community; that might make it easier to share the link - https://plus.google.com/communities/100700433662281905708

Thanks, all, for being willing to take time on a Sunday,
Suresh

On Feb 28, 2014, at 9:15 PM, Supun Kamburugamuva <su...@gmail.com> wrote:

> +1 for Sunday afternoon. I can make it after 4 pm EST.
> 
> Thanks,
> Supun..
> 
> 
> 
> -- 
> Supun Kamburugamuva
> Member, Apache Software Foundation; http://www.apache.org
> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> Blog: http://supunk.blogspot.com
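
The consistency split in the original proposal (strong for experiment creation, eventual for monitoring) maps onto Cassandra's per-request consistency levels. A common rule of thumb is that a read is guaranteed to overlap the replicas holding the latest acknowledged write when R + W > N, where N is the replication factor and R and W are the numbers of replicas that must answer a read and acknowledge a write. This pure-Python sketch only does that arithmetic for the ONE, QUORUM and ALL levels; it does not call Astyanax or any Cassandra driver, and the operation names and policy table are hypothetical.

```python
from enum import Enum


class Consistency(Enum):
    """A subset of Cassandra's consistency levels."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level, replication_factor):
    """Replicas that must respond for a request at the given level."""
    if level is Consistency.ONE:
        return 1
    if level is Consistency.QUORUM:
        return replication_factor // 2 + 1  # strict majority
    return replication_factor  # Consistency.ALL


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when every read replica set must overlap every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# Hypothetical per-operation policy: (read level, write level).
POLICY = {
    "create_experiment": (Consistency.QUORUM, Consistency.QUORUM),  # strong
    "update_job_status": (Consistency.ONE, Consistency.ONE),  # eventual
}

if __name__ == "__main__":
    for op, (read, write) in POLICY.items():
        print(op, reads_see_latest_write(read, write, replication_factor=3))
    # create_experiment True
    # update_job_status False
```

With a replication factor of 3, QUORUM reads and writes each need 2 replicas, so they always intersect, whereas ONE/ONE can briefly return a stale status that converges later, which is usually acceptable for monitoring data. The sketch ignores details such as Cassandra's timestamp-based resolution of conflicting writes.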

Re: Object Database Suggestions for Airavata Registry

Posted by Supun Kamburugamuva <su...@gmail.com>.

+1 for Sunday afternoon. I can make it after 4 pm EST.

Thanks,
Supun..


On Fri, Feb 28, 2014 at 5:04 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:

> +1
>
> Thanks,
> Shameera.
>
>
> On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <
> eran.chinthaka@gmail.com> wrote:
>
> > +1 for Sunday afternoon
> >
> > Thanks,
> > Eran Chinthaka Withana
> >
> >
> > On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
> >
> > > Hi Eran,
> > >
> > > This is a great idea. I myself owe a few replies on this thread and have
> > > been unable to take the time to collect my thoughts (and realized I should
> > > take time to properly articulate the challenges, otherwise we will be
> > > discussing orthogonal issues).
> > >
> > > A hangout will help us brainstorm more comprehensively. We can have it on
> > > air so we can refer back to it for archival purposes. How is Sunday
> > > afternoon for everyone willing to join and contribute?
> > >
> > > Thanks,
> > > Suresh
> > >
> > > On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <
> > > eran.chinthaka@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there any chance of hosting a Google Hangout to talk about this? I
> > > > think with long emails and multiple directions, things are getting a
> > > > little bit confusing in the thread (I'm partly responsible for this :) ).
> > > > I can join a video chat during a weekend, but let's make sure it's
> > > > convenient for both the east and west coasts :)
> > > >
> > > > WDYT?
> > > >
> > > > Thanks,
> > > > Eran Chinthaka Withana
> > > >
> > > >
> > > > On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org> wrote:
> > > >
> > > >> I could respond to each thread in detail, but I see the general sense is
> > > >> to inquire about the use case, so let me try to explain it and see if it
> > > >> comes across. I am fully on board with the perceptions of relational vs
> > > >> NoSQL and also agree that current Airavata needs are not a direct map for
> > > >> a NoSQL migration. I will summarize the driving motivation:
> > > >>
> > > >> Background: The key problem Airavata needs to solve is getting the API and
> > > >> associated data model right. The problem is that the current relational
> > > >> database (with an OpenJPA overlay) is severely limiting the API's
> > > >> evolution. Science gateways by nature are very science-domain and
> > > >> use-case specific, but Airavata is tackling the challenging problem of
> > > >> providing a generic API which will meet and enable these use-case-centric
> > > >> integrations. The issue here is that we are designing an API to handle a
> > > >> wide range of known (and some foreseen) use cases, while at the same time
> > > >> trying to keep it simple and yet flexible. The only way we can arrive at a
> > > >> reasonable, normalized version of the API is by hands-on programming
> > > >> against it. Within the Airavata PMC itself, we can solicit half a dozen
> > > >> different ways to visualize the data model, and we need a few hackathons
> > > >> with real end users of Airavata until we find common ground. All of this
> > > >> needs rapid prototyping. Currently, a slight change in the data model
> > > >> takes close to two weeks of re-architecting the OpenJPA-based registry.
> > > >> There are many known problems with the current draft of the data model
> > > >> which have to be set aside in the interest of making overall system
> > > >> progress.
> > > >>
> > > >> So the driving motivation is certainly not any of the classic NoSQL needs,
> > > >> but a simple one: can we have a registry which is schema-agnostic and yet
> > > >> queryable for most of the fields in the model? Can we try 10 different
> > > >> variants of the data model (and hence the API) within the next 3 months
> > > >> with focused hackathons and arrive at a stable 1.0 version of the API?
> > > >>
> > > >> Part one of the discussion was successful in that it raised everyone's
> > > >> eyebrows. Now that we have everyone's attention, what would be a good data
> > > >> store for Airavata that meets these needs?
> > > >>
> > > >> P.S.: Additional background: The API has been in development for close to
> > > >> 3 years and is falling short of pleasing a majority. Many academic
> > > >> standardization efforts fail terribly by pretending to understand all use
> > > >> cases and proposing a standard way (which ends up unnecessarily complex
> > > >> and unusable). Science by nature is evolutionary, and restricting the
> > > >> capabilities to a known set of use cases prevents the use of middleware
> > > >> for real scientific research (it gets limited to proof-of-concept
> > > >> demonstrations, papers, and educational use). The only way to meet the
> > > >> challenges of these evolving needs is to have a framework which can
> > > >> evolve with minimal disruption.
> > > >>
> > > >> Great thoughts so far; please keep 'em coming until we find a solution
> > > >> driven not by technical fancies but by the real need.
> > > >>
> > > >> Cheers,
> > > >> Suresh
> > > >>
> > > >> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <glahiru@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
> > > >>> milinda.pathirage@gmail.com> wrote:
> > > >>>
> > > >>>> I also think that moving to Cassandra or any other NoSQL store will add
> > > >>>> unnecessary complexity to your solution. Also, designing proper NoSQL
> > > >>>> data models (easy to change, easy to query) is hard (AFAIK, it requires
> > > >>>> lots of experience and understanding of data structures and queries).
> > > >>>> Also, migrating from one NoSQL technology to another can require a
> > > >>>> complete rewrite. And current relational databases can handle heavy
> > > >>>> loads, short of Google-, Twitter-, Amazon- and Facebook-scale loads. I
> > > >>>> don't think Airavata will see Google- or Amazon-scale loads.
> > > >>>>
> > > >>> +1
> > > >>>
> > > >>>>
> > > >>>> If the constant changes to the data model are the problem, I think the
> > > >>>> best option is to abstract the registry implementation into something
> > > >>>> like the collections and resources used in WSO2 Registry [1], or
> > > >>>> something suitable for the Airavata context. That will make it easy to
> > > >>>> handle changes in the data model.
> > > >>>>
> > > >>>> Also, don't let the technologies drive design decisions. It's always
> > > >>>> better to let use cases drive them.
> > > >>>>
> > > >>> +1
> > > >>>
> > > >>> Regards
> > > >>> Lahiru
> > > >>>
> > > >>>>
> > > >>>> Thanks
> > > >>>> Milinda
> > > >>>>
> > > >>>> [1] http://wso2.com/products/governance-registry/
> > > >>>>
> > > >>>>
> > > >>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <supun06@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > >>>>> I'm not trying to discourage you from exploring NoSQL databases. I
> > > >>>>> have the following concern.
> > > >>>>>
> > > >>>>> Your database schema is moderately complex (even for an RDBMS it seems
> > > >>>>> complex), and the data size is relatively small. I'm not sure about the
> > > >>>>> current tools available, but I think you will need to write more code
> > > >>>>> to support all your requirements in a NoSQL database. So writing more
> > > >>>>> code and allowing redundancy to support *relatively small* and
> > > >>>>> *structured data* doesn't seem right to me. Maybe I'm wrong and there
> > > >>>>> are better tools in NoSQL than in RDBMS, which I doubt.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Supun..
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <smarru@apache.org>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi All,
> > > >>>>>>
> > > >>>>>> Airavata is actively migrating to a Thrift API for the RESTless
> > > >>>>>> design and to facilitate various language bindings for client
> > > >>>>>> gateways. The programming language support in Thrift has so far been
> > > >>>>>> very encouraging. The current architecture looks like Figure 1 at [1].
> > > >>>>>>
> > > >>>>>> Language-specific clients will be released as Thrift SDKs (similar to
> > > >>>>>> the Evernote SDKs [2]). These clients will be integrated into gateway
> > > >>>>>> portals which connect to the API server. The API operations broker the
> > > >>>>>> simple calls into one or more backend CPI calls (Airavata internal
> > > >>>>>> component interfaces). An example set of mappings is illustrated in
> > > >>>>>> Figure 2 at [1]. The current draft of the Thrift API for version 0.12
> > > >>>>>> is at [3]; please pay attention to the experiment model at [4].
> > > >>>>>>
> > > >>>>>> For the persistent store, we had few iterations of Airavata
> > Registry
> > > >>>>>> shifting from a legacy XRegistry to JackRabbit to now a OpenJPA
> > > based
> > > >>>>>> registry. To allow the API and the associated data models to
> > evolve,
> > > >> it
> > > >>>>>> will be useful to explore object databases so we can store the
> > > >>>> serialized
> > > >>>>>> version of thrift objects directly. But it will be nice to have
> > all
> > > >> (or
> > > >>>>>> most) of the fields queriable. This calls for a more
> column-family
> > > >>>> design
> > > >>>>>> of any NoSQL approaches.
> > > >>>>>>
> > > >>>>>> Any recommendations for a registry architecture?
> > > >>>>>>
> > > >>>>>> Quickly hacking through I find the following approach a viable
> > one:
> > > >>>>>> ZombieDB[5] over astyanax[6] which talks to Cassandra. Airavata
> > can
> > > >>>>> benefit
> > > >>>>>> immediately from the replication and reliability of cassandra
> and
> > > >>>>>> scalability in near future. Some of the model objects like
> > > experiment
> > > >>>>>> creation will need to have strong consistency and most of the
> > > >>>> monitoring
> > > >>>>>> can live with eventual consistency.
> > > >>>>>>
> > > >>>>>> Critical comments please?
> > > >>>>>>
> > > >>>>>> Thanks for your time,
> > > >>>>>> Suresh
> > > >>>>>>
> > > >>>>>> [1] -
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> > > >>>>>> [2] - https://dev.evernote.com/doc/
> > > >>>>>> [3] -
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> > > >>>>>> [4] -
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> > > >>>>>> [5] - https://github.com/MisterTea/ZombieDB
> > > >>>>>> [6] - https://github.com/Netflix/astyanax
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Supun Kamburugamuva
> > > >>>>> Member, Apache Software Foundation; http://www.apache.org
> > > >>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> > > >>>>> Blog: http://supunk.blogspot.com
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Milinda Pathirage
> > > >>>> PhD Student Indiana University, Bloomington;
> > > >>>> E-mail: milinda.pathirage@gmail.com
> > > >>>> Web: http://mpathirage.com
> > > >>>> Blog: http://blog.mpathirage.com
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> System Analyst Programmer
> > > >>> PTI Lab
> > > >>> Indiana University
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Best Regards,
> Shameera Rathnayaka.
>
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/
>



-- 
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com
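
The core idea debated in this thread, storing each serialized Thrift object as-is while keeping all (or most) of its fields queryable, can be illustrated with a minimal in-memory sketch. It is not code from Airavata, ZombieDB or Astyanax: `SchemaAgnosticRegistry` and its methods are hypothetical, Python's `pickle` stands in for Thrift serialization, and plain dictionaries stand in for a column family (one row per object holding the blob) plus a secondary index (field -> value -> row keys).

```python
import pickle
from collections import defaultdict


class SchemaAgnosticRegistry:
    """Hypothetical registry sketch: each model object is kept as an opaque
    serialized blob, and every field it carries is indexed so it can be
    queried without a fixed relational schema."""

    def __init__(self):
        # Row key -> serialized object (stand-in for a blob column).
        self._rows = {}
        # Field name -> field value -> set of row keys (secondary index).
        self._index = defaultdict(lambda: defaultdict(set))

    def put(self, key, fields):
        """Store or replace an object given as a dict of hashable field values."""
        if key in self._rows:
            # Remove index entries left by the previous version of the object.
            for name, value in self.get(key).items():
                self._index[name][value].discard(key)
        # pickle stands in for Thrift serialization; never unpickle untrusted data.
        self._rows[key] = pickle.dumps(fields)
        for name, value in fields.items():
            self._index[name][value].add(key)

    def get(self, key):
        """Deserialize and return the stored object."""
        return pickle.loads(self._rows[key])

    def query(self, field, value):
        """Return the keys of all objects whose `field` equals `value`."""
        return set(self._index.get(field, {}).get(value, ()))
```

For example, after `put("exp-2", {"status": "LAUNCHED", "gateway": "gw-x"})`, a call to `query("gateway", "gw-x")` returns `{"exp-2"}` even though no other object has a `gateway` field. Because the index is built from whatever fields an object carries, adding a field to the model needs no schema migration, which is the flexibility asked for in the original post; the trade-off raised by Supun and Milinda is that index maintenance, consistency and richer queries (ranges, joins) become application code rather than something the database provides.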

Re: Object Database Suggestions for Airavata Registry

Posted by Shameera Rathnayaka <sh...@gmail.com>.

+1

Thanks,
Shameera.


On Sat, Mar 1, 2014 at 3:11 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:

> +1 for Sunday afternoon
>
> Thanks,
> Eran Chinthaka Withana
>
>
> On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:
>
> > Hi Eran,
> >
> > This is a great idea. I myself owe a few replies on this thread and have
> > been unable to take time to collect my thoughts (and I realized I should
> > take time to properly articulate the challenges; otherwise we will be
> > discussing orthogonal issues).
> >
> > A hangout will help us brainstorm more comprehensively. We can hold it as
> > a Hangout On Air so we can refer back to it for archival purposes. How is
> > Sunday afternoon for everyone willing to join and contribute?
> >
> > Thanks,
> > Suresh
> >
> > On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <eran.chinthaka@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there any chance of hosting a Google Hangout to talk about this? I
> > > think with long emails and multiple directions, things are getting a
> > > little bit confusing in this thread (I'm partly responsible for this :) ).
> > > I can join a video chat during a weekend, but let's make sure it's
> > > convenient for both the east and west coasts :)
> > >
> > > WDYT?
> > >
> > > Thanks,
> > > Eran Chinthaka Withana
> > >
> > >
> > > On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org> wrote:
> > >
> > >> I could respond to each thread in detail, but I see the general sense is
> > >> inquiring about the use case, so let me try to explain it and see if it
> > >> comes across. I am fully on board with the perceptions of relational vs.
> > >> NoSQL, and I also agree that current Airavata needs are not a direct map
> > >> for a NoSQL migration. I will summarize the driving motivation:
> > >>
> > >> Background: The key problem Airavata needs to solve is getting the API and
> > >> associated data model right. The problem is that the current relational
> > >> database (with an OpenJPA overlay) is severely limiting the API's
> > >> evolution. Science gateways by nature are very science-domain and
> > >> use-case specific. But Airavata is tackling the challenging problem of
> > >> providing a generic API which will meet and enable these use-case-centric
> > >> integrations. The issue here is that we are designing an API to handle a
> > >> wide range of known (and some foreseen) use cases, while at the same time
> > >> trying to keep it simple and yet flexible. The only way we can get to a
> > >> reasonable, normalized version of the API is by hands-on programming
> > >> against it. Within the Airavata PMC itself, we can solicit half a dozen
> > >> different ways to visualize the data model. And we need a few hackathons
> > >> with real end users of Airavata until we find common ground. All of this
> > >> needs rapid prototyping. Currently, a slight change in the data model
> > >> takes close to two weeks of re-architecting the OpenJPA-based registry.
> > >> There are many known problems with the current draft of the data model
> > >> which have been set aside in the interest of making overall system
> > >> progress.
> > >>
> > >> So the driving motivation is certainly not any of the classic NoSQL
> > >> needs, but a simple one: can we have a registry which is schema-agnostic
> > >> and yet queryable for most of the fields in the model? Can we try 10
> > >> different variants of the data model (hence the API) within the next 3
> > >> months with focused hackathons, and arrive at a stable 1.0 version of the
> > >> API?
> > >>
> > >> Part one of the discussion was successful in that it raised everyone's
> > >> eyebrows. Now that we have everyone's attention, what would be a good
> > >> data store for Airavata that meets these needs?
> > >>
> > >> P.S.: Additional background: The API has been in development for close to
> > >> 3 years and is falling short of pleasing a majority. Many academic
> > >> standardization efforts fail terribly by pretending to understand all use
> > >> cases and proposing a standard way (which ends up unnecessarily complex
> > >> and not usable). Science by nature is evolutionary, and restricting the
> > >> capabilities to a known set of use cases prevents the use of middleware
> > >> for real scientific research (and limits it to proof-of-concept
> > >> demonstrations, papers, and educational use). The only way to meet the
> > >> challenges of these evolving needs is to have a framework which can
> > >> evolve with minimal disruption.
> > >>
> > >> Great thoughts so far; please keep 'em coming until we find a solution
> > >> driven not by technical fancies but by the real need.
> > >>
> > >> Cheers,
> > >> Suresh
> > >>
> > >> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <gl...@gmail.com> wrote:
> > >>
> > >>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:
> > >>>
> > >>>> I also think that moving to Cassandra or any other NoSQL store will add
> > >>>> unnecessary complexity to your solution. Also, designing proper (easy to
> > >>>> manage changes, easy to query) NoSQL data models is hard (AFAIK, it
> > >>>> requires a lot of experience and understanding of data structures and
> > >>>> queries). Also, migrating from one NoSQL technology to another can
> > >>>> require a complete rewrite. And current relational databases can handle
> > >>>> heavy loads, short of Google-, Twitter-, Amazon- or Facebook-scale loads.
> > >>>> I don't think Airavata will see Google- or Amazon-like loads.
> > >>>>
> > >>> +1
> > >>>
> > >>>>
> > >>>> If constant changes to the data model are the problem, I think the best
> > >>>> option is to abstract the registry implementation into something like
> > >>>> the collections and resources used in WSO2 Registry [1], or something
> > >>>> suitable for the Airavata context. That will make it easy to handle
> > >>>> changes in the data model.
> > >>>>
> > >>>> Also, don't let technologies drive design decisions. It's always
> > >>>> better to let use cases drive them.
> > >>>>
> > >>> +1
> > >>>
> > >>> Regards
> > >>> Lahiru
> > >>>
> > >>>>
> > >>>> Thanks
> > >>>> Milinda
> > >>>>
> > >>>> [1] http://wso2.com/products/governance-registry/



-- 
Best Regards,
Shameera Rathnayaka.

email: shameera AT apache.org , shameerainfo AT gmail.com
Blog : http://shameerarathnayaka.blogspot.com/

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

+1 for Sunday afternoon

Thanks,
Eran Chinthaka Withana


On Fri, Feb 28, 2014 at 5:17 AM, Suresh Marru <sm...@apache.org> wrote:

> Hi Eran,
>
> This is a great idea. I myself owe a few replies on this thread and have
> been unable to take time to collect my thoughts (and I realized I should
> take time to properly articulate the challenges; otherwise we will be
> discussing orthogonal issues).
>
> A hangout will help us brainstorm more comprehensively. We can hold it as
> a Hangout On Air so we can refer back to it for archival purposes. How is
> Sunday afternoon for everyone willing to join and contribute?
>
> Thanks,
> Suresh

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

Hi Eran,

This is a great idea. I myself owe a few replies on this thread and have been unable to take time to collect my thoughts (and I realized I should take time to properly articulate the challenges; otherwise we will be discussing orthogonal issues).

A hangout will help us brainstorm more comprehensively. We can hold it as a Hangout On Air so we can refer back to it for archival purposes. How is Sunday afternoon for everyone willing to join and contribute?

Thanks,
Suresh

On Feb 28, 2014, at 1:45 AM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Hi,
> 
> Is there any chance of hosting a Google Hangout to talk about this? I think
> with long emails and multiple directions, things are getting a little bit
> confusing in this thread (I'm partly responsible for this :) ). I can join a
> video chat during a weekend, but let's make sure it's convenient for both
> the east and west coasts :)
> 
> WDYT?
> 
> Thanks,
> Eran Chinthaka Withana
> 
> 
> On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org> wrote:
> 
>> I could respond to each thread in detail, but I see the general sense is
>> inquiring about the use case, so let me try to explain it and see if it
>> comes across. I am fully on board with the perceptions of relational vs.
>> NoSQL, and I also agree that current Airavata needs are not a direct map
>> for a NoSQL migration. I will summarize the driving motivation:
>>
>> Background: The key problem Airavata needs to solve is getting the API and
>> associated data model right. The problem is that the current relational
>> database (with an OpenJPA overlay) is severely limiting the API's
>> evolution. Science gateways by nature are very science-domain and
>> use-case specific. But Airavata is tackling the challenging problem of
>> providing a generic API which will meet and enable these use-case-centric
>> integrations. The issue here is that we are designing an API to handle a
>> wide range of known (and some foreseen) use cases, while at the same time
>> trying to keep it simple and yet flexible. The only way we can get to a
>> reasonable, normalized version of the API is by hands-on programming
>> against it. Within the Airavata PMC itself, we can solicit half a dozen
>> different ways to visualize the data model. And we need a few hackathons
>> with real end users of Airavata until we find common ground. All of this
>> needs rapid prototyping. Currently, a slight change in the data model
>> takes close to two weeks of re-architecting the OpenJPA-based registry.
>> There are many known problems with the current draft of the data model
>> which have been set aside in the interest of making overall system
>> progress.
>>
>> So the driving motivation is certainly not any of the classic NoSQL
>> needs, but a simple one: can we have a registry which is schema-agnostic
>> and yet queryable for most of the fields in the model? Can we try 10
>> different variants of the data model (hence the API) within the next 3
>> months with focused hackathons, and arrive at a stable 1.0 version of the
>> API?
>>
>> Part one of the discussion was successful in that it raised everyone's
>> eyebrows. Now that we have everyone's attention, what would be a good
>> data store for Airavata that meets these needs?
>>
>> P.S.: Additional background: The API has been in development for close to
>> 3 years and is falling short of pleasing a majority. Many academic
>> standardization efforts fail terribly by pretending to understand all use
>> cases and proposing a standard way (which ends up unnecessarily complex
>> and not usable). Science by nature is evolutionary, and restricting the
>> capabilities to a known set of use cases prevents the use of middleware
>> for real scientific research (and limits it to proof-of-concept
>> demonstrations, papers, and educational use). The only way to meet the
>> challenges of these evolving needs is to have a framework which can
>> evolve with minimal disruption.
>>
>> Great thoughts so far; please keep 'em coming until we find a solution
>> driven not by technical fancies but by the real need.
>> 
>> Cheers,
>> Suresh
>> 
>> On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <gl...@gmail.com>
>> wrote:
>> 
>>> On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <
>>> milinda.pathirage@gmail.com> wrote:
>>> 
>>>> I also think that moving to Cassandra or any other NoSQL will add
>>>> unneccessary complexity to your solution. Also designing proper (easy to
>>>> manage changes, easy to query) NoSQL data models are hard (AFAIK,
>> require
>>>> lots of experience and understanding about data structures and queries).
>>>> Also migrating from one NoSQL technology to other can require complete
>>>> re-write. And current relational databases can handle heavy loads except
>>>> Google, Twitter, Amazon and Facebook like loads. I don't think Airavata
>>>> will see Google and Amazon like loads.

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Hi,

Is there any chance of hosting a Google Hangout to talk about this? I think
with long emails and multiple directions, things are getting a little bit
confusing in this thread (I'm partly responsible for this :) ). I can join a
video chat on a weekend, but let's make sure it's convenient for both the East
and West Coasts :)

WDYT?

Thanks,
Eran Chinthaka Withana


On Mon, Feb 24, 2014 at 9:32 AM, Suresh Marru <sm...@apache.org> wrote:


Re: Object Database Suggestions for Airavata Registry

Posted by Supun Kamburugamuva <su...@gmail.com>.

Hi Suresh,

From your email, this is what I got:

You are trying to create a data model along with an API that fits most of
the use cases (you want to get it right).
You guys find it very hard because the data models keep changing.
You want to find a solution such that you can do arbitrary queries. Also,
you want to be able to change the data model slightly, if necessary, with small changes.

But I think neither NoSQL nor an RDBMS will solve your problems directly. First, I
think writing the right API for a product like this is very hard to achieve
(maybe you shouldn't try to achieve this). Instead of trying to
write the right API, you may want to write a simple but extensible API. Maybe
you should make all your data storage pluggable. For example, if I want to
record the data for a particular experiment, I should enable a plugin for
that. From the data standpoint, Airavata can only support a fixed schema
because Airavata's capabilities are limited. If users want to insert
custom data from their workflows, you can provide extensions to the workflow
language/Airavata so that users can do this easily. For custom queries, you
may also need to provide extension mechanisms so that users can write their
own queries, etc.
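As a rough illustration of the pluggable-storage idea, here is a minimal Python sketch. All names (StorageBackend, InMemoryBackend, Registry, enable, the experiment IDs) are hypothetical and are not Airavata APIs. The point is only that the core registry talks to a small interface, and a storage plugin is enabled per experiment.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class StorageBackend(ABC):
    """Hypothetical plugin interface; each backend decides how to persist records."""

    @abstractmethod
    def save(self, experiment_id: str, record: Dict[str, Any]) -> None: ...

    @abstractmethod
    def load(self, experiment_id: str) -> List[Dict[str, Any]]: ...


class InMemoryBackend(StorageBackend):
    """Toy backend standing in for a relational or NoSQL implementation."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, experiment_id: str, record: Dict[str, Any]) -> None:
        self._data.setdefault(experiment_id, []).append(dict(record))

    def load(self, experiment_id: str) -> List[Dict[str, Any]]:
        return list(self._data.get(experiment_id, []))


class Registry:
    """Routes an experiment's data to whichever plugin was enabled for it."""

    def __init__(self) -> None:
        self._plugins: Dict[str, StorageBackend] = {}
        self._enabled: Dict[str, str] = {}

    def register_plugin(self, name: str, backend: StorageBackend) -> None:
        self._plugins[name] = backend

    def enable(self, experiment_id: str, plugin_name: str) -> None:
        if plugin_name not in self._plugins:
            raise KeyError(f"unknown plugin: {plugin_name}")
        self._enabled[experiment_id] = plugin_name

    def record(self, experiment_id: str, record: Dict[str, Any]) -> bool:
        # Data is recorded only when a plugin has been enabled for the experiment.
        name = self._enabled.get(experiment_id)
        if name is None:
            return False
        self._plugins[name].save(experiment_id, record)
        return True

    def fetch(self, experiment_id: str) -> List[Dict[str, Any]]:
        name = self._enabled.get(experiment_id)
        return self._plugins[name].load(experiment_id) if name else []


registry = Registry()
registry.register_plugin("memory", InMemoryBackend())
registry.enable("exp-1", "memory")
registry.record("exp-1", {"status": "LAUNCHED"})
```

Keeping the interface this small matches the suggestion above: Airavata owns a fixed core, and anything experiment-specific lives behind a plugin that can be swapped without touching the core.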

Thanks,
Supun..







On Mon, Feb 24, 2014 at 12:32 PM, Suresh Marru <sm...@apache.org> wrote:



-- 
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

I could respond to each reply in detail, but the general sense seems to be inquiring about the use case, so let me try to explain it and see if it comes across. I am fully on board with the perceptions of relational vs. NoSQL, and I also agree that current Airavata needs do not map directly to a NoSQL migration. Let me summarize the driving motivation:

Background: The key problem Airavata needs to solve is getting the API and the associated data model right. The problem is that the current relational database (with an OpenJPA overlay) is severely limiting API evolution. Science Gateways are by nature very specific to a science domain and use case, but Airavata is tackling the challenging problem of providing a generic API that will meet and enable these use-case-centric integrations. The issue is that we are designing an API to handle a wide range of known (and some foreseen) use cases while trying to keep it simple yet flexible. The only way we can arrive at a reasonable, normalized version of the API is by hands-on programming against it. Within the Airavata PMC itself, we can solicit half a dozen different ways to visualize the data model, and we need a few hackathons with real end users of Airavata until we find common ground. All of this needs rapid prototyping. Currently, a slight change in the data model takes close to two weeks of re-architecting the OpenJPA-based registry. There are many known problems with the current draft of the data model that have had to be set aside in the interest of making overall system progress.

So the driving motivation is certainly not any of the classic NoSQL needs, but a simple one: can we have a registry that is schema-agnostic and yet queryable on most of the fields in the model? Can we try 10 different variants of the data model (and hence the API) within the next 3 months through focused hackathons and arrive at a stable 1.0 version of the API?
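One way to read "schema-agnostic and yet queryable" is to store each model object as an opaque serialized document and, alongside it, maintain an index from flattened field paths to values, so any field can be queried without a schema migration. The Python sketch below assumes JSON serialization as a stand-in for Thrift's serializers and uses a hypothetical DocumentRegistry class. It is not how ZombieDB, Cassandra, or OpenJPA work internally.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple


def flatten(obj: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, scalar) pairs for every leaf in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for item in obj:
            # List items share their parent's path, so any element can match.
            yield from flatten(item, prefix)
    else:
        yield prefix.rstrip("."), obj


class DocumentRegistry:
    """Stores serialized blobs plus a (path, value) -> document-id index."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}
        self._index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        if doc_id in self._blobs:
            # Remove index entries belonging to the previous version of this document.
            for pair in flatten(self.get(doc_id)):
                self._index[pair].discard(doc_id)
        blob = json.dumps(doc)
        self._blobs[doc_id] = blob
        # Index the round-tripped form so put and overwrite see identical values.
        for pair in flatten(json.loads(blob)):
            self._index[pair].add(doc_id)

    def get(self, doc_id: str) -> Dict[str, Any]:
        return json.loads(self._blobs[doc_id])

    def find(self, path: str, value: Any) -> List[str]:
        return sorted(self._index.get((path, value), set()))


reg = DocumentRegistry()
reg.put("exp-1", {"name": "md-run", "status": {"state": "LAUNCHED"},
                  "inputs": [{"key": "cores", "value": 16}]})
reg.put("exp-2", {"name": "qm-run", "status": {"state": "COMPLETED"}})
print(reg.find("status.state", "LAUNCHED"))  # ['exp-1']
print(reg.find("inputs.value", 16))          # ['exp-1']
```

Adding a new field to a later variant of the model needs no schema change, because new paths simply appear in the index. The trade-off is that this supports only equality lookups. Range or compound queries would need additional indexing.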

Part one of the discussion has succeeded in that it raised everyone’s eyebrows. Now that we have everyone’s attention, what would be a good data store for Airavata that meets these needs?

P.S.: Additional background: the API has been in development for close to 3 years and is falling short of pleasing a majority. Many academic standardization efforts fail terribly by pretending to understand all use cases and proposing a standard way (which ends up unnecessarily complex and unusable). Science is by nature evolutionary, and restricting capabilities to a known set of use cases prevents the middleware from being used for real scientific research (limiting it to proof-of-concept demonstrations, papers, and educational use). The only way to meet the challenges of these evolving needs is to have a framework that can evolve with minimal disruption.
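One common technique for letting a data model evolve with minimal disruption is to tag each stored record with a schema version and upgrade older records lazily, on read, through a chain of small migration functions. The Python sketch below is a hypothetical illustration of that technique. The field names, versions, and MIGRATIONS table are invented for the example and are not part of the Airavata data model.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def _v1_to_v2(rec: Record) -> Record:
    # Hypothetical v2 added an optional "description" field.
    rec = dict(rec)
    rec.setdefault("description", "")
    return rec


def _v2_to_v3(rec: Record) -> Record:
    # Hypothetical v3 nested the flat "state" field under a "status" object.
    rec = dict(rec)
    rec["status"] = {"state": rec.pop("state", "UNKNOWN")}
    return rec


# MIGRATIONS[n] upgrades a record from version n to version n + 1.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def upgrade(rec: Record) -> Record:
    """Apply migrations in order until the record reaches CURRENT_VERSION."""
    version = rec.get("schema_version", 1)
    while version < CURRENT_VERSION:
        rec = MIGRATIONS[version](rec)  # each step returns a new dict
        version += 1
        rec["schema_version"] = version
    return rec


old = {"schema_version": 1, "name": "md-run", "state": "LAUNCHED"}
print(upgrade(old))
```

Each model change then costs one small function rather than a registry rewrite, and stored records never have to be migrated in bulk.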

Great thoughts so far; please keep ’em coming until we find a solution driven not by technical fancies but by the real need.

Cheers,
Suresh

On Feb 24, 2014, at 11:53 AM, Lahiru Gunathilake <gl...@gmail.com> wrote:


Re: Object Database Suggestions for Airavata Registry

Posted by Lahiru Gunathilake <gl...@gmail.com>.

On Mon, Feb 24, 2014 at 11:20 AM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:

> I also think that moving to Cassandra or any other NoSQL database will add
> unnecessary complexity to your solution. Also, designing proper (easy to
> manage changes, easy to query) NoSQL data models is hard (AFAIK, it requires
> lots of experience and understanding of data structures and queries).
> Also, migrating from one NoSQL technology to another can require a complete
> rewrite. And current relational databases can handle heavy loads, except
> Google-, Twitter-, Amazon- and Facebook-like loads. I don't think Airavata
> will see Google- and Amazon-like loads.
>
+1

>
> If constant changes to the data model are the problem, I think the best
> option is to abstract the registry implementation into something like the
> collections and resources used in WSO2 Registry [1], or something suitable
> for the Airavata context. That will make it easy to handle changes in the data model.
>
> Also, don't let technologies drive design decisions. It's always better to
> let use cases drive design decisions.
>
+1

Regards
Lahiru
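To make Milinda's suggestion concrete: in a collections-and-resources style registry, every entity lives at a path, a collection is simply a path with children, and a resource carries opaque content plus free-form properties. Adding a field to the data model therefore does not require a schema change. The Python sketch below is loosely modeled on that idea. Resource, PathRegistry, and their methods are hypothetical names, not the WSO2 Governance Registry API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """A node in the registry tree: opaque content plus free-form properties."""
    path: str
    content: bytes = b""
    properties: Dict[str, str] = field(default_factory=dict)


class PathRegistry:
    """Hypothetical path-addressed store; a collection is any path prefix."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        return "/" + path.strip("/")

    def put(self, resource: Resource) -> None:
        resource.path = self._normalize(resource.path)
        self._resources[resource.path] = resource

    def get(self, path: str) -> Optional[Resource]:
        return self._resources.get(self._normalize(path))

    def children(self, collection: str) -> List[str]:
        """List direct children of a collection, whether stored or only implied."""
        base = self._normalize(collection).rstrip("/") + "/"
        names = set()
        for path in self._resources:
            if path.startswith(base):
                names.add(base + path[len(base):].split("/", 1)[0])
        return sorted(names)

    def query(self, collection: str, key: str, value: str) -> List[str]:
        """Find resources under a collection whose property `key` equals `value`."""
        base = self._normalize(collection).rstrip("/") + "/"
        return sorted(p for p, r in self._resources.items()
                      if p.startswith(base) and r.properties.get(key) == value)


reg = PathRegistry()
reg.put(Resource("/experiments/exp-1", b"<serialized>", {"state": "LAUNCHED"}))
reg.put(Resource("/experiments/exp-2", b"<serialized>", {"state": "COMPLETED"}))
reg.put(Resource("/experiments/exp-1/jobs/job-1", properties={"host": "hpc-a"}))
```

Because structure lives in paths and properties rather than in tables, a new entity type or field is just a new path or property key. This is the flexibility the suggestion is aiming for.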

>
> Thanks
> Milinda
>
> [1] http://wso2.com/products/governance-registry/
>
>
> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <supun06@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm not trying to discourage you in your exploration of NoSQL databases. I
> > have the following concern.
> >
> > Your database schema is moderately complex (even for an RDBMS it seems
> > complex), and the data size is relatively small. I'm not sure about the
> > current tools available, but I think you will need to write more code to
> > support all your requirements in a NoSQL database. So writing more code
> > and allowing redundancy to support *relatively small* and *structured
> > data* doesn't seem right to me. Maybe I'm wrong and there are better
> > tools in NoSQL than RDBMS, which I doubt.
> >
> > Thanks,
> > Supun..
> >
> >
> >
> > On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <sm...@apache.org> wrote:
> >
> > > Hi All,
> > >
> > > Airavata is actively migrating to a Thrift API for the RESTless design
> > > and to facilitate various language bindings for client gateways. The
> > > programming language support in Thrift has so far been very encouraging.
> > > The current architecture looks like Figure 1 at [1].
> > >
> > > Language-specific clients will be released as Thrift SDKs (similar to
> > > the Evernote SDKs [2]). These clients will be integrated into gateway
> > > portals which connect to the API Server. The API operations broker the
> > > simple calls into one or more backend CPI calls (Airavata internal
> > > component interfaces). An example set of mappings is illustrated in
> > > Figure 2 at [1]. The current draft of the Thrift API for version 0.12 is
> > > at [3]; please pay attention to the experiment model at [4].
> > >
> > > For the persistent store, we have had a few iterations of the Airavata
> > > Registry, shifting from a legacy XRegistry to JackRabbit to the current
> > > OpenJPA-based registry. To allow the API and the associated data models
> > > to evolve, it will be useful to explore object databases so we can store
> > > the serialized version of Thrift objects directly. But it will be nice to
> > > have all (or most) of the fields queryable. This calls for more of a
> > > column-family design among NoSQL approaches.
> > >
> > > Any recommendations for a registry architecture?
> > >
> > > Quickly hacking through, I find the following approach a viable one:
> > > ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can
> > > benefit immediately from the replication and reliability of Cassandra,
> > > and from its scalability in the near future. Some of the model objects,
> > > like experiment creation, will need strong consistency, while most of
> > > the monitoring can live with eventual consistency.
> > >
> > > Critical comments please?
> > >
> > > Thanks for your time,
> > > Suresh
> > >
> > > [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> > > [2] - https://dev.evernote.com/doc/
> > > [3] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> > > [4] - https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> > > [5] - https://github.com/MisterTea/ZombieDB
> > > [6] - https://github.com/Netflix/astyanax
> > >
> > >
> >
> >
> > --
> > Supun Kamburugamuva
> > Member, Apache Software Foundation; http://www.apache.org
> > E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
> > Blog: http://supunk.blogspot.com
> >
>
>
>
> --
> Milinda Pathirage
> PhD Student Indiana University, Bloomington;
> E-mail: milinda.pathirage@gmail.com
> Web: http://mpathirage.com
> Blog: http://blog.mpathirage.com
>



-- 
System Analyst Programmer
PTI Lab
Indiana University
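To make the trade-off in this thread concrete (Suresh's idea of storing serialized Thrift objects while keeping most fields queryable, and Supun's concern that NoSQL needs more hand-written code), here is a minimal in-memory Java sketch of a column-family-style row: the serialized object is kept as an opaque blob, and each queryable field is copied into its own column with a hand-maintained secondary index. `Experiment`, `ExperimentStore`, and the `blob` column name are hypothetical, and the string-join serialization is only a placeholder for Thrift's `TSerializer`; none of this is the real experiment model.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for a Thrift-generated experiment struct (not the real experimentModel.thrift type). */
final class Experiment {
    final String id;
    final String userName;
    final String status;

    Experiment(String id, String userName, String status) {
        this.id = id;
        this.userName = userName;
        this.status = status;
    }

    // Placeholder serialization; a real system would use Thrift's TSerializer/TDeserializer.
    byte[] serialize() {
        return String.join("\u0000", id, userName, status).getBytes(StandardCharsets.UTF_8);
    }

    static Experiment deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\u0000", -1);
        return new Experiment(parts[0], parts[1], parts[2]);
    }

    // The fields chosen to be queryable; they are copied out of the blob into their own columns.
    Map<String, String> queryableFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userName", userName);
        fields.put("status", status);
        return fields;
    }
}

/** In-memory sketch of a column family: row key -> (column -> value), plus a secondary index. */
final class ExperimentStore {
    private final Map<String, SortedMap<String, byte[]>> rows = new HashMap<>();
    // column name -> column value -> row keys holding that value
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void put(Experiment e) {
        SortedMap<String, byte[]> old = rows.get(e.id);
        if (old != null) {
            // On update, remove stale index entries; nothing does this for us.
            for (Map.Entry<String, String> f : Experiment.deserialize(old.get("blob")).queryableFields().entrySet()) {
                index.get(f.getKey()).get(f.getValue()).remove(e.id);
            }
        }
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("blob", e.serialize());
        for (Map.Entry<String, String> f : e.queryableFields().entrySet()) {
            row.put(f.getKey(), f.getValue().getBytes(StandardCharsets.UTF_8));
            index.computeIfAbsent(f.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(f.getValue(), k -> new TreeSet<>())
                 .add(e.id);
        }
        rows.put(e.id, row);
    }

    /** Looks up rows through the secondary index and rebuilds objects from their blobs. */
    List<Experiment> findBy(String column, String value) {
        List<Experiment> result = new ArrayList<>();
        Set<String> keys = index.getOrDefault(column, Map.of()).getOrDefault(value, Set.of());
        for (String key : keys) {
            result.add(Experiment.deserialize(rows.get(key).get("blob")));
        }
        return result;
    }
}
```

The index cleanup in `put` is the kind of extra code Supun refers to: in an RDBMS, a column index is maintained by the database, whereas here every queryable field is stored twice and kept in sync by the application.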

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

On Feb 24, 2014, at 12:07 PM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Haha, I don't want to start a philosophical war here, but calling NoSQL
> still in its infancy, and saying that NoSQL data models are hard and require
> lots of experience and understanding of data structures and queries, is a
> bit surprising to me. It's all about the use cases and finding the correct
> tool to help with them.

We can set aside the philosophical discussions. When there is a strong structure and relational needs with good support for transactions, then obviously it's better to stick with SQL. And certainly lots of real-world production usage scenarios (Facebook, Twitter, Netflix, to name a few) have proven successful with NoSQL for their use cases. So I agree with everyone here that we have to have the use cases first. I only sent the API as a driver, and limited the discussion to it so as not to confuse things. If we go beyond that, there are other driving needs for scalable metadata: shredding data files, extracting thousands of parameters, and making them queryable to allow experimentation; checking the covariance of mathematical models and making decisions on executions in real time; tapping into streams of data and forking workflows on interesting signatures; and so forth. None of these are really at the scale or complexity of so-called Big Data.

Another way of looking at Cassandra-like solutions is their built-in reliability. We could argue again about why not use a MySQL master-slave pattern and so forth. But some things are appealing: for example, with Cassandra cluster nodes on Amazon, if the data center running Airavata goes offline, the services running on EC2 can pick up almost where they left off and still have a full, identical copy of the data. If there is software that handles things for us so that we do not need to worry about them, why not use it?
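The split described in this thread (strong consistency for experiment creation, eventual consistency for monitoring) maps onto Cassandra's tunable consistency: a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor (R + W > RF). The following is a minimal Java sketch of that arithmetic for a single data center, assuming QUORUM means floor(RF / 2) + 1 replicas; the `Consistency` and `ConsistencyCheck` names are hypothetical, and multi-datacenter levels such as LOCAL_QUORUM, as well as failure handling, are deliberately ignored.

```java
/** Single-datacenter consistency levels (sketch); QUORUM is floor(RF / 2) + 1 replicas. */
enum Consistency {
    ONE, QUORUM, ALL;

    /** Number of replicas that must respond at this level for a given replication factor. */
    int replicas(int replicationFactor) {
        switch (this) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1;
            default:
                return replicationFactor;
        }
    }
}

final class ConsistencyCheck {
    /** True when every read replica set must intersect every write replica set (R + W > RF). */
    static boolean readSeesLatestWrite(int replicationFactor, Consistency write, Consistency read) {
        return write.replicas(replicationFactor) + read.replicas(replicationFactor) > replicationFactor;
    }
}
```

With RF = 3, QUORUM writes and QUORUM reads (2 + 2 > 3) would suit experiment creation, while ONE/ONE (1 + 1 = 2) gives up that guarantee for lower latency, which may be acceptable for monitoring updates.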

But none of these motivate the jump Airavata needs to take. The motivation is certainly solving the current problems first, which means helping Airavata evolve over the next few months. Whether the solution is SQL, Not-only-SQL, something hybrid, or a nice overlay over MySQL is open for discussion. Once the API is stable (and can remain so for at least a year), it could very well be argued at that point to have a well-defined schema and stay with it in the MySQL world. I certainly do not have a strong opinion either way, and I do not have first-hand experience with NoSQL as many of you do, but I certainly do not want to rule it out based on popular perceptions.

Suresh

> 
> Let's wait until either Suresh or someone else comes up with a set of
> use cases for the registry, so that we can have a valid, constructive
> discussion. Maybe NoSQL is not good and SQL is good for this use case, but
> we cannot make decisions now.
> 
> Thanks,
> Eran Chinthaka Withana

Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Haha, I don't want to start a philosophical war here, but calling NoSQL
still in its infancy, and saying that NoSQL data models are hard and require
lots of experience and understanding of data structures and queries, is a
bit surprising to me. It's all about the use cases and finding the correct
tool to help with them.

Let's wait until either Suresh or someone else comes up with a set of
use cases for the registry, so that we can have a valid, constructive
discussion. Maybe NoSQL is not good and SQL is good for this use case, but
we cannot make decisions now.

Thanks,
Eran Chinthaka Withana


On Mon, Feb 24, 2014 at 8:20 AM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:

> I also think that moving to Cassandra or any other NoSQL database will add
> unnecessary complexity to your solution. Also, designing proper (easy to
> manage changes, easy to query) NoSQL data models is hard (AFAIK, it requires
> lots of experience and understanding of data structures and queries).
> Also, migrating from one NoSQL technology to another can require a complete
> rewrite. And current relational databases can handle heavy loads, except
> for Google-, Twitter-, Amazon- and Facebook-like loads. I don't think Airavata
> will see Google- or Amazon-like loads.
>
> If constant changes to the data model are the problem, I think the best
> option is to abstract the registry implementation into something like the
> collections and resources used in WSO2 Registry [1], or something suitable
> for the Airavata context. That will make it easy to handle changes in the
> data model.
>
> Also, don't let technologies drive design decisions. It's always better to
> let use cases drive design decisions.
>
> Thanks
> Milinda
>
> [1] http://wso2.com/products/governance-registry/

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

On Feb 27, 2014, at 1:09 PM, K Yoshimoto <ke...@sdsc.edu> wrote:

> 
> I happened to look through the data model.  
> 
>>> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> 
> How is information on the input data and transfer method stored?
> Is that freeform text in DataTransferDetails?

Hi Kenneth,

The current API draft only supports simple input types, and any files will have to be passed as URLs. This works on the assumption that the portal which connects to Airavata has a file staging component. Do you think it would be better if the Airavata API provided the ability to upload input data files, as opposed to providing a URL?

> 
> Also, is there a place to describe preprocessing of job input data
> or a custom submit command?

I think allowing pre-processing steps for the job itself is a good requirement. We should consider this for the 0.13 release.

Thanks for taking time to review the API.
Suresh

> 
> Sorry for the sidetrack.
> 
> Kenneth
> 
> On Wed, Feb 26, 2014 at 02:13:46AM +0530, Shameera Rathnayaka wrote:
>> Hi all,
>> 
>> Just thinking aloud here; sorry if I am moving this thread in another
>> direction.
>>
>> If we are going to use our own registry implementation, should we consider
>> providing a database layer where we can plug in different kinds of
>> databases? (Maybe Supun is also suggesting the same in his previous reply.)
>> As we are already separating SPIs and APIs for other components, we can do
>> the same for the DB implementation too. NoSQL databases like Cassandra also
>> have a CQL driver, which is similar to the MySQL driver. So it is not
>> difficult to implement a pluggable environment.
>>
>> WSO2 Registry already has the above capability, but as far as I know it
>> has not yet implemented CQL support.
>> 
>> Thanks,
>> Shameera.
>> 
>> 
>> On Wed, Feb 26, 2014 at 1:36 AM, Saminda Wijeratne <sa...@gmail.com> wrote:
>> 
>>> Sorry, I missed the arrow from Registry to Orchestrator. Thanks for
>>> pointing it out, Marlon. I updated the arrows and added a legend.
>>>
>>> The broken-line arrow is used for the MessageBox component, which gets
>>> triggered from time to time without external user intervention. Also,
>>> there are still some technical details we need to figure out on how the
>>> MessageBox will function and expose itself in the new design.
>>> 
>>> 
>>> On Tue, Feb 25, 2014 at 2:36 PM, Marlon Pierce <ma...@iu.edu> wrote:
>>> 
>>>> Please define the solid and broken line arrows.  Why doesn't the
>>>> orchestrator interact with the registry?
>>>> 
>>>> 
>>>> Marlon
>>>> 
>>>> On 2/25/14 2:29 PM, Saminda Wijeratne wrote:
>>>>> The diagrams at [1] depict functional requirements (at an
>>>>> abstract level) for Airavata from the CIPRES and UltraScan gateways.
>>>>> 
>>>>> 1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3
>>>>> 
>>>>> 
>>>>> On Mon, Feb 24, 2014 at 3:01 PM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:
>>>>> 
>>>>>> Hi Suresh,
>>>>>> 
>>>>>> Collections are similar to directories and resources are similar to
>>>>>> files. WSO2 Registry implements various functionalities on top of this
>>>>>> abstraction. In one of our projects we use this abstraction to implement
>>>>>> persistent storage for a text mining workflow. Our text mining workflow
>>>>>> starts with a workset, which is a collection of books. We represent this
>>>>>> workset as a collection in WSO2 Registry under the user's collection
>>>>>> (which can be thought of as a workspace specific to the user; other
>>>>>> users can't access this workspace). This workset can contain one or more
>>>>>> resources or collections. The current implementation only supports a
>>>>>> single resource, which is a list of book identifiers. When a user starts
>>>>>> a text analysis job on this workset, the job manager reads the necessary
>>>>>> information (currently the list of books) from the workset, downloads
>>>>>> the necessary files from an API, runs analysis algorithms on the
>>>>>> downloaded files, and finally saves the results in another registry
>>>>>> collection. This model is pretty extensible for our use case, because if
>>>>>> we want some additional files or data in the future, we just need to add
>>>>>> another resource or another collection to the workset collection. Then
>>>>>> the application can decide what to process and what not to process.
>>>>>>
>>>>>> I think you also need some abstraction like that. I am not sure whether
>>>>>> the collections and resources abstraction is the best for you. The level
>>>>>> of abstraction will depend on your use cases and requirements.
>>>>>> 
>>>>>> Thanks
>>>>>> Milinda
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Feb 24, 2014 at 2:00 PM, Suresh Marru <sm...@apache.org> wrote:
>>>>>> 
>>>>>>> On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <milinda.pathirage@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I also think that moving to Cassandra or any other NoSQL database will
>>>>>>>> add unnecessary complexity to your solution. Also, designing proper
>>>>>>>> (easy to manage changes, easy to query) NoSQL data models is hard
>>>>>>>> (AFAIK, it requires lots of experience and understanding of data
>>>>>>>> structures and queries). Also, migrating from one NoSQL technology to
>>>>>>>> another can require a complete rewrite. And current relational
>>>>>>>> databases can handle heavy loads, except for Google-, Twitter-,
>>>>>>>> Amazon- and Facebook-like loads. I don't think Airavata will see
>>>>>>>> Google- or Amazon-like loads.
>>>>>>>>
>>>>>>>> If constant changes to the data model are the problem, I think the
>>>>>>>> best option is to abstract the registry implementation into something
>>>>>>>> like the collections and resources used in WSO2 Registry [1], or
>>>>>>>> something suitable for the Airavata context. That will make it easy to
>>>>>>>> handle changes in the data model.
>>>>>>> You stated it right, Milinda: Airavata does not have scaling needs that
>>>>>>> will go beyond RDBMS limits, but it needs this abstraction.
>>>>>>>
>>>>>>> Can anyone elaborate more on the collections and resources used in WSO2
>>>>>>> Registry?
>>>>>>> 
>>>>>>> Suresh
>>>>>>> 
>>>>>>>> Also, don't let technologies drive design decisions. It's always
>>>>>>>> better to let use cases drive design decisions.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Milinda
>>>>>>>> 
>>>>>>>> [1] http://wso2.com/products/governance-registry/
>>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Milinda Pathirage
>>>>>> PhD Student Indiana University, Bloomington;
>>>>>> E-mail: milinda.pathirage@gmail.com
>>>>>> Web: http://mpathirage.com
>>>>>> Blog: http://blog.mpathirage.com
>>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Best Regards,
>> Shameera Rathnayaka.
>> 
>> email: shameera AT apache.org , shameerainfo AT gmail.com
>> Blog : http://shameerarathnayaka.blogspot.com/
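Milinda's description of WSO2 Registry's collections (directory-like) and resources (file-like) can be illustrated with a minimal in-memory Java sketch. `PathRegistry` below is hypothetical and is not the WSO2 Registry API: a resource holds content plus string properties, missing parent collections are created on `put`, and `children` lists the immediate members of a collection, mirroring the per-user workset example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory collections/resources registry (illustration only, not WSO2's API). */
final class PathRegistry {
    /** A resource is like a file: opaque content plus descriptive properties. */
    static final class Resource {
        final byte[] content;
        final Map<String, String> properties;

        Resource(byte[] content, Map<String, String> properties) {
            this.content = content.clone();
            this.properties = Map.copyOf(properties);
        }
    }

    private final TreeMap<String, Resource> resources = new TreeMap<>();
    // Collections are like directories; the root collection always exists.
    private final TreeSet<String> collections = new TreeSet<>(List.of("/"));

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    /** Stores a resource at an absolute path, creating missing parent collections (like mkdir -p). */
    void put(String path, Resource resource) {
        if (!path.startsWith("/") || path.endsWith("/")) {
            throw new IllegalArgumentException("expected an absolute resource path: " + path);
        }
        if (collections.contains(path)) {
            throw new IllegalArgumentException("a collection already exists at " + path);
        }
        // Validate every missing ancestor before changing any state.
        List<String> missing = new ArrayList<>();
        for (String p = parentOf(path); !collections.contains(p); p = parentOf(p)) {
            if (resources.containsKey(p)) {
                throw new IllegalArgumentException("a resource already exists at " + p);
            }
            missing.add(p);
        }
        collections.addAll(missing);
        resources.put(path, resource);
    }

    Optional<Resource> get(String path) {
        return Optional.ofNullable(resources.get(path));
    }

    /** Immediate children (collections and resources) of a collection, in sorted order. */
    List<String> children(String collection) {
        String prefix = collection.equals("/") ? "/" : collection + "/";
        TreeSet<String> all = new TreeSet<>(collections);
        all.addAll(resources.keySet());
        List<String> result = new ArrayList<>();
        for (String p : all) {
            if (p.length() > prefix.length() && p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                result.add(p);
            }
        }
        return result;
    }
}
```

Adding a new kind of data to a workset is then just another `put` under the workset path, which is the extensibility Milinda points to; a real implementation would also need persistence and access control (for example, the per-user workspaces mentioned above).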

Re: Object Database Suggestions for Airavata Registry

Posted by K Yoshimoto <ke...@sdsc.edu>.

I happened to look through the data model.  

> > https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD

How is information on the input data and transfer method stored?
Is that freeform text in DataTransferDetails?

Also, is there a place to describe preprocessing of job input data
or a custom submit command?

Sorry for the sidetrack.

Kenneth
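Shameera's suggestion of a pluggable database layer could look like the following minimal Java sketch: a storage SPI, a trivial in-memory backend, and a name-based factory. `ExperimentStoreSpi`, `InMemoryExperimentStore`, and `StoreProviders` are hypothetical names, not Airavata interfaces. Since the common Cassandra Java drivers are not JDBC drivers, a MySQL (JDBC) backend and a Cassandra (CQL) backend would likely each need their own implementation of the interface. In a real deployment, `java.util.ServiceLoader` with `META-INF/services` entries is the standard JDK mechanism for discovering such providers; a static map is used here only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical storage SPI for the registry; backends (JDBC, CQL, in-memory) implement it. */
interface ExperimentStoreSpi {
    void save(String experimentId, byte[] serializedExperiment);

    Optional<byte[]> load(String experimentId);
}

/** Trivial backend, useful for tests; it copies arrays so callers cannot mutate stored data. */
final class InMemoryExperimentStore implements ExperimentStoreSpi {
    private final Map<String, byte[]> data = new HashMap<>();

    @Override
    public void save(String experimentId, byte[] serializedExperiment) {
        data.put(experimentId, serializedExperiment.clone());
    }

    @Override
    public Optional<byte[]> load(String experimentId) {
        return Optional.ofNullable(data.get(experimentId)).map(byte[]::clone);
    }
}

/** Name-based factory, e.g. selected from a configuration property. */
final class StoreProviders {
    private static final Map<String, Supplier<ExperimentStoreSpi>> PROVIDERS = new HashMap<>();

    static {
        register("memory", InMemoryExperimentStore::new);
    }

    static void register(String name, Supplier<ExperimentStoreSpi> factory) {
        PROVIDERS.put(name, factory);
    }

    static ExperimentStoreSpi create(String name) {
        Supplier<ExperimentStoreSpi> factory = PROVIDERS.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No registry backend named " + name);
        }
        return factory.get();
    }
}
```

Storing the serialized object as bytes keeps the SPI independent of the schema, which helps while the API is still evolving; field-level queries would need additional methods on the interface that every backend must then support.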

On Wed, Feb 26, 2014 at 02:13:46AM +0530, Shameera Rathnayaka wrote:
> Hi all,
> 
> Just thinking a loud here, sorry if i am moving this thread to another
> direction.
> 
> If we going to use our own registry implementation, do we have consider
> provide database layer where we can plug different kind of databases?(may
> be Supun also suggesting the same in his previous reply). As we are already
> separating SPIs and APIs for other components, we can do the same for DB
> implementation too. NoSql  database like cassandra also have cql driver
> which is identical to Mysql driver. So it is not difficult to implement
> plugable environment,
> 
> In wso2 registry they already have above capability but not yet implemented
> CQL as i know.
> 
> Thanks,
> Shameera.
> 
> 
> On Wed, Feb 26, 2014 at 1:36 AM, Saminda Wijeratne <sa...@gmail.com>wrote:
> 
> > Sorry I missed the arrow from Registry to Orchestrator. Thanks for pointing
> > it out Marlon. Updated the arrows and added a legend.
> >
> > Broken line arrow is involved in MessageBox component where it gets
> > triggered from time to time without external user intervention. Also
> > there's still some technical details we need to figure-out on how the
> > MessageBox will function and expose itself in the new design.
> >
> >
> > On Tue, Feb 25, 2014 at 2:36 PM, Marlon Pierce <ma...@iu.edu> wrote:
> >
> > > Please define the solid and broken line arrows.  Why doesn't the
> > > orchestrator interact with the registry?
> > >
> > >
> > > Marlon
> > >
> > > On 2/25/14 2:29 PM, Saminda Wijeratne wrote:
> > > > The diagrams @[1] will depict functional requirements (at an
> > > > abstract-level) for Airavata from CIPRES and UltraScan gateways.
> > > >
> > > > 1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3
> 
> 
> 
> -- 
> Best Regards,
> Shameera Rathnayaka.
> 
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/

Re: Object Database Suggestions for Airavata Registry

Posted by Shameera Rathnayaka <sh...@gmail.com>.

Hi all,

Just thinking aloud here; sorry if I am taking this thread in another
direction.

If we are going to use our own registry implementation, should we consider
providing a database layer into which we can plug different kinds of
databases? (Maybe Supun is also suggesting the same in his previous reply.)
As we are already separating SPIs and APIs for other components, we can do
the same for the DB implementation too. NoSQL databases like Cassandra also
have a CQL driver that can be used much like a MySQL driver, so it should
not be difficult to implement a pluggable environment.

As far as I know, WSO2 Registry already has this capability, but it does
not yet implement CQL support.
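A pluggable database layer like this could look roughly like the following minimal Python sketch (Airavata itself is written in Java, so treat this purely as an illustration of the shape). All names here (`RegistryStore`, `InMemoryStore`, `register_backend`, `store_for`) are hypothetical; the in-memory class stands in for the JDBC- or CQL-backed implementations that would sit behind the same interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class RegistryStore(ABC):
    """Hypothetical storage SPI: registry code talks only to this interface."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: str) -> bool: ...


class InMemoryStore(RegistryStore):
    """Dictionary-backed stand-in for a real RDBMS or Cassandra backend."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None


# Backends are looked up by name, e.g. from a configuration property.
_BACKENDS: Dict[str, Callable[[], RegistryStore]] = {"memory": InMemoryStore}


def register_backend(name: str, factory: Callable[[], RegistryStore]) -> None:
    _BACKENDS[name] = factory


def store_for(name: str) -> RegistryStore:
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown registry backend: {name}") from None


if __name__ == "__main__":
    store = store_for("memory")
    store.put("experiment/exp-1", b"serialized-thrift-bytes")
    print(store.get("experiment/exp-1"))  # b'serialized-thrift-bytes'
```

Callers only ever use `store_for(...)`, so adding a Cassandra backend would mean one more class and one more `register_backend` call, with no changes to the registry code itself.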

Thanks,
Shameera.


On Wed, Feb 26, 2014 at 1:36 AM, Saminda Wijeratne <sa...@gmail.com> wrote:

> Sorry I missed the arrow from Registry to Orchestrator. Thanks for pointing
> it out Marlon. Updated the arrows and added a legend.
>
> Broken line arrow is involved in MessageBox component where it gets
> triggered from time to time without external user intervention. Also
> there's still some technical details we need to figure-out on how the
> MessageBox will function and expose itself in the new design.



-- 
Best Regards,
Shameera Rathnayaka.

email: shameera AT apache.org , shameerainfo AT gmail.com
Blog : http://shameerarathnayaka.blogspot.com/

Re: Object Database Suggestions for Airavata Registry

Posted by Saminda Wijeratne <sa...@gmail.com>.

Sorry, I missed the arrow from the Registry to the Orchestrator. Thanks for
pointing it out, Marlon. I have updated the arrows and added a legend.

The broken-line arrow is used for the MessageBox component, which gets
triggered from time to time without external user intervention. There are
also still some technical details we need to figure out about how the
MessageBox will function and expose itself in the new design.


On Tue, Feb 25, 2014 at 2:36 PM, Marlon Pierce <ma...@iu.edu> wrote:

> Please define the solid and broken line arrows.  Why doesn't the
> orchestrator interact with the registry?
>
>
> Marlon

Re: Object Database Suggestions for Airavata Registry

Posted by Marlon Pierce <ma...@iu.edu>.

Please define the solid and broken-line arrows. Why doesn't the
Orchestrator interact with the Registry?


Marlon

On 2/25/14 2:29 PM, Saminda Wijeratne wrote:
> The diagrams @[1] will depict functional requirements (at an
> abstract-level) for Airavata from CIPRES and UltraScan gateways.
>
> 1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3
>
>
> On Mon, Feb 24, 2014 at 3:01 PM, Milinda Pathirage <
> milinda.pathirage@gmail.com> wrote:
>
>> Hi Suresh,
>>
>> Collections are similar to directories and resources are similar to files.
>> WSO2 Registry implements various functionalities on top of this
>> abstraction. In one of our projects we use this abstraction to implement
>> persistent storage for a text-mining workflow. Our text-mining workflow
>> starts with a workset, which is a collection of books. We represent this
>> workset as a collection in WSO2 Registry under the user's collection
>> (which can be thought of as a workspace specific to that user; other
>> users can't access it). This workset can contain one or more resources or
>> collections. The current implementation only supports a single resource,
>> which is a list of book identifiers. When a user starts a text analysis
>> job on this workset, the job manager reads the necessary information
>> (currently the list of books) from the workset, downloads the necessary
>> files from an API, runs analysis algorithms on the downloaded files, and
>> finally saves the results back into another registry collection. This
>> model is quite extensible for our use case: if we want additional files
>> or data in future, we just need to add another resource or collection to
>> the workset collection. The application can then decide what to process
>> and what not to process.
>>
>> I think you also need some abstraction like that. I am not sure whether
>> the collections-and-resources abstraction is the best one for you. The
>> level of abstraction will depend on your use cases and requirements.
>>
>> Thanks
>> Milinda
>>
>>
>>
>>
>> On Mon, Feb 24, 2014 at 2:00 PM, Suresh Marru <sm...@apache.org> wrote:
>>
>>> On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <
>>> milinda.pathirage@gmail.com> wrote:
>>>
>>>> I also think that moving to Cassandra or any other NoSQL database will
>>>> add unnecessary complexity to your solution. Also, designing proper
>>>> NoSQL data models (easy to manage changes, easy to query) is hard
>>>> (AFAIK, it requires a lot of experience and understanding of data
>>>> structures and queries). Migrating from one NoSQL technology to another
>>>> can also require a complete rewrite. Current relational databases can
>>>> handle heavy loads, short of Google-, Twitter-, Amazon- or
>>>> Facebook-scale loads. I don't think Airavata will see Google- or
>>>> Amazon-scale loads.
>>>>
>>>> If constant changes to the data model are the problem, I think the best
>>>> option is to abstract the registry implementation into something like
>>>> the collections and resources used in WSO2 Registry [1], or something
>>>> suited to the Airavata context. That will make it easy to handle changes
>>>> in the data model.
>>> You are right, Milinda: Airavata does not have scaling needs that go
>>> beyond RDBMS limits, but it does need this abstraction.
>>>
>>> Can anyone elaborate on the collections and resources used in WSO2
>>> Registry?
>>>
>>> Suresh
>>>
>>>> Also, don't let technologies drive design decisions. It's always better
>>>> to let use cases drive them.
>>>>
>>>> Thanks
>>>> Milinda
>>>>
>>>> [1] http://wso2.com/products/governance-registry/
>>>>
>>>>
>>>> On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <supun06@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm not trying to discourage you from exploring NoSQL databases, but I
>>>>> have the following concern.
>>>>>
>>>>> Your database schema is moderately complex (even for an RDBMS it seems
>>>>> complex), and the data size is relatively small. I'm not sure about the
>>>>> tools currently available, but I think you will need to write more code
>>>>> to support all your requirements in a NoSQL database. So writing more
>>>>> code and allowing redundancy to support *relatively small*, *structured
>>>>> data* doesn't seem right to me. Maybe I'm wrong and there are better
>>>>> tools for NoSQL than for RDBMSs, which I doubt.
>>>>>
>>>>> Thanks,
>>>>> Supun..
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <sm...@apache.org> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Airavata is actively migrating to a Thrift API for its RESTless design
>>>>>> and to support various language bindings for client gateways. Thrift's
>>>>>> programming-language support has so far been very encouraging. The
>>>>>> current architecture looks like Figure 1 at [1].
>>>>>>
>>>>>> Language-specific clients will be released as Thrift SDKs (similar to
>>>>>> the Evernote SDKs [2]). These clients will be integrated into gateway
>>>>>> portals, which connect to the API Server. The API operations broker
>>>>>> simple calls into one or more backend CPI calls (Airavata internal
>>>>>> component interfaces). An example set of mappings is illustrated in
>>>>>> Figure 2 at [1]. The current draft of the Thrift API for version 0.12
>>>>>> is at [3]; please pay particular attention to the experiment model at
>>>>>> [4].
>>>>>>
>>>>>> For the persistent store, we have had a few iterations of the Airavata
>>>>>> Registry, moving from a legacy XRegistry to JackRabbit and now to an
>>>>>> OpenJPA-based registry. To allow the API and its associated data models
>>>>>> to evolve, it would be useful to explore object databases so we can
>>>>>> store serialized Thrift objects directly. But it would be nice to have
>>>>>> all (or most) of the fields queryable, which calls for a column-family
>>>>>> design in any NoSQL approach.
>>>>>>
>>>>>> Any recommendations for a registry architecture?
>>>>>>
>>>>>> After some quick hacking, I find the following approach viable:
>>>>>> ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata
>>>>>> would benefit immediately from Cassandra's replication and reliability,
>>>>>> and from its scalability in the near future. Some model operations,
>>>>>> such as experiment creation, will need strong consistency, while most
>>>>>> of the monitoring can live with eventual consistency.
>>>>>>
>>>>>> Critical comments please?
>>>>>>
>>>>>> Thanks for your time,
>>>>>> Suresh
>>>>>>
>>>>>> [1] -
>>>>>>
>> https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
>>>>>> [2] - https://dev.evernote.com/doc/
>>>>>> [3] -
>>>>>>
>> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
>>>>>> [4] -
>>>>>>
>> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>>>>>> [5] - https://github.com/MisterTea/ZombieDB
>>>>>> [6] - https://github.com/Netflix/astyanax
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Supun Kamburugamuva
>>>>> Member, Apache Software Foundation; http://www.apache.org
>>>>> E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
>>>>> Blog: http://supunk.blogspot.com
>>>>>
>>>>
>>>>
>>>> --
>>>> Milinda Pathirage
>>>> PhD Student Indiana University, Bloomington;
>>>> E-mail: milinda.pathirage@gmail.com
>>>> Web: http://mpathirage.com
>>>> Blog: http://blog.mpathirage.com
>>>
>>
>> --
>> Milinda Pathirage
>> PhD Student Indiana University, Bloomington;
>> E-mail: milinda.pathirage@gmail.com
>> Web: http://mpathirage.com
>> Blog: http://blog.mpathirage.com
>>

Re: Object Database Suggestions for Airavata Registry

Posted by Saminda Wijeratne <sa...@gmail.com>.

The diagrams at [1] depict functional requirements (at an abstract level)
for Airavata from the CIPRES and UltraScan gateways.

1. https://iu.app.box.com/s/52d2dmtfsd8mvlwvu9f3


On Mon, Feb 24, 2014 at 3:01 PM, Milinda Pathirage <
milinda.pathirage@gmail.com> wrote:

> Hi Suresh,
>
> Collections are similar to directories and resources are similar to files.
> WSO2 Registry implements various functionalities on top of this
> abstraction. In one of our projects we use this abstraction to implement
> persistent storage for a text mining workflow. Our text mining workflow
> starts with a workset, which is a collection of books. We represent this
> workset as a collection in WSO2 Registry under the user's collection (which
> can be thought of as a workspace specific to the user that other users
> can't access). This workset can contain one or more resources or
> collections. The current implementation only supports a single resource,
> which is a list of book identifiers. When a user starts a text analysis job
> on this workset, the job manager reads the necessary information (currently
> the list of books) from the workset, downloads the necessary files from an
> API, runs analysis algorithms on the downloaded files, and finally saves
> the results in another registry collection. This model is quite extensible
> for our use case because if we want additional files or data in the future,
> we just need to add another resource or another collection to the workset
> collection. The application can then decide what to process and what not
> to process.
>
> I think you also need an abstraction like that. I am not sure whether the
> collections-and-resources abstraction is the best fit for you. The level of
> abstraction will depend on your use cases and requirements.
>
> Thanks
> Milinda

Re: Object Database Suggestions for Airavata Registry

Posted by Milinda Pathirage <mi...@gmail.com>.

Hi Suresh,

Collections are similar to directories and resources are similar to files.
WSO2 Registry implements various functionalities on top of this
abstraction. In one of our projects we use this abstraction to implement
persistent storage for a text mining workflow. Our text mining workflow
starts with a workset, which is a collection of books. We represent this
workset as a collection in WSO2 Registry under the user's collection (which
can be thought of as a workspace specific to the user that other users
can't access). This workset can contain one or more resources or
collections. The current implementation only supports a single resource,
which is a list of book identifiers. When a user starts a text analysis job
on this workset, the job manager reads the necessary information (currently
the list of books) from the workset, downloads the necessary files from an
API, runs analysis algorithms on the downloaded files, and finally saves
the results in another registry collection. This model is quite extensible
for our use case because if we want additional files or data in the future,
we just need to add another resource or another collection to the workset
collection. The application can then decide what to process and what not
to process.

I think you also need an abstraction like that. I am not sure whether the
collections-and-resources abstraction is the best fit for you. The level of
abstraction will depend on your use cases and requirements.
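To make the directory/file analogy concrete, here is a minimal in-memory sketch of a collection/resource tree in Python. It is not the WSO2 Registry API: the class names, the `put`/`get`/`list_children` methods, and the path layout are hypothetical. It only illustrates why adding another resource to a workset needs no schema change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Resource:
    """Leaf node (like a file): opaque content plus free-form properties."""
    content: bytes
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """Inner node (like a directory): child name -> Collection or Resource."""
    children: Dict[str, Union["Collection", Resource]] = field(default_factory=dict)


class Registry:
    """Hypothetical path-addressed registry; not the WSO2 Registry API."""

    def __init__(self) -> None:
        self.root = Collection()

    @staticmethod
    def _split(path: str) -> List[str]:
        return [p for p in path.strip("/").split("/") if p]

    def _walk(self, parts: List[str], create: bool) -> Collection:
        # Descend through collections, optionally creating missing ones.
        node = self.root
        for part in parts:
            child = node.children.get(part)
            if child is None:
                if not create:
                    raise KeyError(part)
                child = node.children[part] = Collection()
            if not isinstance(child, Collection):
                raise ValueError(f"{part!r} is a resource, not a collection")
            node = child
        return node

    def put(self, path: str, resource: Resource) -> None:
        *parents, name = self._split(path)
        self._walk(parents, create=True).children[name] = resource

    def get(self, path: str) -> Union[Collection, Resource]:
        *parents, name = self._split(path)
        return self._walk(parents, create=False).children[name]

    def list_children(self, path: str) -> List[str]:
        return sorted(self._walk(self._split(path), create=False).children)


reg = Registry()
reg.put("/users/alice/worksets/ws1/book-ids", Resource(b"book-1\nbook-2"))
reg.put("/users/alice/worksets/ws1/results/run-1", Resource(b"analysis output"))
print(reg.list_children("/users/alice/worksets/ws1"))  # ['book-ids', 'results']
```

Extending a workset later (say, with a configuration resource) is just another `put` under the same collection path; the application decides which children it understands.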

Thanks
Milinda




On Mon, Feb 24, 2014 at 2:00 PM, Suresh Marru <sm...@apache.org> wrote:

> On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <
> milinda.pathirage@gmail.com> wrote:
>
> > I also think that moving to Cassandra or any other NoSQL store will add
> > unnecessary complexity to your solution. Also, designing proper (easy to
> > manage changes, easy to query) NoSQL data models is hard (AFAIK, it
> > requires a lot of experience and understanding of data structures and
> > queries). Also, migrating from one NoSQL technology to another can require
> > a complete rewrite. And current relational databases can handle heavy
> > loads, short of Google-, Twitter-, Amazon-, and Facebook-scale loads. I
> > don't think Airavata will see Google- or Amazon-like loads.
> >
> > If constant changes to the data model are the problem, I think the best
> > option is to abstract the registry implementation into something like the
> > collections and resources used in WSO2 Registry [1], or something suitable
> > for the Airavata context. That will make it easy to handle changes in the
> > data model.
>
> You stated it right, Milinda. Airavata does not have scaling needs that
> go beyond RDBMS limits, but it does need this abstraction.
>
> Can anyone elaborate more on the collections and resources used in WSO2
> Registry?
>
> Suresh
>
> >
> > Also, don't let technologies drive design decisions. It's always better
> > to let use cases drive them.
> >
> > Thanks
> > Milinda
> >
> > [1] http://wso2.com/products/governance-registry/


-- 
Milinda Pathirage
PhD Student Indiana University, Bloomington;
E-mail: milinda.pathirage@gmail.com
Web: http://mpathirage.com
Blog: http://blog.mpathirage.com

Re: Object Database Suggestions for Airavata Registry

Posted by Suresh Marru <sm...@apache.org>.

On Feb 24, 2014, at 11:20 AM, Milinda Pathirage <mi...@gmail.com> wrote:

> I also think that moving to Cassandra or any other NoSQL store will add
> unnecessary complexity to your solution. Also, designing proper (easy to
> manage changes, easy to query) NoSQL data models is hard (AFAIK, it requires
> a lot of experience and understanding of data structures and queries).
> Also, migrating from one NoSQL technology to another can require a complete
> rewrite. And current relational databases can handle heavy loads, short of
> Google-, Twitter-, Amazon-, and Facebook-scale loads. I don't think Airavata
> will see Google- or Amazon-like loads.
> 
> If constant changes to the data model are the problem, I think the best
> option is to abstract the registry implementation into something like the
> collections and resources used in WSO2 Registry [1], or something suitable
> for the Airavata context. That will make it easy to handle changes in the
> data model.

You stated it right, Milinda. Airavata does not have scaling needs that go beyond RDBMS limits, but it does need this abstraction.

Can anyone elaborate more on the collections and resources used in WSO2 Registry?

Suresh

> 
> Also, don't let technologies drive design decisions. It's always better to
> let use cases drive them.
> 
> Thanks
> Milinda
> 
> [1] http://wso2.com/products/governance-registry/

Re: Object Database Suggestions for Airavata Registry

Posted by Milinda Pathirage <mi...@gmail.com>.

I also think that moving to Cassandra or any other NoSQL store will add
unnecessary complexity to your solution. Also, designing proper (easy to
manage changes, easy to query) NoSQL data models is hard (AFAIK, it requires
a lot of experience and understanding of data structures and queries).
Also, migrating from one NoSQL technology to another can require a complete
rewrite. And current relational databases can handle heavy loads, short of
Google-, Twitter-, Amazon-, and Facebook-scale loads. I don't think Airavata
will see Google- or Amazon-like loads.

If constant changes to the data model are the problem, I think the best
option is to abstract the registry implementation into something like the
collections and resources used in WSO2 Registry [1], or something suitable
for the Airavata context. That will make it easy to handle changes in the
data model.
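One way to read "abstract the registry implementation": callers talk to a small registry interface, the storage engine (OpenJPA today, something else later) hides behind it, and each stored record carries a schema version that is upgraded when it is read. The Python sketch below assumes records are plain dicts; `RegistryBackend`, `InMemoryBackend`, `MIGRATIONS`, and the `name` to `experimentName` rename are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict

CURRENT_VERSION = 2

# Migration N upgrades a record from schema version N to N + 1.
# Hypothetical example: version 2 renamed "name" to "experimentName".
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {
    1: lambda r: {("experimentName" if k == "name" else k): v for k, v in r.items()},
}


class RegistryBackend(ABC):
    """Storage-engine boundary; a JPA or NoSQL backend would implement this."""

    @abstractmethod
    def put(self, key: str, record: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> dict: ...


class InMemoryBackend(RegistryBackend):
    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def get(self, key: str) -> dict:
        return dict(self._rows[key])


class Registry:
    def __init__(self, backend: RegistryBackend) -> None:
        self.backend = backend

    def save(self, key: str, record: dict) -> None:
        # Stamp every write with the schema version it was written under.
        self.backend.put(key, {**record, "_version": CURRENT_VERSION})

    def load(self, key: str) -> dict:
        record = self.backend.get(key)
        version = record.pop("_version", 1)  # unversioned rows count as v1
        while version < CURRENT_VERSION:
            record = MIGRATIONS[version](record)
            version += 1
        return record
```

Upgrading on read keeps old rows usable without a bulk migration; a real registry might also write the upgraded row back. Swapping storage engines then means writing one new `RegistryBackend`, not touching every caller.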

Also, don't let technologies drive design decisions. It's always better to
let use cases drive them.

Thanks
Milinda

[1] http://wso2.com/products/governance-registry/


On Mon, Feb 24, 2014 at 10:57 AM, Supun Kamburugamuva <su...@gmail.com> wrote:

> Hi all,
>
> I'm not trying to discourage you from exploring NoSQL databases, but I
> have the following concern.
>
> Your database schema is moderately complex (even for an RDBMS it seems
> complex), and the data size is relatively small. I'm not sure about the
> current tools available, but I think you will need to write more code to
> support all your requirements in a NoSQL database. So writing more code and
> allowing redundancy to support *relatively small* and *structured data*
> doesn't seem right to me. Maybe I'm wrong and there are better tools in
> NoSQL than RDBMS, which I doubt.
>
> Thanks,
> Supun..



-- 
Milinda Pathirage
PhD Student Indiana University, Bloomington;
E-mail: milinda.pathirage@gmail.com
Web: http://mpathirage.com
Blog: http://blog.mpathirage.com

Re: Object Database Suggestions for Airavata Registry

Posted by Supun Kamburugamuva <su...@gmail.com>.

Hi all,

I'm not trying to discourage you from exploring NoSQL databases, but I
have the following concern.

Your database schema is moderately complex (even for an RDBMS it seems
complex), and the data size is relatively small. I'm not sure about the
current tools available, but I think you will need to write more code to
support all your requirements in a NoSQL database. So writing more code and
allowing redundancy to support *relatively small* and *structured data*
doesn't seem right to me. Maybe I'm wrong and there are better tools in
NoSQL than RDBMS, which I doubt.
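The extra code and redundancy can be made concrete. The Python sketch below assumes JSON in place of Thrift serialization and an in-memory dict in place of a real column family; `flatten` and `ColumnFamily` are hypothetical. It stores the serialized object and its flattened, queryable columns side by side, and it has to maintain a secondary index by hand, which is the kind of work an RDBMS index would otherwise do for you.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Nested dicts -> dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    columns: Dict[str, Any] = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            columns.update(flatten(value, f"{prefix}{key}."))
        else:
            columns[f"{prefix}{key}"] = value
    return columns


class ColumnFamily:
    """Toy row store: row key -> {column name: value}. Assumes scalar leaf values."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}
        # Hand-maintained secondary index: (column, value) -> row keys.
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def put(self, row_key: str, obj: Dict[str, Any]) -> None:
        self.delete(row_key)  # remove stale index entries from any previous version
        columns = flatten(obj)
        for name, value in columns.items():
            self.index[(name, value)].add(row_key)
        # The full serialized object is stored redundantly next to its columns.
        columns["_blob"] = json.dumps(obj)
        self.rows[row_key] = columns

    def delete(self, row_key: str) -> None:
        for name, value in self.rows.pop(row_key, {}).items():
            if name != "_blob":
                self.index[(name, value)].discard(row_key)

    def find(self, column: str, value: Any) -> Set[str]:
        return set(self.index.get((column, value), set()))
```

Every write path (create, update, delete) must keep `rows` and `index` in sync; forgetting one is exactly the class of bug a relational store avoids.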

Thanks,
Supun..



On Sun, Feb 23, 2014 at 5:20 PM, Suresh Marru <sm...@apache.org> wrote:

> Hi All,
>
> Airavata is actively migrating to a Thrift API for the RESTless design
> and to facilitate various language bindings for client gateways. The
> programming language support in Thrift has so far been very encouraging.
> The current architecture looks like Figure 1 at [1].
>
> Language-specific clients will be released as Thrift SDKs (similar to the
> Evernote SDKs [2]). These clients will be integrated into gateway portals
> that connect to the API Server. The API operations broker the simple calls
> into one or more backend CPI calls (Airavata internal component
> interfaces). An example set of mappings is illustrated in Figure 2 at
> [1]. The current draft of the Thrift API for version 0.12 is at [3]; please
> pay attention to the experiment model at [4].
>
> For the persistent store, we have had a few iterations of the Airavata
> Registry, shifting from a legacy XRegistry to JackRabbit and now to an
> OpenJPA-based registry. To allow the API and the associated data models to
> evolve, it will be useful to explore object databases so we can store the
> serialized versions of Thrift objects directly. But it would be nice to
> have all (or most) of the fields queryable. This calls for a
> column-family-oriented NoSQL design.
>
> Any recommendations for a registry architecture?
>
> After some quick hacking, I find the following approach viable:
> ZombieDB [5] over Astyanax [6], which talks to Cassandra. Airavata can
> benefit immediately from the replication and reliability of Cassandra, and
> from its scalability in the near future. Some of the model objects, like
> experiment creation, will need strong consistency, while most of the
> monitoring can live with eventual consistency.
>
> Critical comments please?
>
> Thanks for your time,
> Suresh
>
> [1] -
> https://cwiki.apache.org/confluence/display/AIRAVATA/2014/02/23/Brainstorming+Diagrams
> [2] - https://dev.evernote.com/doc/
> [3] -
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=tree;f=airavata-api/thrift-interface-descriptions;hb=HEAD
> [4] -
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
> [5] - https://github.com/MisterTea/ZombieDB
> [6] - https://github.com/Netflix/astyanax
>
>


-- 
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supun06@gmail.com;  Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com
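The mixed consistency requirement in Suresh's original post maps onto Cassandra's per-request consistency levels (for example ONE, QUORUM, ALL). The usual rule for Dynamo-style replication is that a read is guaranteed to overlap the replicas that acknowledged the latest write when R + W > N. The Python sketch below checks that rule by brute force over replica subsets; it models quorum arithmetic only (no failures, hinted handoff, or concurrent writes), and the function names and the per-operation policy are hypothetical.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority of n replicas (Cassandra's QUORUM for a single data center)."""
    return n // 2 + 1


def overlap_rule(n: int, r: int, w: int) -> bool:
    """Textbook condition for every read set to intersect every write set."""
    return r + w > n


def always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-replica read set share a node with every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# Hypothetical policy for a replication factor of 3:
N = 3
experiment_creation = (quorum(N), quorum(N))  # QUORUM reads and writes
monitoring = (1, 1)                           # ONE/ONE: eventual consistency
print(always_overlap(N, *experiment_creation))  # True
print(always_overlap(N, *monitoring))           # False
```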

Re: Thrift IDL Feedback

Posted by Suresh Marru <sm...@apache.org>.

Hi Eran,

Thanks for the detailed feedback. Please see more below:

On Feb 28, 2014, at 1:26 AM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Hi,
> 
> Here is my feedback.
> 
> 1. Thrift has a serious limitation in handling nulls. For example, if you
> are trying to retrieve an experiment for a given ID and there is no
> experiment with that ID, then you cannot send null. To support this, we
> introduced a required boolean isEmpty for every data structure we defined
> in the thrift file. By default this is false, but when we want to send
> null, we set it to true and send out the empty structure. So it may be a
> good idea to introduce this attribute if you are thinking of using any of
> these structures as method inputs or return types.

I think this is a great idea. The experiment model has a deeply nested structure, so it will be useful to have this check and skip parsing some of the objects. 
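A minimal Python sketch of the isEmpty convention, assuming a hypothetical `Experiment` struct. In real generated Thrift code the class would come from the IDL, with isEmpty declared as a `required bool` field; here a dataclass stands in for it. The server returns a flagged empty struct instead of null, and a small client-side helper turns that back into `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Experiment:
    """Stand-in for a generated Thrift struct with a required isEmpty flag."""
    isEmpty: bool = False
    experimentId: str = ""
    name: str = ""


class ExperimentHandler:
    """Hypothetical server-side handler; storage is an in-memory dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Experiment] = {}

    def createExperiment(self, experiment: Experiment) -> str:
        self._store[experiment.experimentId] = experiment
        return experiment.experimentId

    def getExperiment(self, experiment_id: str) -> Experiment:
        # Thrift cannot carry null here, so send an empty struct flagged as such.
        return self._store.get(experiment_id, Experiment(isEmpty=True))


def unwrap(experiment: Experiment) -> Optional[Experiment]:
    """Client-side helper: map the isEmpty convention back to None."""
    return None if experiment.isEmpty else experiment
```

Checking `isEmpty` first is also what lets a client skip walking the nested fields of an empty result.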

> 
> 2. If we define all the models in one file, then maintaining these
> structs could become an issue. There are a couple of scenarios here:
>  a) Say you want to introduce an API that uses only a few of these
> structs. For that to work, you are *forced* to include the whole thrift
> file *AND* have all classes included in the jar file (unless you do
> selective exclusion of files).
>  b) This file can get bigger and bigger as more models are added.
> 
> To solve these issues, we limited each thrift file to a single data
> structure and included a version in it. Within the thrift file, the
> inclusions are listed explicitly. Then we can pass this to a tool (like
> Medusa: http://goo.gl/Og7IgF) and let it build self-contained jars for a
> given data structure.
> Let me explain this a bit further. Let's say you want to use the Experiment
> struct and you want to create a jar out of ONLY the generated dependent
> classes. A tool like Medusa can read the thrift IDL, build the other
> related thrift files, and build a jar out of only those files together
> with a related pom file. So rather than having a bulky all-struct jar, you
> will have lightweight jars.

I intended to do exactly this but refrained because of the jar building. Medusa seems to be a good tool that will ease this task. So for the current release (0.12) we will push ahead with this monolithic thrift file, but will refactor it for the next release. Hopefully, Wize Commerce can open source Medusa by then.
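The per-struct layout relies on computing which thrift files a given struct transitively includes. The Python sketch below does only that step, over an in-memory map of file contents. It is not how Medusa works, the file names are hypothetical, it recognizes only the double-quoted `include "file.thrift"` form, and jar packaging is out of scope.

```python
import re
from typing import Dict, Set

# Matches Thrift include directives such as: include "status.thrift"
INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"', re.MULTILINE)


def include_closure(root: str, sources: Dict[str, str]) -> Set[str]:
    """Return `root` plus every thrift file it includes, directly or transitively."""
    seen: Set[str] = set()
    pending = [root]
    while pending:
        name = pending.pop()
        if name in seen:
            continue  # already visited; also guards against include cycles
        seen.add(name)
        pending.extend(INCLUDE_RE.findall(sources[name]))
    return seen


sources = {
    "experiment.thrift": 'include "status.thrift"\ninclude "errors.thrift"\nstruct Experiment {}\n',
    "status.thrift": 'include "errors.thrift"\nenum Status { CREATED = 1 }\n',
    "errors.thrift": "struct ErrorDetails {}\n",
    "gateway.thrift": "struct Gateway {}\n",
}
print(sorted(include_closure("experiment.thrift", sources)))
# ['errors.thrift', 'experiment.thrift', 'status.thrift']
```

Only the files in the closure would need code generation and packaging for an Experiment-only artifact; `gateway.thrift` is left out.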

> 
> 3. Add documentation to each and every struct/enum. It may be clear to you
> now, but six months down the line you will forget what it is. Also, with
> better documentation, anyone can read and understand the IDL (and maybe
> generate Thrift docs similar to Javadocs).

I was deferring this and am already realizing that some of the intentions are getting lost. We need to make this a high priority.

> 
> 4. struct rootCauseErrorIdList is of type list. I'd try to use set instead
> of list for all primitive types as a good practice to avoid unnecessary
> duplicates whenever possible.

Good point; I changed the lists to sets. I will commit the change after I update the server handler classes.
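A tiny illustration of why a set suits ID collections such as root-cause error IDs. Python's built-in `list` and `set` stand in for Thrift `list<string>` and `set<string>`, and the error IDs are made up.

```python
# Hypothetical root-cause error IDs, with one cause reported three times.
reported = ["ERR-42", "ERR-7", "ERR-42", "ERR-42"]

as_list = list(reported)   # like list<string>: every duplicate is kept
as_set = set(reported)     # like set<string>: each ID is stored once

print(len(as_list), len(as_set))   # 4 2
print(sorted(as_set))              # ['ERR-42', 'ERR-7']
```

One trade-off: a set drops ordering, so if the order in which root causes were reported matters, a list with explicit de-duplication is still the better fit.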

This is very useful feedback. 

Thanks,
Suresh


Re: Thrift IDL Feedback

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Hi,

Here is my feedback.

1. Thrift has a serious limitation in handling nulls. For example, if you are
trying to retrieve an experiment for a given id and there is no
experiment with that id, then you cannot return null. To support
this, we introduced a required boolean isEmpty in every data structure we
defined in the thrift file. By default this is false, but when we want to
send null, we set it to true and send out the empty structure. So it may
be a good idea to introduce this attribute if you are thinking of
using any of these as either method inputs or return types.

2. If we define all the models in one file, then maintaining these
structs could be an issue. There are a couple of scenarios here:
  a) say you want to introduce an API that uses only a few of these
structs. For that to work, you are *forced* to include the whole
thrift file *AND* have all classes included in the jar file (unless
you selectively exclude files).
  b) this file can get bigger and bigger as more models are added.

To solve these issues, we limited each thrift file to a single data
structure and put a version in it. Within the thrift file, the
inclusions are listed explicitly. Then we can pass this to a tool (like
Medusa: http://goo.gl/Og7IgF) and let it build self-contained jars for a
given data structure.
Let me explain this a bit further. Let's say you want to use the Experiment
struct and you want to create a jar out of ONLY the generated classes it
depends on. A tool like Medusa can read the thrift IDL, build the
related thrift files, and build a jar out of only those files together
with a related pom file. So rather than having a bulky all-struct jar, you
will have lightweight jars.

3. Add documentation to each and every struct/enum. It may be clear to you
now, but 6 months down the line you will forget what it is. Also, with
better documentation, anyone can read and understand it (and maybe
generate thrift docs similar to Javadocs).

4. struct rootCauseErrorIdList is of type list. I'd try to use set instead
of list for all primitive types as a good practice to avoid unnecessary
duplicates whenever possible.

Thanks,
Eran Chinthaka Withana

On Tue, Feb 25, 2014 at 12:07 PM, Suresh Marru <sm...@apache.org> wrote:

> Hi All,
>
> Sorry I have been distracted and could not catch up on the object
> database thread; will do so very soon. Meanwhile, I would like to request
> some feedback on the thrift interfaces.
>
> Hi Eran,
>
> The data model is nearly complete for a first draft, but the API itself
> is half-baked and the feedback you have below is already useful. I will
> request an API thrift IDL review later this week. Meanwhile, I'd welcome
> any feedback on the data model thrift IDL:
> https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD
>
> Thanks,
> Suresh
>
> On Feb 24, 2014, at 1:40 AM, Eran Chinthaka Withana <
> eran.chinthaka@gmail.com> wrote:
>
> > Comments on thrift IDL
> >
> > 1. The input and output parameters do not have constraint specifiers
> > (required vs optional) and are left at the default. This will be very
> > challenging when we try to improve APIs in later versions, and it's
> > standard practice to ALWAYS specify either optional or required as
> > constraint specifiers.
> >
> > 2. Consider using typedefs to reduce repetitive names. For example,
> > defining airavataErrors.InvalidRequestException as a type will help you
> > to simply refer to it as InvalidRequestException.
> >
> > 3. Introduce a parameter in each method to get the API key. This will be
> > helpful in the future to identify individual clients, enforce SLAs, log
> > requests, etc.
>

Thrift IDL Feedback

Posted by Suresh Marru <sm...@apache.org>.

Hi All,

Sorry I have been distracted and could not catch up on the object database thread; will do so very soon. Meanwhile, I would like to request some feedback on the thrift interfaces.

Hi Eran,

The data model is nearly complete for a first draft, but the API itself is half-baked and the feedback you have below is already useful. I will request an API thrift IDL review later this week. Meanwhile, I'd welcome any feedback on the data model thrift IDL: https://git-wip-us.apache.org/repos/asf?p=airavata.git;a=blob_plain;f=airavata-api/thrift-interface-descriptions/experimentModel.thrift;hb=HEAD

Thanks,
Suresh

On Feb 24, 2014, at 1:40 AM, Eran Chinthaka Withana <er...@gmail.com> wrote:

> Comments on thrift IDL
> 
> 1. The input and output parameters do not have constraint specifiers
> (required vs optional) and are left at the default. This will be very
> challenging when we try to improve APIs in later versions, and it's
> standard practice to ALWAYS specify either optional or required as
> constraint specifiers.
> 
> 2. Consider using typedefs to reduce repetitive names. For example,
> defining airavataErrors.InvalidRequestException as a type will help you to
> simply refer to it as InvalidRequestException.
> 
> 3. Introduce a parameter in each method to get the API key. This will be
> helpful in the future to identify individual clients, enforce SLAs, log
> requests, etc.

Re: Object Database Suggestions for Airavata Registry

Posted by Marlon Pierce <ma...@iu.edu>.

My point of view here is that the API use cases can be articulated
independently of the internal CPI use cases, and are necessary but not
sufficient for the overall design.  


Marlon

On 2/24/14 7:56 AM, Marlon Pierce wrote:
> Registry use cases: Eran's point here is important, of course.  Can we
> collectively articulate those?  I suggest we focus on API use cases only
> (the capabilities we need to provide to gateways) rather than internal
> use cases (how the orchestrator and registry interact). 

Re: Object Database Suggestions for Airavata Registry

Posted by Marlon Pierce <ma...@iu.edu>.

Registry use cases: Eran's point here is important, of course.  Can we
collectively articulate those?  I suggest we focus on API use cases only
(the capabilities we need to provide to gateways) rather than internal
use cases (how the orchestrator and registry interact). 


Marlon



Re: Object Database Suggestions for Airavata Registry

Posted by Eran Chinthaka Withana <er...@gmail.com>.

Hi Suresh,

I will try to keep the focus of this mail thread on the object DB
selection. But I will also share some comments about the architecture and
the API since you mentioned those. Please feel free to spawn separate
threads on those if we want to keep this thread focused on the object DB.

Please see the comments in-line.

Thanks,
Eran Chinthaka Withana

On Sun, Feb 23, 2014 at 2:20 PM, Suresh Marru <sm...@apache.org> wrote:

> Hi All,
>
> Airavata is actively migrating to use Thrift API for the RESTless design
> and to facilitate various language bindings from client gateways. The
> programming language support in thrift has been so far very encouraging.
> The current architecture is looking like Figure 1 at [1].
>

Quick questions on the architecture. It seems like the API is directly
contacting the Orchestrator to schedule workflows. I honestly think this is
not a scalable approach due to the impedance mismatch between these two systems.
Are we considering decoupling these two with a message queue and moving to a
worker-based architecture?
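One way to read this suggestion: the API server only enqueues a launch request and returns, and Orchestrator workers consume requests at their own pace. This is a minimal in-process Python sketch, assuming `queue.Queue` stands in for a real message broker; all function names are hypothetical.

```python
import queue
import threading
from typing import List, Optional

# In-process stand-in for a message broker between API server and Orchestrator.
launch_requests: "queue.Queue[Optional[str]]" = queue.Queue()
launched: List[str] = []
launched_lock = threading.Lock()


def api_launch_experiment(experiment_id: str) -> str:
    """API side: enqueue and return at once instead of calling the Orchestrator."""
    launch_requests.put(experiment_id)
    return "QUEUED"


def orchestrator_worker() -> None:
    """Worker side: take requests at the Orchestrator's own pace."""
    while True:
        experiment_id = launch_requests.get()
        try:
            if experiment_id is None:            # shutdown sentinel
                return
            with launched_lock:
                launched.append(experiment_id)   # stand-in for real scheduling
        finally:
            launch_requests.task_done()


def run_demo(n_requests: int = 5, n_workers: int = 2) -> List[str]:
    workers = [threading.Thread(target=orchestrator_worker) for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    for i in range(n_requests):
        api_launch_experiment(f"exp-{i}")
    launch_requests.join()                       # wait until every request is handled
    for _ in workers:
        launch_requests.put(None)                # one sentinel per worker
    for worker in workers:
        worker.join()
    return sorted(launched)
```

The API call's latency no longer depends on how busy the Orchestrator is, and more workers can be added without touching the API server. A durable broker would also keep queued requests across restarts, which this in-memory queue does not.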

Also, the "API Mapping Diagram" hints at a "kind of" stateful
service with a sequential set of steps. For example, due to the lack of a
method to get all experiments, I assume the client is supposed to remember
the experiment ids and invoke each of these methods in sequence. I'd
encourage thinking in terms of stateless invocation, where any client can
invoke each of these methods without prior knowledge of the state of the
execution.
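A hypothetical sketch of the stateless style: the server can list a user's experiments, so a fresh client needs no remembered ids. `STORE`, `create_experiment`, and `get_all_experiments` are illustrative names, not the Airavata API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExperimentSummary:
    experiment_id: str
    owner: str
    status: str


# Hypothetical server-side store: the server, not the client, owns the state.
STORE: Dict[str, ExperimentSummary] = {}


def create_experiment(owner: str, experiment_id: str) -> str:
    STORE[experiment_id] = ExperimentSummary(experiment_id, owner, "CREATED")
    return experiment_id


def get_all_experiments(owner: str) -> List[ExperimentSummary]:
    """Any client, even a brand-new one, can rediscover an owner's experiments."""
    return sorted(
        (e for e in STORE.values() if e.owner == owner),
        key=lambda e: e.experiment_id,
    )
```

Because every call carries all the information it needs, a gateway can restart or load-balance across API servers without losing track of running experiments.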

> Language specific clients will be released as thrift SDK's (similar to
> evernote sdk's [1]). These clients will be integrated into gateway portals
> which connect to the API Server. The API operations broker the simple calls
> into one or more backend CPI calls (Airavata internal component
> interfaces).  An example set of mappings are illustrated in Figure 2 at
> [1]. The current draft of thrift API for version 0.12 is at [3], please pay
> attention to experiment model at [4].
>

Comments on thrift IDL

1. The input and output parameters do not have constraint specifiers
(required vs optional) and are left at the default. This will be very
challenging when we try to improve APIs in later versions, and it's
standard practice to ALWAYS specify either optional or required as
constraint specifiers.

2. Consider using typedefs to reduce repetitive names. For example,
defining airavataErrors.InvalidRequestException as a type will help you to
simply refer to it as InvalidRequestException.

3. Introduce a parameter in each method to get the API key. This will be
helpful in the future to identify individual clients, enforce SLAs, log
requests, etc.
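A hedged Python sketch of what point 3 could enable on the server side: every method takes the key as its first parameter, and one shared check identifies the client, enforces a simple sliding-window call limit (a stand-in for an SLA), and logs the request. The key registry, limits, `AuthorizationException`, and method names are all hypothetical.

```python
import logging
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

log = logging.getLogger("api_key_sketch")

# Hypothetical key registry: API key -> (client name, max calls per window).
API_KEYS: Dict[str, Tuple[str, int]] = {
    "key-gw1": ("gateway-1", 3),
    "key-gw2": ("gateway-2", 100),
}
WINDOW_SECONDS = 60.0
_recent_calls: Dict[str, Deque[float]] = defaultdict(deque)


class AuthorizationException(Exception):
    """Hypothetical error; a real IDL would declare its own exception type."""


def check_api_key(api_key: str, method: str, now: Optional[float] = None) -> str:
    """Identify the client, enforce a sliding-window call limit, log the request."""
    if api_key not in API_KEYS:
        raise AuthorizationException("unknown API key")
    client, limit = API_KEYS[api_key]
    now = time.monotonic() if now is None else now
    window = _recent_calls[api_key]
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()                  # forget calls outside the window
    if len(window) >= limit:
        raise AuthorizationException(f"{client} exceeded {limit} calls per window")
    window.append(now)
    log.info("client=%s method=%s", client, method)
    return client


def get_experiment(api_key: str, experiment_id: str) -> str:
    """Every API method takes the key as its first parameter."""
    client = check_api_key(api_key, "getExperiment")
    return f"{experiment_id} (requested by {client})"
```

Adding the parameter now, even if the check is a no-op at first, avoids changing every method signature later when per-client limits or audit logs become necessary.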

>
> For the persistent store, we had few iterations of Airavata Registry
> shifting from a legacy XRegistry to JackRabbit to now a OpenJPA based
> registry. To allow the API and the associated data models to evolve, it
> will be useful to explore object databases so we can store the serialized
> version of thrift objects directly. But it will be nice to have all (or
> most) of the fields queriable.

FYI, we did a storage space analysis some time back, and for smaller objects
the overhead of storing the object in thrift serialized form vs. each
attribute as a column is the same. Also, enabling compression on each column
family shrinks the difference further. So I'd first start with a
fields-based object representation.
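The two storage layouts being compared can be sketched as follows, with JSON standing in for Thrift serialization (an assumption made to keep the example dependency-free) and a made-up `experiment` object. It shows only the shape of each layout, not the storage numbers mentioned above.

```python
import json
from typing import Any, Dict

# Hypothetical experiment object (a nested dict stands in for a Thrift struct).
experiment: Dict[str, Any] = {
    "experimentId": "exp-1",
    "name": "demo",
    "userConfiguration": {"queue": "normal", "wallTime": 30},
}


def as_blob(obj: Dict[str, Any]) -> bytes:
    """Serialized layout: the whole object in a single column value."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def as_columns(obj: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Fields-based layout: one column per leaf attribute, dotted names for nesting."""
    columns: Dict[str, str] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            columns.update(as_columns(value, prefix=name + "."))
        else:
            columns[name] = json.dumps(value)
    return columns
```

With the fields-based layout, a single attribute such as `userConfiguration.wallTime` can be read or updated without deserializing the rest of the object.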

Having said that, making each attribute a column doesn't make it
queryable. We have to either create secondary indexes or do column slices,
and both of these are a bit expensive. So, as always with NoSQL storage
systems, we should know the queries ahead of time, before even
loosely defining storage schemas.
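A hypothetical illustration of designing for a known query: if the registry must answer "which experiments belong to user X", one common NoSQL approach is a second, query-specific table written alongside the primary one. Plain Python dicts stand in for column families; the names are assumptions, and the sketch assumes an experiment's owner never changes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

# Primary table: experiment id -> columns.
experiments: Dict[str, Dict[str, str]] = {}
# Query table, maintained on every write because "experiments of a user"
# was identified up front as a query the registry must answer.
experiments_by_user: DefaultDict[str, Set[str]] = defaultdict(set)


def save_experiment(experiment_id: str, user: str, status: str) -> None:
    experiments[experiment_id] = {"user": user, "status": status}
    experiments_by_user[user].add(experiment_id)   # one extra write per index


def experiments_for(user: str) -> List[str]:
    # A single key lookup instead of scanning or slicing every row.
    return sorted(experiments_by_user.get(user, set()))
```

The cost moves to write time, and a query that was not anticipated still has no table to answer it, which is why the use cases need to come first.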

> This calls for a more column-family design of any NoSQL approaches.
>
> Any recommendations for a registry architecture?
>

It would be easier to answer this question if you could list the use cases for
the registry. I don't think most people on this list know all the use
cases. I myself have a very faint memory :)

> Quickly hacking through I find the following approach a viable one:
> ZombieDB[5] over astyanax[6] which talks to Cassandra.

Not sure why you picked Astyanax (despite it originating at Netflix
and claiming better performance than Hector due to its token range
awareness). I'd rather choose between Hector and Astyanax based on the
performance numbers you get. We did some work on this earlier and came up
with an abstraction over these two clients so that we can switch easily
between them: https://github.com/WizeCommerce/hecuba
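The idea behind such an abstraction is that registry code depends on a small interface and each Cassandra client sits behind an adapter. This is a Python sketch using `typing.Protocol`; the interface and both in-memory backends are hypothetical, and it is not hecuba's actual API.

```python
from typing import Dict, Optional, Protocol, Tuple


class ColumnStore(Protocol):
    """Hypothetical minimal interface the registry code depends on."""

    def put(self, row: str, column: str, value: str) -> None: ...

    def get(self, row: str, column: str) -> Optional[str]: ...


class DictStore:
    """Backend A: nested in-memory rows (stand-in for one Cassandra client)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row: str, column: str, value: str) -> None:
        self._rows.setdefault(row, {})[column] = value

    def get(self, row: str, column: str) -> Optional[str]:
        return self._rows.get(row, {}).get(column)


class KeyedStore:
    """Backend B: flat (row, column) keys (stand-in for the other client)."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], str] = {}

    def put(self, row: str, column: str, value: str) -> None:
        self._cells[(row, column)] = value

    def get(self, row: str, column: str) -> Optional[str]:
        return self._cells.get((row, column))


class Registry:
    """Registry logic written once, against the interface only."""

    def __init__(self, store: ColumnStore) -> None:
        self._store = store

    def set_status(self, experiment_id: str, status: str) -> None:
        self._store.put(experiment_id, "status", status)

    def status(self, experiment_id: str) -> Optional[str]:
        return self._store.get(experiment_id, "status")
```

Switching clients then means constructing `Registry` with a different adapter, which makes it cheap to benchmark both before committing to one.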

In any case, I think it's a bit too early to talk about this.

I haven't used ZombieDB before, but before we pick any technology I'd spend
a bit more time listing the use cases.

> Airavata can benefit immediately from the replication and reliability of
> cassandra and scalability in near future. Some of the model objects like
> experiment creation will need to have strong consistency and most of the
> monitoring can live with eventual consistency.
>

Even though Cassandra is designed to trade C for AP (in CAP theorem terms),
there are knobs (like read and write consistency levels) we can use to make
it strongly consistent. So I think we are covered here.
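The usual rule of thumb behind those knobs: with replication factor N, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. This small Python check computes QUORUM as floor(N/2) + 1, as Cassandra defines it; it ignores failure-handling subtleties and is only a sketch.

```python
def quorum(replication_factor: int) -> int:
    """Cassandra's QUORUM: a majority of replicas, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > replication_factor


RF = 3
print(quorum(RF))                                          # 2
print(reads_see_latest_write(quorum(RF), quorum(RF), RF))  # True  (QUORUM/QUORUM)
print(reads_see_latest_write(1, 1, RF))                    # False (ONE/ONE)
print(reads_see_latest_write(1, RF, RF))                   # True  (ONE read, ALL write)
```

With RF = 3, QUORUM writes and reads (2 + 2 > 3) could serve experiment creation, while ONE/ONE (1 + 1 = 2) would be the eventually consistent, lower-latency choice for monitoring data.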