Posted to dev@airavata.apache.org by Hasitha Aravinda <ma...@gmail.com> on 2012/08/17 16:54:51 UTC

Possibility of using graph database for MetCat

Hi Devs,

In MetCat, we need to store relationships between data entities and query
them. Because we are using a non-relational database, i.e. Cassandra, it
might not be possible to answer such queries efficiently.

As an example, take the entities A, B, C, D, E, etc. If
*A->C->B->D->E* and *B->F->G->K*,
we might want to know whether a relationship exists between A and F
(that is, a path between A and F).
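
A minimal sketch of this path check, assuming the relationships are directed
and small enough to hold in an in-memory adjacency map. The names
`build_graph` and `path_exists` are illustrative only and are not part of
MetCat:

```python
from collections import defaultdict, deque


def build_graph(chains):
    """Build a directed adjacency map from chains such as ["A", "C", "B"]."""
    graph = defaultdict(set)
    for chain in chains:
        # Each consecutive pair in a chain is one directed relationship.
        for source, target in zip(chain, chain[1:]):
            graph[source].add(target)
    return graph


def path_exists(graph, start, goal):
    """Breadth-first search: True if goal is reachable from start."""
    if start == goal:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# The two chains from the example: A->C->B->D->E and B->F->G->K
graph = build_graph([["A", "C", "B", "D", "E"], ["B", "F", "G", "K"]])
print(path_exists(graph, "A", "F"))  # reachable via A->C->B->F
print(path_exists(graph, "F", "A"))  # not reachable: edges are directed
```

In a key-value store, each `graph.get(node)` step would typically be a
separate read, which is where the cost of a multi-hop query comes from.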

In the real world there may be thousands of entities and relationships,
which may degrade query efficiency considerably if we use conventional
databases to address such requirements. Requirements of this kind are hard
to meet with Cassandra, but graph databases are designed for them.

So for the above requirement, we might need to use a graph database as
well. We'd like to know your opinions on this.

Thanks,
Hasitha.

Re: Possibility of using graph database for MetCat

Posted by Hasitha Aravinda <ma...@gmail.com>.
Hi Shahani,

We haven't tried this feature with a graph database yet, but we expect graph
databases to perform better than Cassandra here, because they are designed
to answer this kind of problem. To support this kind of relationship search
with Cassandra, we would have to do some calculation and indexing for each
data product when storing and retrieving information. This would be
worthwhile research, though, and we will run some tests comparing Cassandra
and a graph database.
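
As a rough illustration of what that calculation and indexing could look
like, here is a sketch that keeps a reachability index up to date on every
write. This is only an assumption about one possible approach, not something
we have implemented: plain Python dictionaries stand in for Cassandra rows,
no Cassandra API is used, and the class name `ReachabilityIndex` is
hypothetical.

```python
from collections import defaultdict


class ReachabilityIndex:
    """Keeps, per entity, the set of entities reachable from it."""

    def __init__(self):
        # entity -> every entity reachable from it
        self.descendants = defaultdict(set)
        # entity -> every entity that can reach it
        self.ancestors = defaultdict(set)

    def add_relation(self, source, target):
        """Record source->target and update the index for all affected nodes."""
        # Everything that reaches source (and source itself) now also reaches
        # target and everything target already reaches.
        upstream = self.ancestors[source] | {source}
        downstream = self.descendants[target] | {target}
        for node in upstream:
            self.descendants[node] |= downstream
        for node in downstream:
            self.ancestors[node] |= upstream

    def path_exists(self, source, target):
        """Single lookup: is target in the precomputed set for source?"""
        return target in self.descendants.get(source, ())


index = ReachabilityIndex()
for chain in (["A", "C", "B", "D", "E"], ["B", "F", "G", "K"]):
    for source, target in zip(chain, chain[1:]):
        index.add_relation(source, target)

print(index.path_exists("A", "F"))
print(index.path_exists("F", "A"))
```

The trade-off is that a read becomes a single lookup, while each write may
touch many rows and the index can grow quadratically with the number of
entities.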

We did some research on several graph databases. Selecting one may be a
problem, though, given license issues and community support. We can't use
Neo4J because it is licensed under GPLv3, which conflicts with the Apache
license [0], and we found problems with some other databases as well.
OrientDB [1][2] may be a good option, because it is licensed under the
Apache License v2.0 and seems to have good community support.

If we use a graph database with MetCat, it will become a dependency of
MetCat. Using two databases (i.e. Cassandra to store metadata and a graph
database to store relations) may also be a problem. So we need your
suggestions on this issue.

Thanks,
Hasitha

[0] - http://www.apache.org/licenses/GPL-compatibility.html
[1] - http://www.orientdb.org/orient-db.htm
[2] - http://code.google.com/p/orient/

On Sat, Aug 18, 2012 at 7:01 AM, Shahani Markus Weerawarana <
shahani.w@gmail.com> wrote:

> Hi Hasitha,
>
> This would be an interesting exploration.
> Have you tried out your idea with something like Neo4J? Have you come
> across any performance comparison articles/papers of Graph DBs such as
> Neo4J, FlockDB, InfiniteGraph with Column DBs such as Cassandra?
>
> Shahani
>
> --
> *Shahani Markus Weerawarana, Ph.D.*
> *Computer Scientist*
> Visiting Lecturer, University of Moratuwa, Sri Lanka.
> Visiting Scientist, Indiana University, USA.
>

Re: Possibility of using graph database for MetCat

Posted by Shahani Markus Weerawarana <sh...@gmail.com>.
Hi Hasitha,

This would be an interesting exploration.
Have you tried out your idea with something like Neo4J? Have you come
across any performance comparison articles/papers of Graph DBs such as
Neo4J, FlockDB, InfiniteGraph with Column DBs such as Cassandra?

Shahani

-- 
*Shahani Markus Weerawarana, Ph.D.*
*Computer Scientist*
Visiting Lecturer, University of Moratuwa, Sri Lanka.
Visiting Scientist, Indiana University, USA.