Posted to users@jena.apache.org by Anuj Kumar <an...@gmail.com> on 2011/10/22 06:20:48 UTC

Graph algorithms over TDB

Hello Everyone,

I have my RDF data loaded in TDB and am using the Java APIs to query it. Is
there a way to run graph algorithms on top of it?
I am looking for shortest-path and PageRank.

I am currently doing this with the JUNG framework, but I am curious to know
whether it is possible to run these algorithms directly on top of TDB.

Thanks,
Anuj
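
For readers landing on this thread: a minimal sketch of what "shortest-path"
amounts to once the triples are viewed as directed subject -> object edges.
It is plain Java with no Jena or JUNG calls; the class name
ShortestPathSketch and the idea of feeding it edges read out of TDB are
assumptions made for illustration, not something TDB provides. It ignores
predicates and edge weights, so it finds a path with the fewest hops
(breadth-first search), and it holds the whole graph in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: an in-memory directed graph keyed by node label (e.g. a URI string).
final class ShortestPathSketch {
    // Outgoing edges: subject -> list of objects.
    private final Map<String, List<String>> out = new HashMap<>();

    // Assumption: the caller iterates over the triples in the store and calls this per triple.
    void addEdge(String subject, String object) {
        out.computeIfAbsent(subject, k -> new ArrayList<>()).add(object);
    }

    // Breadth-first search; returns the nodes on a fewest-hops path, or an empty list if none exists.
    List<String> shortestPath(String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(goal)) {
                // Walk the parent links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = parent.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : out.getOrDefault(node, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Because the adjacency map lives on the heap, this has the same memory
ceiling as any other in-memory graph library.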

Re: Graph algorithms over TDB

Posted by Anuj Kumar <an...@gmail.com>.

Hi Paolo,

I was trying with a subset of the DBpedia dump that has roughly 7-8 million
nodes and more than 10 million relations. I didn't compare TDB with
other databases, but for SPARQL queries it performed really well (I don't
have a quantitative measure).

Later, I decided to use Neo4j because I have a lot of graph traversal
queries, and that is working pretty well. I plan to use Gremlin [1] very
soon to avoid depending on a particular graph DB. Once that
implementation is in place, I can try different graph databases and
maybe use the gbench [2] framework to benchmark them.

Frankly speaking, I don't know much about TDB's implementation of node
IDs, so I have left that aside for the moment. The implementation I have in
mind is to use either a Storm [3] topology or a map-reduce framework to
calculate PageRank, etc. on top of TDB. That may work, but I need to
validate it.

Regards,
Anuj

[1] https://github.com/tinkerpop/gremlin
[2] http://ups.savba.sk/~marek/gbench.html
[3] https://github.com/nathanmarz/storm

On Wed, Mar 21, 2012 at 12:03 AM, Paolo Castagna <
castagna.lists@googlemail.com> wrote:

> Hi Anuj,
> can you share a little bit more about your use cases (I am interested).
>
> How big are your graphs? (How many nodes? How many links?).
>
> Have you tried to use PageRank, for example, as algorithm to compare
> different
> graph databases and TDB which, although RDF is a directed labeled
> multi-graph,
> isn't IMHO a 'proper' graph database and it has not designed or optimized
> for
> typical graph access patterns (i.e. graph traversal).
>
> Is there a well known and/or commonly used benchmark for graph databases?
>
> It would be interesting to see PageRank or other graph algorithms
> implemented
> on top of TDB (in node ids space, without touching the node table).
>
> Paolo
>
> Anuj Kumar wrote:
> > Hello Everyone,
> >
> > I have my RDF data loaded in TDB and using Java APIs to query the same.
> Is
> > there a way to run graph algorithms on top of it?
> > I am looking for shortest-path and PageRank.
> >
> > I am currently doing it using the JUNG framework but just curious to
> know if
> > it is possible to run it straight on top of TDB.
> >
> > Thanks,
> > Anuj
> >
>
>

Re: Graph algorithms over TDB

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Anuj,
can you share a little bit more about your use cases (I am interested).

How big are your graphs? (How many nodes? How many links?).

Have you tried to use PageRank, for example, as an algorithm to compare
different graph databases with TDB? Although RDF is a directed labeled
multi-graph, TDB isn't IMHO a 'proper' graph database, and it has not been
designed or optimized for typical graph access patterns (i.e. graph traversal).

Is there a well known and/or commonly used benchmark for graph databases?

It would be interesting to see PageRank or other graph algorithms implemented
on top of TDB (in node ids space, without touching the node table).

Paolo
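
To make the "node ids space" idea concrete, here is a minimal sketch of
PageRank by power iteration over edges given as integer pairs. It is plain
Java and does not use any TDB or Jena API: the class name PageRankSketch,
the dense 0..n-1 id numbering, and the step that would map TDB's internal
node ids onto those integers are all assumptions made for illustration.
Duplicate edges are counted once per occurrence, which matches treating RDF
as a multi-graph.

```java
import java.util.Arrays;

// Hypothetical helper: PageRank over a graph whose nodes are the ints 0..n-1.
final class PageRankSketch {
    /**
     * @param n          number of nodes (assumed > 0)
     * @param edges      each entry is {source, target}
     * @param damping    damping factor, commonly 0.85
     * @param iterations number of power-iteration steps
     * @return one rank per node; the ranks sum to 1
     */
    static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];

            // Rank held by nodes without outgoing edges is spread evenly over all nodes.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            // Each edge passes an equal share of its source's rank to its target.
            for (int[] e : edges) {
                next[e[1]] += rank[e[0]] / outDegree[e[0]];
            }

            double base = (1.0 - damping) / n + damping * dangling / n;
            for (int v = 0; v < n; v++) {
                next[v] = base + damping * next[v];
            }
            rank = next;
        }
        return rank;
    }
}
```

Each iteration is a single pass over the edge list, so the working set is
three arrays of length n plus whatever is needed to stream the edges. The
node labels themselves are only needed at the end, to report which
resources rank highest.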

Anuj Kumar wrote:
> Hello Everyone,
> 
> I have my RDF data loaded in TDB and using Java APIs to query the same. Is
> there a way to run graph algorithms on top of it?
> I am looking for shortest-path and PageRank.
> 
> I am currently doing it using the JUNG framework but just curious to know if
> it is possible to run it straight on top of TDB.
> 
> Thanks,
> Anuj
>

Re: Graph algorithms over TDB

Posted by Anuj Kumar <an...@gmail.com>.

Hi Damian,

I understand the motivation behind your implementation. It is nicely done. I
was running my traversals over JUNG earlier, but what I am looking for is an
implementation distributed across multiple machines.

I have DBpedia datasets loaded in TDB and am trying the same there. That is
around 25M triples, and the total size is around 16GB.

Regards,
Anuj

On Wed, Oct 26, 2011 at 12:12 AM, Damian Steer <d....@bristol.ac.uk> wrote:

>
> On 25 Oct 2011, at 18:52, Anuj Kumar wrote:
>
> > Thanks Patrick. I also came across this presentation-
> >
> http://www.slideshare.net/slidarko/traversing-graph-databases-with-gremlin
> > and TinkerPop also supports Jung. So, with your and Damian's response, I
> got
> > my answers.
> >
> > I gave it a quick try. It works nicely with smaller graphs but as the
> > triples grow in TDB, Jung starts struggling and that makes sense because
> it
> > needs more memory depending on the size of the triples.
>
> Could you quantify that? How many triples are you dealing with?
>
> Don't underestimate the naïvety of the jenajung implementation. It was only
> written to get some diagrams working, over small graphs. There  may be scope
> to improve performance by caching, for example.
>
> What data were you using, and what were you doing with it? Have you tried
> profiling?
>
> Damian
>
>

Re: Graph algorithms over TDB

Posted by Damian Steer <d....@bristol.ac.uk>.

On 25 Oct 2011, at 18:52, Anuj Kumar wrote:

> Thanks Patrick. I also came across this presentation-
> http://www.slideshare.net/slidarko/traversing-graph-databases-with-gremlin
> and TinkerPop also supports Jung. So, with your and Damian's response, I got
> my answers.
> 
> I gave it a quick try. It works nicely with smaller graphs but as the
> triples grow in TDB, Jung starts struggling and that makes sense because it
> needs more memory depending on the size of the triples.

Could you quantify that? How many triples are you dealing with?

Don't underestimate the naïvety of the jenajung implementation. It was only written to get some diagrams working, over small graphs. There may be scope to improve performance by caching, for example.

What data were you using, and what were you doing with it? Have you tried profiling?

Damian

Re: Graph algorithms over TDB

Posted by Anuj Kumar <an...@gmail.com>.

Thanks Patrick. I also came across this presentation:
http://www.slideshare.net/slidarko/traversing-graph-databases-with-gremlin
TinkerPop also supports JUNG. So, with your and Damian's responses, I got
my answers.

I gave it a quick try. It works nicely with smaller graphs, but as the
number of triples in TDB grows, JUNG starts struggling. That makes sense,
because its memory use grows with the number of triples.

So, is there a way to load the entire graph from TDB into a distributed cache
or something else that can span multiple machines? I can read it from TDB and
push it into a graph database or a distributed cache myself, but if Jena TDB
provides this, that is what I would prefer to go with.

Is it supported?

Thanks,
Anuj

On Sat, Oct 22, 2011 at 9:43 PM, Patrick Logan <pa...@gmail.com> wrote:

> Gremlin has been adapted to work with Sesame. It's probably not much
> of a stretch for Jena.
>
> https://github.com/tinkerpop/gremlin
>
>
> On Fri, Oct 21, 2011 at 9:20 PM, Anuj Kumar <an...@gmail.com> wrote:
> > Hello Everyone,
> >
> > I have my RDF data loaded in TDB and using Java APIs to query the same.
> Is
> > there a way to run graph algorithms on top of it?
> > I am looking for shortest-path and PageRank.
> >
> > I am currently doing it using the JUNG framework but just curious to know
> if
> > it is possible to run it straight on top of TDB.
> >
> > Thanks,
> > Anuj
> >
>

Re: Graph algorithms over TDB

Posted by Patrick Logan <pa...@gmail.com>.

Gremlin has been adapted to work with Sesame. It's probably not much
of a stretch for Jena.

https://github.com/tinkerpop/gremlin


On Fri, Oct 21, 2011 at 9:20 PM, Anuj Kumar <an...@gmail.com> wrote:
> Hello Everyone,
>
> I have my RDF data loaded in TDB and using Java APIs to query the same. Is
> there a way to run graph algorithms on top of it?
> I am looking for shortest-path and PageRank.
>
> I am currently doing it using the JUNG framework but just curious to know if
> it is possible to run it straight on top of TDB.
>
> Thanks,
> Anuj
>

Re: Graph algorithms over TDB

Posted by Anuj Kumar <an...@gmail.com>.

Thanks Damian. This is helpful and cleaner than my implementation.
I will try it.

Regards,
Anuj

On Sat, Oct 22, 2011 at 2:04 PM, Damian Steer <d....@bristol.ac.uk> wrote:

>
> On 22 Oct 2011, at 05:20, Anuj Kumar wrote:
>
> > Hello Everyone,
> >
> > I have my RDF data loaded in TDB and using Java APIs to query the same.
> Is
> > there a way to run graph algorithms on top of it?
> > I am looking for shortest-path and PageRank.
> >
> > I am currently doing it using the JUNG framework but just curious to know
> if
> > it is possible to run it straight on top of TDB.
>
> I wrote a jung wrapper for jena:
>
> <https://github.com/shellac/JenaJung>
>
> Damian

Re: Graph algorithms over TDB

Posted by Damian Steer <d....@bristol.ac.uk>.

On 22 Oct 2011, at 05:20, Anuj Kumar wrote:

> Hello Everyone,
> 
> I have my RDF data loaded in TDB and using Java APIs to query the same. Is
> there a way to run graph algorithms on top of it?
> I am looking for shortest-path and PageRank.
> 
> I am currently doing it using the JUNG framework but just curious to know if
> it is possible to run it straight on top of TDB.

I wrote a jung wrapper for jena:

<https://github.com/shellac/JenaJung>

Damian