Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/03/20 19:33:51 UTC

Re: Graph algorithms over TDB

Hi Anuj,
can you share a little more about your use cases? I am interested.

How big are your graphs? (How many nodes? How many links?).

Have you tried using PageRank, for example, as an algorithm to compare different
graph databases with TDB? Although RDF is a directed labeled multi-graph, TDB
isn't IMHO a 'proper' graph database: it has not been designed or optimized for
typical graph access patterns (i.e. graph traversal).

Is there a well known and/or commonly used benchmark for graph databases?

It would be interesting to see PageRank or other graph algorithms implemented
on top of TDB (in node ids space, without touching the node table).
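
To make the idea concrete, here is a minimal sketch of PageRank by power
iteration over an edge list of integer node ids. It is not TDB code and does
not use any Jena API: the class name PageRankSketch, the dense int ids
0..n-1, the damping factor 0.85 and the tiny example graph are all assumptions
made for illustration. In a real setup the ids would have to be derived from
TDB's internal node ids, which this sketch does not attempt.

```java
import java.util.Arrays;

class PageRankSketch {

    /**
     * Power-iteration PageRank. Nodes are the ints 0..n-1 (a stand-in for
     * node ids); each edges[i] is a {source, target} pair. Parallel edges
     * are counted separately, as in a multi-graph.
     */
    public static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Rank held by nodes with no outgoing links is spread evenly.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);

            // Each node passes an equal share of its rank along every out-link.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical 4-node graph: 0->1, 0->2, 1->2, 2->0, 3->2
        int[][] edges = { {0, 1}, {0, 2}, {1, 2}, {2, 0}, {3, 2} };
        double[] rank = pageRank(4, edges, 0.85, 50);
        for (int v = 0; v < rank.length; v++) {
            System.out.printf("node %d: %.4f%n", v, rank[v]);
        }
    }
}
```

The loop only ever reads pairs of ids, which is why working in node id space
is attractive: the strings behind the ids are needed only when the final
ranks are reported.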

Paolo

Anuj Kumar wrote:
> Hello Everyone,
> 
> I have my RDF data loaded in TDB and am using the Java APIs to query it. Is
> there a way to run graph algorithms on top of it?
> I am looking for shortest-path and PageRank.
> 
> I am currently doing it using the JUNG framework but am just curious to know if
> it is possible to run it straight on top of TDB.
> 
> Thanks,
> Anuj
> 
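
For the shortest-path part of the question quoted above, a breadth-first
search is the usual starting point when links are unweighted. The sketch
below is plain Java over an in-memory adjacency map and does not use TDB or
JUNG: the class name ShortestPathSketch, the integer ids and the example
graph are assumptions for illustration, and filling the adjacency map from a
TDB store is left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ShortestPathSketch {

    /**
     * Breadth-first search over a directed, unweighted graph.
     * Returns the nodes on a shortest path from source to target,
     * or an empty list if target is not reachable.
     */
    public static List<Integer> shortestPath(Map<Integer, List<Integer>> adjacency,
                                             int source, int target) {
        // parent doubles as the visited set; the source is its own parent.
        Map<Integer, Integer> parent = new HashMap<>();
        Deque<Integer> queue = new ArrayDeque<>();
        parent.put(source, source);
        queue.add(source);

        while (!queue.isEmpty()) {
            int current = queue.remove();
            if (current == target) {
                // Walk the parent links back to the source.
                LinkedList<Integer> path = new LinkedList<>();
                for (int v = target; v != source; v = parent.get(v)) {
                    path.addFirst(v);
                }
                path.addFirst(source);
                return path;
            }
            List<Integer> neighbours =
                    adjacency.getOrDefault(current, Collections.<Integer>emptyList());
            for (int next : neighbours) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0->1, 0->2, 1->3, 2->3, 3->4
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(0, Arrays.asList(1, 2));
        adjacency.put(1, Arrays.asList(3));
        adjacency.put(2, Arrays.asList(3));
        adjacency.put(3, Arrays.asList(4));
        System.out.println(shortestPath(adjacency, 0, 4));
    }
}
```

This treats every link as having the same cost and ignores predicate labels;
weighted links would need something like Dijkstra's algorithm instead.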


Re: Graph algorithms over TDB

Posted by Anuj Kumar <an...@gmail.com>.
Hi Paolo,

I was trying with a subset of the DBpedia dump containing roughly 7-8 million
nodes and more than 10 million relations. I didn't compare TDB with
other databases, but for SPARQL queries it performed really well (I don't have
a quantitative measure).

Later, I decided to use Neo4j because I have a lot of graph traversal queries, and
that is working pretty well. I plan to use Gremlin [1] very
soon to avoid depending on a particular graph DB. Once that
implementation is in place, I can try different graph databases and
maybe use the gbench [2] framework to benchmark them.

Frankly speaking, I don't know much about TDB's implementation of node
ids, so I have set that aside for the moment. The implementation that I have in mind is to
use either a Storm [3] topology or a map-reduce framework to calculate
PageRank, etc. on top of TDB. That may work, but I need to validate it.

Regards,
Anuj

[1] https://github.com/tinkerpop/gremlin
[2] http://ups.savba.sk/~marek/gbench.html
[3] https://github.com/nathanmarz/storm
