Posted to users@jena.apache.org by Laura Morales <la...@mail.com> on 2017/03/26 16:20:14 UTC

Jena scalability

- Is Jena a "native" store? Or does it use some other RDBMS/NoSQL backends?
- Has anybody ever done tests/benchmarks to see how well Jena scales with large datasets (billions or trillions of n-quads)?
- Is it possible to start with a single machine, and later distribute the database over multiple machines as the graph grows?

Re: Jena scalability

Posted by Laura Morales <la...@mail.com>.
> Not currently with TDB, but I have code in production which aggregates
> across multiple DatasetGraphs. We create a DatasetGraphMosaic and add
> DatasetGraphs to it. TDB instances in other JVMs are supported via a
> Thrift-based proxy. This allows simple SPARQL; otherwise use the SERVICE
> keyword in your query...

Is the "service" command supposed to execute queries on different endpoints as if they were all a single, giant graph?
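For reference, in SPARQL 1.1 Federated Query each SERVICE block names one remote endpoint and sends only the pattern inside that block to it; the local engine then combines the results. The endpoints are not merged into one graph automatically, so the query has to say which pattern goes where. A minimal sketch of what such a query text looks like, written as a small Python helper; the endpoint URLs and the helper name are hypothetical and only serve as illustration:

```python
def federated_query(endpoints, pattern="?s ?p ?o", limit=10):
    """Build a SPARQL 1.1 query that sends the same pattern to each endpoint."""
    # Each SERVICE block is evaluated by the named remote endpoint;
    # UNION combines the per-endpoint results in the local engine.
    blocks = [
        f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints
    ]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT * WHERE {{\n{body}\n}}\nLIMIT {limit}"


# Hypothetical endpoint URLs, for illustration only.
query = federated_query([
    "http://host-a.example/ds/sparql",
    "http://host-b.example/ds/sparql",
])
print(query)
```

The printed text can be submitted to any SPARQL 1.1 engine that supports federated query. Because every SERVICE block is a separate remote call, joins across endpoints are done locally and can be slow on large result sets.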

Re: Jena scalability

Posted by Laura Morales <la...@mail.com>.
> Claude Warren (one of the Jena committers) has been working on an Apache
> Cassandra backend, and he can say more about it if it seems relevant.

Is this Apache Cassandra backend "like TDB, but scalable?"

Re: Jena scalability

Posted by Eugene Tenkaev <hr...@gmail.com>.
Is there any repository on GitHub for the Apache Cassandra backend?

2017-03-26 21:55 GMT+03:00 A. Soroka <aj...@virginia.edu>:

> TDB is a native store, with a next generation version in development [1].
> SDB uses a SQL backend. It is not under active development. Claude Warren
> (one of the Jena committers) has been working on an Apache Cassandra
> backend, and he can say more about it if it seems relevant.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> [1] https://github.com/afs/mantis

Re: Jena scalability

Posted by "A. Soroka" <aj...@virginia.edu>.
TDB is a native store, with a next generation version in development [1]. SDB uses a SQL backend. It is not under active development. Claude Warren (one of the Jena committers) has been working on an Apache Cassandra backend, and he can say more about it if it seems relevant. 

---
A. Soroka
The University of Virginia Library

[1] https://github.com/afs/mantis

Re: Jena scalability

Posted by Dick Murray <da...@gmail.com>.
On 26 Mar 2017 5:20 pm, "Laura Morales" <la...@mail.com> wrote:

> - Is Jena a "native" store? Or does it use some other RDBMS/NoSQL backends?

It has memory, TDB and SDB (I'm not sure of the current state).

> - Has anybody ever done tests/benchmarks to see how well Jena scales with
> large datasets (billions or trillions of n-quads)?

We have several 650 GB TDB instances and some in-memory instances at 128 GB.
What queries are being performed? How many graphs do you have? Are you just
querying, or updating as well?

> - Is it possible to start with a single machine, and later distribute the
> database over multiple machines as the graph grows?

Not currently with TDB, but I have code in production which aggregates
across multiple DatasetGraphs. We create a DatasetGraphMosaic and add
DatasetGraphs to it. TDB instances in other JVMs are supported via a
Thrift-based proxy. This allows simple SPARQL; otherwise use the SERVICE
keyword in your query...
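
To illustrate the aggregation idea described above, here is a minimal sketch of a union view over several quad stores. It is not the DatasetGraphMosaic code mentioned in this message and it does not use any Jena API; the class names InMemoryQuads and Mosaic, the find() signature, and the sample data are all hypothetical. The sketch forwards each lookup to every member store and yields each matching quad once:

```python
from itertools import chain


class InMemoryQuads:
    """Stand-in for one dataset: a set of (graph, subject, predicate, object) tuples."""

    def __init__(self, quads):
        self.quads = set(quads)

    def find(self, g=None, s=None, p=None, o=None):
        # None acts as a wildcard for that position.
        pattern = (g, s, p, o)
        for quad in self.quads:
            if all(want is None or want == got for want, got in zip(pattern, quad)):
                yield quad


class Mosaic:
    """Hypothetical aggregate: forwards each lookup to every member dataset."""

    def __init__(self):
        self.members = []

    def add(self, dataset):
        self.members.append(dataset)

    def find(self, g=None, s=None, p=None, o=None):
        # Chain the per-member results and drop quads seen more than once.
        seen = set()
        for quad in chain.from_iterable(m.find(g, s, p, o) for m in self.members):
            if quad not in seen:
                seen.add(quad)
                yield quad


# Hypothetical sample data: one quad is present in both stores.
a = InMemoryQuads([("g1", "alice", "knows", "bob")])
b = InMemoryQuads([("g2", "bob", "knows", "carol"), ("g1", "alice", "knows", "bob")])
mosaic = Mosaic()
mosaic.add(a)
mosaic.add(b)
results = sorted(mosaic.find(p="knows"))
print(results)
```

In a real deployment a member could just as well be a proxy that forwards find() to a store in another JVM, which is roughly the role the Thrift-based proxy plays in the setup described above.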