Posted to users@jena.apache.org by Alexandra Kokkinaki <al...@gmail.com> on 2016/04/12 15:39:47 UTC
Re: Fuseki server: many data services or many fuseki installations?
Hi Andy, thanks for your answers. So would it be feasible to add/delete
triples in an existing database?
Thanks,
Alexandra
On Tue, Mar 29, 2016 at 9:58 AM, Andy Seaborne <an...@apache.org> wrote:
> On 21/03/16 13:35, Alexandra Kokkinaki wrote:
>
>> Hi Andy, thanks for your answers.
>>
>>
>> On Fri, Mar 18, 2016 at 11:43 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>> Hi,
>>>
>>> it will depend on usage patterns. 2* 500 million isn't unreasonable but
>>> validating with your expected usage is essential.
>>> The critical factors are the usage patterns and the hardware available.
>>> Number of queries, query complexity, number of updates, all matter. RAM
>>> is
>>> good (which is true for any database) as are SSDs if you do lots of
>>> update
>>> or need fast startup from cold.
>>>
>>> What kind of usage patterns are considered not valid for big triple
>> stores?
>> We are planning to use our Fuseki server to allow machine-to-machine
>> communication and also allow independent users to express mostly spatial
>> queries. We plan to do indexing and have a query timeout too. Is that
>> enough to address performance issues?
>>
>
> They are a good idea. It will protect the server.
>
> It is possible to write SPARQL queries which are fundamentally expensive.
>
> The TDB will need to get updated daily, using the Jena API, since I suppose
>> deleting and inserting everything back would take a long time. I read in (
>>
>> https://lists.w3.org/Archives/Public/public-sparql-dev/2008JulSep/0029.html
>> ) that it takes 5370 seconds for 100M triples to be loaded in TDB, which is
>> good.
>> But here <https://www.w3.org/wiki/LargeTripleStores> it is said that it
>> took 36 hours to load 1.7B triples in TDB
>>
>
> ... in 2008 ... with a spinning disk.
>
> 12k triples/s would be a bit slow nowadays.
>
> At large scale tdbloader2 can be faster than tdbloader. You have to try
> with your data on your hardware - it isn't a simple yes/no question
> unfortunately.
>
> tdbloader2 only loads from empty.
>
> tdbloader does not do anything special when loading a partial database.
>
> , which drives me towards the
>> daily updates rather than daily delete and insert.
>> How long would a 500 triple DB take to be loaded in an empty database?
>>
>
> 500M?
>
> Just run
>
> tdbloader --loc=DB <the_data>
>
> and see what rate you get - I'd be interested in
> seeing the log. Every data set, every hardware set can be different.
> That's why it is hard to make any accurate predictions - just try it.
>
> The pattern of the data makes a difference - LUBM loads very fast as it
> has a high triples-to-nodes ratio, so fewer bytes are being loaded. All
> triple stores report better figures on that data - a factor of x2 faster is
> common - but it's not typical data.
>
> Andy
>
>
> Multiple requests, whether same service or different service, are
>>> competing for the same machine resources. Fuseki runs requests
>>> independently and in parallel. There are per-database transactions
>>> supporting multiple, truly parallel readers.
>>>
>>>
>> Andy
>>>
>>
>>
>> Many thanks,
>>
>> Alexandra
>>
>>
>>>
>>> On 18/03/16 09:35, Alexandra Kokkinaki wrote:
>>>
>>> Hi,
>>>>
>>>> after researching on TDB performance with big data, I would still
>>>> like to know:
>>>> We have one Fuseki server exposing 2 SPARQL endpoints (2 million
>>>> triples each) as data services. We are planning to add one more, but
>>>> with big data (500 million triples):
>>>>
>>>> - For big data, is it better to use many installations of the Fuseki
>>>>   server, or
>>>> - many data services under the same Fuseki server?
>>>>
>>>> Could Fuseki cope with two or more services with more than 500 million
>>>> triples each?
>>>>
>>>>
>>>
>>>
>>> How does Fuseki cope when it has to serve concurrent queries to the
>>>> different data services?
>>>>
>>>>
>>>
>>>
>>> Many thanks,
>>>>
>>>> Alexandra
>>>>
>>>>
>>>>
>>>
>>
>
Re: Chained Properties
Posted by Dave Reynolds <da...@gmail.com>.
On 15/04/16 12:12, Abduladem Eljamel wrote:
> Hi,
> Does the reasoning engine in Jena support the property chain rules of OWL2?
> Thanks,
> Abdul
No, there's no built-in support for OWL2.
Dave
Chained Properties
Posted by Abduladem Eljamel <a_...@yahoo.co.uk.INVALID>.
Hi,
Does the reasoning engine in Jena support the property chain rules of OWL2?
Thanks,
Abdul
Re: Reasoning On Graphs
Posted by Dave Reynolds <da...@gmail.com>.
On 14/04/16 15:17, Abduladem Eljamel wrote:
> Hi,
> I have a TDB dataset containing several named graphs and an ontology file. I would like to apply the same reasoning to all graphs in the dataset.
The Jena reasoner is not dataset-aware; it only works over models.
The reasoner also only works in memory: you can run it over a TDB-backed
model, but the results will be stored in memory and will not persist
unless you explicitly write them out to a persisted model.
> I tried this code to apply reasoning on one named graph.
>
> OntModel ontmodel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, null);
> File filePath = new File(OntologyPath);
> URL filePathURL = filePath.toURI().toURL();
> ontmodel.read(filePathURL.toString());
>
> Dataset dataset = TDBFactory.createDataset(datasetLocation);
> dataset.begin(ReadWrite.WRITE) ;
No need for this, you are not writing to the dataset here.
> Model model = dataset.getNamedModel(graphName);
>
> Reasoner reasoner = ReasonerRegistry.getOWLMicroReasoner();
> reasoner= reasoner.bindSchema(ontmodel);
> InfModel infmodel = ModelFactory.createInfModel(reasoner, model);
This will work but, as noted above, the infmodel is in memory and it has
to do quite a bit of work to load the data from TDB. It can be more
efficient to load the TDB model into an in-memory model first and
create the infmodel over that.
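Dave's suggestion can be sketched roughly as follows. This is a sketch, not Dave's code: it assumes the org.apache.jena package layout of Jena 3.x and the TDBFactory API used elsewhere in this thread, and the class and method names are made up for illustration.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.tdb.TDBFactory;

public class InMemoryInference {
    // Copy the TDB-backed named graph into a plain in-memory model first,
    // then bind the reasoner over the copy, so inference does not
    // repeatedly page data from disk.
    public static InfModel inferOver(String location, String graphName, Model schema) {
        Dataset dataset = TDBFactory.createDataset(location);
        dataset.begin(ReadWrite.READ);   // reading only, so a read transaction is enough
        Model memModel = ModelFactory.createDefaultModel();
        try {
            memModel.add(dataset.getNamedModel(graphName));
        } finally {
            dataset.end();
        }
        Reasoner reasoner = ReasonerRegistry.getOWLMicroReasoner().bindSchema(schema);
        // The inference closure is held in memory and is not persisted.
        return ModelFactory.createInfModel(reasoner, memModel);
    }
}
```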
> Should I apply the reasoning on the graphs one by one? Is there any method to apply the reasoning on all graphs and they stay separate as they are? Is there any method which can be used as for the default graphs below?
>
> Model model = dataset.getDefaultModel();
Not sure I follow the question.
If you want to be able to access each of the inference closures of each
of the source graphs separately then you will need an InfModel for each.
If you want to treat the set of graphs as one big graph, and create a
single inference closure over the union model, then do that. You can
either configure TDB to make the default graph be the union of all the
named graphs, or you can use the TDB-specific name for the union graph,
urn:x-arq:UnionGraph.
Dave
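The union-default-graph option Dave mentions can be switched on in a TDB assembler description. A minimal fragment might look like this (a sketch assuming the standard tdb: assembler vocabulary; the dataset name and location are placeholders):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .

# Make the dataset's default graph the union of all named graphs,
# so queries over the default model see every graph at once.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    tdb:unionDefaultGraph true .
```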
Reasoning On Graphs
Posted by Abduladem Eljamel <a_...@yahoo.co.uk.INVALID>.
Hi,
I have a TDB dataset containing several named graphs and an ontology file. I would like to apply the same reasoning to all graphs in the dataset.
I tried this code to apply reasoning on one named graph.
OntModel ontmodel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM, null);
File filePath = new File(OntologyPath);
URL filePathURL = filePath.toURI().toURL();
ontmodel.read(filePathURL.toString());
Dataset dataset = TDBFactory.createDataset(datasetLocation);
dataset.begin(ReadWrite.WRITE) ;
Model model = dataset.getNamedModel(graphName);
Reasoner reasoner = ReasonerRegistry.getOWLMicroReasoner();
reasoner= reasoner.bindSchema(ontmodel);
InfModel infmodel = ModelFactory.createInfModel(reasoner, model);
Should I apply the reasoning on the graphs one by one? Is there any method to apply the reasoning on all graphs and they stay separate as they are? Is there any method which can be used as for the default graphs below?
Model model = dataset.getDefaultModel();
Thanks
Abdul
Re: Fuseki server: many data services or many fuseki installations?
Posted by Andy Seaborne <an...@apache.org>.
On 14/04/16 13:43, Alexandra Kokkinaki wrote:
> Thanks Andy, so you are suggesting to break the updates (deletions and
> insertions) into smaller requests to avoid any memory issues. I suppose that
> we will make daily updates so that our triple store is up to date, which
> will probably result in some hundreds of triples.
Hundreds - no worries.
Just keep the batch size of a single request below a few million (and
tweak the heap up a bit if you are doing million+ updates at one time).
Do not increase the heap beyond that, regardless: TDB uses non-heap
space, so more heap is slower.
> I have seen the commit command but not the rollback. Is there any safety
> net if something goes wrong during the update procedure to rollback?
> Are there any nice examples about "TDB transactions" that I could start
> looking at?
In Fuseki, one HTTP request = one transaction. It aborts if the update
fails (e.g. bad request, syntax error).
What there isn't yet is begin-request-request-request-commit.
If you are dealing with hundreds of triples, the way to approach it is to
build up the set of all the changes you want, build one SPARQL update, and do
it all at once. It will all happen, or none of it will.
You cannot update a database with the TDB tools while Fuseki is using it.
Andy
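Andy's advice - batch the changes so that no single request (= one transaction) grows too large - can be sketched in plain Java. The helper below only assembles the request strings; `buildBatches` is a hypothetical helper (not a Jena API), and the idea of POSTing each string to the dataset's update endpoint is an assumption on top of the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateBatcher {

    // Build one "INSERT DATA { ... }" request per batch of triples.
    // Each input triple is expected to be an N-Triples-style line
    // without the trailing dot, e.g. "<urn:s> <urn:p> <urn:o>".
    public static List<String> buildBatches(List<String> triples, int batchSize) {
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < triples.size(); i += batchSize) {
            StringBuilder sb = new StringBuilder("INSERT DATA {\n");
            for (String t : triples.subList(i, Math.min(i + batchSize, triples.size()))) {
                sb.append("  ").append(t).append(" .\n");
            }
            sb.append("}");
            requests.add(sb.toString());
        }
        return requests;
    }
}
```

Each returned string would then be sent as its own HTTP request, so each batch is its own transaction and a failure aborts only that batch, not the whole day's update.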
Re: Fuseki server: many data services or many fuseki installations?
Posted by Alexandra Kokkinaki <al...@gmail.com>.
Thanks Andy, so you are suggesting to break the updates (deletions and
insertions) into smaller requests to avoid any memory issues. I suppose that
we will make daily updates so that our triple store is up to date, which
will probably result in some hundreds of triples.
I have seen the commit command but not the rollback. Is there any safety
net if something goes wrong during the update procedure to rollback?
Are there any nice examples about "TDB transactions" that I could start
looking at?
Many thanks
Alexandra
On Thu, Apr 14, 2016 at 12:32 PM, Andy Seaborne <an...@apache.org> wrote:
> On 12/04/16 14:39, Alexandra Kokkinaki wrote:
>
>> Hi Andy, thanks for your answers. So would it be feasible to add/delete
>> triples in an existing database?
>>
>
> Updates are supported.
>
> However, changing large amounts (deleting or adding or a mix) - tens of
> millions of triples - in a single transaction (single HTTP request) will
> consume too much memory. Such a large change would need to be broken up
> into multiple requests.
>
> Andy
Re: Fuseki server: many data services or many fuseki installations?
Posted by Andy Seaborne <an...@apache.org>.
On 12/04/16 14:39, Alexandra Kokkinaki wrote:
> Hi Andy, thanks for your answers. So would it be feasible to add/delete
> triples in an existing database?
Updates are supported.
However, changing large amounts (deleting or adding or a mix) - tens of
millions of triples - in a single transaction (single HTTP request) will
consume too much memory. Such a large change would need to be broken up
into multiple requests.
Andy