Posted to users@jena.apache.org by Евгений <hr...@gmail.com> on 2015/08/06 17:05:13 UTC

How to read and write triples to `Graph` at the same time?

Hello again!
I need to do the following: read all "name" fields from MY_TDB, do some
calculations with each "name" and save the results back to MY_TDB.
But I need to read and save on the fly, e.g. read one "name" -> do calculations
-> save results -> read next "name" -> do calculations -> save results -> ...

Here is the example code:
public class TestTdbTransactionsReadWrite {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset();
        dataset.begin(ReadWrite.WRITE);
        DatasetGraph datasetGraph = dataset.asDatasetGraph();
        Graph graph = datasetGraph.getDefaultGraph();

        // Fill graph.
        graph.add(
            new Triple(
                NodeFactory.createURI("http://example/unit13"),
                NodeFactory.createURI("http://example/name"),
                NodeFactory.createLiteral("Unit 13", "en")
            )
        );

        graph.add(
            new Triple(
                NodeFactory.createURI("http://example/unit13"),
                NodeFactory.createURI("http://example/type"),
                NodeFactory.createURI("http://example/robot")
            )
        );

        graph.add(
            new Triple(
                NodeFactory.createURI("http://example/unit13"),
                NodeFactory.createURI("http://example/creationYear"),
                NodeFactory.createURI("http://example/2015")
            )
        );

        // Test.
        String qs1 = "SELECT * { ?s <http://example/name> ?o }";
        try (QueryExecution qExec = QueryExecutionFactory.create(qs1, dataset)) {
            ResultSet rs = qExec.execSelect();

            while (rs.hasNext()) {
                QuerySolution qs = rs.next();
                RDFNode nodeSubject = qs.get("s");
                RDFNode nodeObject = qs.get("o");
                if (nodeSubject.isURIResource() && nodeObject.isLiteral()) {
                    String str = nodeObject.asLiteral().getString();

                    graph.add(
                        new Triple(
                            NodeFactory.createURI(nodeSubject.asResource().getURI()),
                            NodeFactory.createURI("http://example/value"),
                            NodeFactory.createLiteral(String.valueOf(str.length()))
                        )
                    );
                }
            }

            dataset.commit();
        } finally {
            dataset.end();
        }

        RDFDataMgr.write(System.out, graph, RDFFormat.NTRIPLES);
    }
}

https://github.com/Hronom/test-jena

Re: How to read and write triples to `Graph` at the same time?

Posted by Andy Seaborne <an...@apache.org>.
On 06/08/15 16:05, Евгений =) wrote:
> ResultSet rs = qExec.execSelect();

You can't iterate and update at the same time.

So read the results once and then loop on the copy:

     ResultSet rs = ResultSetFactory.copyResults(qExec.execSelect());

	Andy
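
Applied to the example code above, that change would look roughly like this (a
minimal sketch, not tested; everything else in the loop stays as before):

        try (QueryExecution qExec = QueryExecutionFactory.create(qs1, dataset)) {
            // Copy the whole result set into memory first (ResultSetFactory is
            // in com.hp.hpl.jena.query). The copy is detached from the live TDB
            // iterators, so graph.add(...) inside the loop no longer clashes
            // with an open read iterator.
            ResultSet rs = ResultSetFactory.copyResults(qExec.execSelect());

            while (rs.hasNext()) {
                QuerySolution qs = rs.next();
                // ... same processing and graph.add(...) as in the original code ...
            }

            dataset.commit();
        } finally {
            dataset.end();
        }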

Re: How to read and write triples to `Graph` at the same time?

Posted by Евгений <hr...@gmail.com>.
Thanks for your answer!

2015-08-07 17:14 GMT+03:00 Andy Seaborne <an...@apache.org>:
>
> Your example is small. So I assume you are not actually describing your real
> use case.  Your example is already keeping everything in memory. TDB
> in-memory, which is for testing mainly, uses a RAM disk for exact semantics
> with the disk version, and has multiple copies of data.

Yes, the example that I show uses small data, but in the real case I use data
from DBpedia, which is pretty big.
>
> There are several choices:
>
> 1/ Increase the heap size.
>
> 2/ (if we're really talking about a large disk database).  Do a scan through
> the results of graph.find and keep the subjects in a separate data structure.
> Do a second loop over the retained subjects to do the updates.  This works
> well when there is a significantly smaller number of places to update than
> the whole find() locates.
>
> 3/ If the updates are going to be huge, and the database is really a
> larger-than-RAM persistent one (total number of updates is comparable to or
> larger than RAM), then you are asking for an operation that is fundamentally
> expensive.  Write the updates to a file during the loop; then add the file
> into the database.
>
> 4/ If it's a one-off maintenance task, dump to N-Triples, use perl/ruby/...
> to fix the dump, and reload.
>
> See also large transaction support.
> http://mail-archives.apache.org/mod_mbox/jena-users/201507.mbox/%3CCAPTxtVOZRzyPxN1njh3WVggsJEUNxeXDJhNvx%2BG4WcRtExxPxg%40mail.gmail.com%3E
>
>         Andy
>

So if I understood correctly, in my case I need to organize a temporary
buffer, read all "abstracts" from DBpedia into this buffer, then do the
calculations on the buffered data and save the results to TDB.

OK, I thought about something like that, but I haven't worked much with Jena,
so I thought maybe Jena already has some mechanism for this case.
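
For what it's worth, a minimal sketch of that buffering idea, reusing the toy
data from the example above (untested; against DBpedia the predicate would be
its abstract property rather than http://example/name):

        // Pass 1: read everything needed into plain Java collections inside a
        // READ transaction, so no TDB iterator stays open while writing later.
        List<Node> subjects = new ArrayList<>();
        List<String> names = new ArrayList<>();

        dataset.begin(ReadWrite.READ);
        try (QueryExecution qExec = QueryExecutionFactory.create(
                "SELECT * { ?s <http://example/name> ?o }", dataset)) {
            ResultSet rs = qExec.execSelect();
            while (rs.hasNext()) {
                QuerySolution qs = rs.next();
                if (qs.get("s").isURIResource() && qs.get("o").isLiteral()) {
                    subjects.add(qs.get("s").asNode());
                    names.add(qs.get("o").asLiteral().getString());
                }
            }
        } finally {
            dataset.end();
        }

        // Pass 2: do the calculations on the buffered data and write the
        // results back in a separate WRITE transaction.
        dataset.begin(ReadWrite.WRITE);
        try {
            Graph graph = dataset.asDatasetGraph().getDefaultGraph();
            for (int i = 0; i < subjects.size(); i++) {
                graph.add(new Triple(
                    subjects.get(i),
                    NodeFactory.createURI("http://example/value"),
                    NodeFactory.createLiteral(String.valueOf(names.get(i).length()))));
            }
            dataset.commit();
        } finally {
            dataset.end();
        }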

Re: How to read and write triples to `Graph` at the same time?

Posted by Andy Seaborne <an...@apache.org>.
On 06/08/15 21:15, Евгений =) wrote:
> And I get this error, and I already use the method that Andy Seaborne
> describes here: https://issues.apache.org/jira/browse/JENA-1006
> Iterator<Triple> iter = graph.find(...).toList().iterator();
>
> But it looks like this method is "hungry for memory" and I'm looking for a
> more efficient (low memory consumption) way to do this.

Your example is small. So I assume you are not actually describing your 
real use case.  Your example is already keeping everything in memory. 
TDB in-memory, which is for testing mainly, uses a RAM disk for exact 
semantics with the disk version, and has multiple copies of data.
>
> Maybe these similar questions can explain it better than me (sorry for links
> to other sources):
> http://stackoverflow.com/questions/10165096/jena-tdb-nested-transactions
> http://answers.semanticweb.com/questions/15852/jena-tdb-nested-transactions
>
> And sorry for my bad English =(

There are several choices:

1/ Increase the heap size.

2/ (if we're really talking about a large disk database).  Do a scan
through the results of graph.find and keep the subjects in a separate
data structure. Do a second loop over the retained subjects to do the
updates.  This works well when there is a significantly smaller number of
places to update than the whole find() locates.

3/ If the updates are going to be huge, and the database is really a
larger-than-RAM persistent one (total number of updates is comparable to
or larger than RAM), then you are asking for an operation that is
fundamentally expensive.  Write the updates to a file during the loop;
then add the file into the database.

4/ If it's a one-off maintenance task, dump to N-Triples, use
perl/ruby/... to fix the dump, and reload.

See also large transaction support.
http://mail-archives.apache.org/mod_mbox/jena-users/201507.mbox/%3CCAPTxtVOZRzyPxN1njh3WVggsJEUNxeXDJhNvx%2BG4WcRtExxPxg%40mail.gmail.com%3E

	Andy
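
A rough sketch of option 3, streaming the computed triples to an N-Triples file
during the read pass and loading that file afterwards (untested; the file name
and the http://example/* URIs are only illustrative, and the enclosing method
would need to handle IOException):

        // Pass 1: READ transaction; stream each computed triple to a file
        // instead of holding it in memory or writing it into the database.
        dataset.begin(ReadWrite.READ);
        try (OutputStream out = new FileOutputStream("updates.nt");
             QueryExecution qExec = QueryExecutionFactory.create(
                 "SELECT * { ?s <http://example/name> ?o }", dataset)) {
            StreamRDF sink = StreamRDFWriter.getWriterStream(out, Lang.NTRIPLES);
            sink.start();
            ResultSet rs = qExec.execSelect();
            while (rs.hasNext()) {
                QuerySolution qs = rs.next();
                if (!qs.get("o").isLiteral()) {
                    continue;
                }
                String str = qs.getLiteral("o").getString();
                sink.triple(new Triple(
                    qs.getResource("s").asNode(),
                    NodeFactory.createURI("http://example/value"),
                    NodeFactory.createLiteral(String.valueOf(str.length()))));
            }
            sink.finish();
        } finally {
            dataset.end();
        }

        // Pass 2: WRITE transaction; add the file into the database.
        dataset.begin(ReadWrite.WRITE);
        try {
            RDFDataMgr.read(dataset.getDefaultModel(), "updates.nt");
            dataset.commit();
        } finally {
            dataset.end();
        }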


Re: How to read and write triples to `Graph` at the same time?

Posted by Евгений <hr...@gmail.com>.
Sorry for the incomplete description!

Thanks Andy and Rob for the quick answers!

From the code that I showed, I expect the following:
 - Read all "name" fields from MY_TDB, do some calculations with each
"name" and save the results to MY_TDB.
   The full work sequence is: read a "name" field -> do calculations ->
save results -> read the next "name" -> do calculations -> save results -> ...

But running this code gives me the following exception:
Exception in thread "main" java.util.ConcurrentModificationException:
Iterator: started at 8, now 9
at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW.policyError(DatasetControlMRSW.java:157)
at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW.access$000(DatasetControlMRSW.java:32)
at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.checkCourrentModification(DatasetControlMRSW.java:110)
at com.hp.hpl.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:118)
at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:318)
at org.apache.jena.atlas.iterator.Iter$3.hasNext(Iter.java:201)
at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:942)
at org.apache.jena.atlas.iterator.RepeatApplyIterator.hasNext(RepeatApplyIterator.java:59)
at com.hp.hpl.jena.tdb.solver.SolverLib$IterAbortable.hasNext(SolverLib.java:191)
at org.apache.jena.atlas.iterator.Iter$4.hasNext(Iter.java:318)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:54)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
at com.hp.hpl.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:59)
at com.github.hronom.test.jena.TestTdbTransactionsReadWrite.main(TestTdbTransactionsReadWrite.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

And I get this error, and I already use the method that Andy Seaborne
describes here: https://issues.apache.org/jira/browse/JENA-1006
Iterator<Triple> iter = graph.find(...).toList().iterator();

But it looks like this method is "hungry for memory" and I'm looking for a
more efficient (low memory consumption) way to do this.
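
For context, applied to the small example from earlier in the thread that
workaround looks roughly like this (just an illustration; the find() arguments
here are for the example's "name" predicate):

        // Materialize the matching triples into a list first, then iterate over
        // the in-memory copy, so the writes below do not clash with a live TDB
        // iterator. The whole list is held in memory, hence the memory cost.
        Iterator<Triple> iter = graph.find(
            Node.ANY, NodeFactory.createURI("http://example/name"), Node.ANY)
            .toList().iterator();
        while (iter.hasNext()) {
            Triple t = iter.next();
            if (t.getObject().isLiteral()) {
                String str = t.getObject().getLiteralLexicalForm();
                graph.add(new Triple(
                    t.getSubject(),
                    NodeFactory.createURI("http://example/value"),
                    NodeFactory.createLiteral(String.valueOf(str.length()))));
            }
        }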

Maybe these similar questions can explain it better than me (sorry for links
to other sources):
http://stackoverflow.com/questions/10165096/jena-tdb-nested-transactions
http://answers.semanticweb.com/questions/15852/jena-tdb-nested-transactions

And sorry for my bad English =(

Re: How to read and write triples to `Graph` at the same time?

Posted by Rob Vesse <rv...@dotnetrdf.org>.
This list is not a code writing service nor are we mind readers

From the unclear description given, your code appears to do
what you describe

If it doesn't, then you need to explain what behaviour you see (error,
incorrect results, etc.) and what behaviour you expected to see

Rob

On 06/08/2015 16:05, "Евгений =)" <hr...@gmail.com> wrote:

>Hello again!
>I need to do the following: read all "name" fields from MY_TDB, do some
>calculations with each "name" and save the results back to MY_TDB.
>But I need to read and save on the fly, e.g. read one "name" -> do
>calculations -> save results -> read next "name" -> do calculations ->
>save results -> ...
>
>[original example code snipped - see the first message in this thread]