Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2021/08/11 14:38:53 UTC

Re: difference between 3.13 and 3.17

Hi there,

There isn't enough information to see what's happening.

The first thing to do is dump, or Fuseki backup, the database from each 
setup and see if they are the same.

Then if they are, send a minimal reproducible example [1].
Something someone else can run.
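
A minimal sketch of the "see if they are the same" step, assuming each dump is already available as uncompressed N-Quads text and contains no blank nodes (blank node labels can differ between dumps, so a line-based comparison would report false differences). The function names are illustrative and not part of Jena.

```python
from typing import List, Set, Tuple


def load_statements(nquads_text: str) -> Set[str]:
    """Return the non-empty, non-comment lines of an N-Quads dump as a set."""
    statements = set()
    for line in nquads_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            statements.add(line)
    return statements


def diff_dumps(dump_a: str, dump_b: str) -> Tuple[List[str], List[str]]:
    """Return (statements only in dump_a, statements only in dump_b), sorted."""
    a = load_statements(dump_a)
    b = load_statements(dump_b)
    return sorted(a - b), sorted(b - a)
```

Two empty lists mean the dumps hold the same statements; anything else is a starting point for a smaller reproducible example.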

     Andy

[1]
https://stackoverflow.com/help/minimal-reproducible-example


On 11/08/2021 13:35, jaanam@kolumbus.fi wrote:
> Hello,
> 
> My jena-fuseki database consists of several named graphs. In order to 
> provide users with a GraphQL-like interface to jena-fuseki I have to combine 
> my NGs into one big default graph for HyperGraphql 
> (https://www.hypergraphql.org/) that provides the interface.
> 
> At some point the users started to get less data than before and when I 
> investigated the issue I noticed that this was after upgrading 
> jena-fuseki from 3.13 to 3.17 !
> 
> To combine the NGs I'm using the following command:
> 
>    curl -i -H "Content-Type: application/sparql-update"  -X POST 
> http://<myTargetHost>:8061/<myDs>/update --data-binary 
> "@./combine-NGs.sparql"
> 
> (see NGs_to_be_combined.txt for hint of my source database and 
> combine-NGs.sparql as the executed script).
> 
> So I run the combine-NGs.sparql-script in the target jena host and the 
> target dataset.
> 
> If I use an in-memory dataset as my target dataset I get only half of the 
> triples compared to the number of triples with a persistent dataset. 
> This happens only with jena-fuseki 3.17.
> 
> In 3.13 I haven't seen this issue!
> 
> Br, Jaana
> 
> 
> 

Re: difference between 3.13 and 3.17

Posted by ja...@kolumbus.fi.
Andy Seaborne wrote on 11.8.2021 17:38:
> Hi there,
> 
> There isn't enough information to see what's happening.


Hello,

I don't know if this message was already received by the recipients, as I 
tried to send a 2 MB file as an attachment. Sorry for the inconvenience if 
so!

Anyway, the file is now stored on GitHub, so please see the steps to 
reproduce the issue below.



1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) upload the file ds.trig from https://github.com/jamietti/jena into the 
source apache-jena-fuseki-server on port 3030 using the command

       curl -XPOST --header 'Content-Type: application/trig' \
            --data-binary @ds.trig \
            http://<source apache-jena-fuseki-server>:3030/<source dataset>

4) create one in-memory dataset and one persistent dataset on the target 
apache-jena-fuseki-3.17.0-server

5) set the source apache-jena-fuseki-server and source dataset in the 
combine_NGs.sparql file

6) run the combine_NGs.sparql script against the in-memory dataset and the 
persistent dataset of the target apache-jena-fuseki-3.17.0-server

7) run query

     SELECT ?subject ?predicate ?object
       WHERE {
        ?subject ?predicate ?object
       }

against the in-memory dataset and the persistent dataset of the target 
apache-jena-fuseki-3.17.0-server and compare the results.

See the attachments in_memory.png and persistent.png for my results after 
the above procedure.
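
The comparison in step 7 can also be made on a triple count instead of screenshots. The following is a sketch only: it sends a count query over the standard SPARQL protocol and reads the SPARQL JSON results. The endpoint URL is supplied by the caller; the variable name ?C and the function names are choices made for this example.

```python
import json
import urllib.parse
import urllib.request

COUNT_QUERY = "SELECT (COUNT(*) AS ?C) { ?s ?p ?o }"


def parse_count(results_json: str) -> int:
    """Extract the value of ?C from a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return int(doc["results"]["bindings"][0]["C"]["value"])


def fetch_count(query_endpoint: str) -> int:
    """POST the count query to a SPARQL query endpoint and return the count."""
    body = urllib.parse.urlencode({"query": COUNT_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        query_endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling fetch_count once per dataset, for example with a URL of the form http://localhost:3031/<dataset>/query (an assumed service name), gives two numbers that can be compared directly.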

Jaana



Re: difference between 3.13 and 3.17

Posted by Andy Seaborne <an...@apache.org>.

On 19/08/2021 03:19, jaanam@kolumbus.fi wrote:
>> I'll look some more _sometime_ but to be fair to everyone, it has to
>> fit around other reports.
> As expected in previous e-mails it seems that the problem is in the 
> script: just running this update
> 
>     # Alter: Variable as subject
>     delete{
>       graph ?g {
>          ?s_before ?p ?o.
>       }
>     }
>    insert{
>      graph ?g {
>          ?s_after ?p ?o.
>      }
>    }
>    where{

Remove this - (it does not do anything to the overall update).

>      {
>        graph ?g { }.
>      }
>      graph ?g {
>        ?s_before a <http://www.example.org/rdf/ontologies/gsimsf/Variable>.
>        ?s_before ?p ?o.
>        bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/",
>            strafter(str(?g), "http://www.example.org/tilasto/"), "/",
>            strafter(str(?s_before),
>                "http://www.example.org/rdf/data/gsimsf/Variable/"))) as ?s_after)
>      }
>    };

Reorg this:

       graph ?g {
         ?s_before a ...
         ?s_before ?p ?o.
       }
       bind(.... as ?s_after)

When JENA-2150 gets fixed, the datasets will behave the same - and 
that will be the in-memory behaviour (zero results).

https://issues.apache.org/jira/browse/JENA-2150
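
To make the reorganisation concrete, here is a sketch that mimics the BIND expression in Python and assembles the update as suggested in this thread. REWRITTEN_UPDATE is put together from the advice above and has not been run against Jena; the function names are illustrative only. None stands for an unbound variable: per the explanation in this thread, ?g is undefined where the BIND sits inside GRAPH ?g, so the expression raises an error and ?s_after is left unbound.

```python
from typing import Optional

VARIABLE_NS = "http://www.example.org/rdf/data/gsimsf/Variable/"
GRAPH_NS = "http://www.example.org/tilasto/"


def strafter(text: str, marker: str) -> str:
    """Mimic SPARQL STRAFTER: the part after the first match, or "" if none."""
    index = text.find(marker)
    return "" if index < 0 else text[index + len(marker):]


def s_after(g: Optional[str], s_before: str) -> Optional[str]:
    """Evaluate the BIND expression; None models an unbound ?s_after."""
    if g is None:
        # str(?g) on an unbound variable is an error, so BIND binds nothing.
        return None
    return VARIABLE_NS + strafter(g, GRAPH_NS) + "/" + strafter(s_before, VARIABLE_NS)


# The update with the empty "graph ?g { }" removed and the BIND moved
# to after the GRAPH block, where ?g is bound (assembled, not tested on Jena).
REWRITTEN_UPDATE = """
DELETE { GRAPH ?g { ?s_before ?p ?o . } }
INSERT { GRAPH ?g { ?s_after ?p ?o . } }
WHERE {
  GRAPH ?g {
    ?s_before a <http://www.example.org/rdf/ontologies/gsimsf/Variable> .
    ?s_before ?p ?o .
  }
  BIND(IRI(CONCAT("http://www.example.org/rdf/data/gsimsf/Variable/",
                  STRAFTER(STR(?g), "http://www.example.org/tilasto/"), "/",
                  STRAFTER(STR(?s_before),
                           "http://www.example.org/rdf/data/gsimsf/Variable/")))
       AS ?s_after)
}
"""
```

With a bound graph name the function returns the renamed subject; with None it returns None, which corresponds to the case where nothing can be inserted for that solution.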

> in the source server leads to different results in the in-memory and 
> persistent datasets. I just have to understand how to change this part of 
> the script according to the hints in the previous discussions!
> 
> Thanks a lot for your patience with this issue!
> 
> Jaana
> 
> Andy Seaborne wrote on 17.8.2021 22:12:
>> On 17/08/2021 06:54, jaanam@kolumbus.fi wrote:
>>> Hello, you were right, there were still unnecessary graphs in my 
>>> source file. They have been removed in source2.zip.
>>>
>>> The difference between datasets 'target_in_memory'(=NG combination 
>>> was done in in_memory dataset) and 'target_persistent' (=NG 
>>> combination was done in persistent dataset) is that 
>>> 'target_in_memory' dataset doesn't have predicate 
>>> <http://www.example.org/rdf/data/pxt/isPresentationOfVariable> at all.
>>>
>>> Unfortunately my SPARQL knowledge is not good (as you must have 
>>> noticed), but I'm still responsible for this NG combination stuff and 
>>> the combineNGs.sparql script which was coded by a guy who has left 
>>> the office.
>>
>> This is going to be difficult.
>>
>> Aside from the difference between datasets, the WHERE clauses that use
>> ?g inside "GRAPH ?g" are wrong - older versions of Jena executed them
>> incorrectly and that is now fixed. The inner ?g is undefined in the BIND
>> and that leads to no results.
>>
>> So - regardless of anything else - that's going to need fixing in your
>> script and that's going to need verification of the expected answers.
>>
>> ---
>>
>> The SPARQL script is a series of SPARQL Update statements separated by
>> semicolons - bisect to find the first update statement that
>> shows a difference.  Chop the last half of them off, see if the
>> modified script produces differences. If yes, repeat. If no, chop the
>> last quarter off.
>>
>> Also replacing the first 20 lines with the rewrite I posted makes it
>> easier - no need for servers, execute with the command line "update"
>> or "tdbupdate".
>>
>> Similarly bisecting the data to find a smaller data sample.
>>
>> Bisecting is easier when the data and intent are understood and I don't
>> know your application.
>>
>>> Attached also a word document changes.zip in which the changes (data 
>>> missing from 'target_in_memory' dataset) are marked with yellow.
>>
>> I'll look some more _sometime_ but to be fair to everyone, it has to
>> fit around other reports.
>>
>>     Andy
>>
>>>
>>> Br Jaana
>>>
>>>
>>> Andy Seaborne wrote on 16.8.2021 19:35:
>>>> It's a bit smaller but I notice it still has the  graph ?g { } and the
>>>> data still has a default graph.
>>>>
>>>> Does it really need all that data? I'd be surprised if it takes more
>>>> than one subject and its triples to show a difference.
>>>>
>>>> There are multiple update steps - which is the first that makes a 
>>>> difference?
>>>>
>>>> And which outcome is right and which is wrong?
>>>>
>>>> Not knowing the right answer makes it much harder and much more time
>>>> consuming to work out what is going on.
>>>>
>>>> The data load is the same as a PUT of the data without the default
>>>> graph - so the first 20 lines can be done with a "curl -XPUT".
>>>>
>>>>     Andy
>>>>
>>>> On 16/08/2021 11:09, jaanam@kolumbus.fi wrote:
>>>>> Hello,
>>>>>
>>>>> Sorry for providing too much data for reproducing the 
>>>>> problem.
>>>>>
>>>>> Here's a much smaller data source and a slightly smaller script 
>>>>> for combining the NGs.
>>>>>
>>>>> And the steps to reproduce:
>>>>>
>>>>> 1) start source apache-jena-fuseki-server on port 3030
>>>>>
>>>>> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
>>>>>
>>>>> 3) unzip and upload the attachment source.zip into source 
>>>>> apache-jena-fuseki-server on port 3030
>>>>>
>>>>> curl -XPOST --header 'Content-Type: application/trig' --data-binary 
>>>>> @source.trig http://localhost:3030/source
>>>>>
>>>>> 4) create one in-memory dataset (e.g. target_in_memory) and one 
>>>>> persistent (e.g. target_persistent) dataset on target 
>>>>> apache-jena-fuseki-3.17.0-server running on port 3031
>>>>>
>>>>> 5) set the source apache-jena-fuseki-server and source dataset 
>>>>> in the combine_NGs2.sparql script if needed
>>>>>
>>>>> 6) run combine_NGs2.sparql-script in in-memory dataset and 
>>>>> persistent dataset of the target apache-jena-fuseki-3.17.0-server 
>>>>> for instance in jena-fuseki GUI query tab
>>>>>
>>>>> or using curl:
>>>>>
>>>>> curl -i -H "Content-Type: application/sparql-update"  -X POST 
>>>>> http://localhost:3031/target_in_memory/update --data-binary 
>>>>> "@./combine_NGs2.sparql"
>>>>>
>>>>> curl -i -H "Content-Type: application/sparql-update"  -X POST 
>>>>> http://localhost:3031/target_persistent/update --data-binary 
>>>>> "@./combine_NGs2.sparql"
>>>>>
>>>>> Compare the results. Even from the jena-fuseki GUI edit page, when 
>>>>> opening the resulting default graphs in the editor tab window, it can be 
>>>>> seen that the in-memory dataset has less data than the persistent 
>>>>> one.
>>>>>
>>>>> As I've said before, this issue didn't occur with 
>>>>> apache-jena-fuseki-3.13.
>>>>>
>>>>> And about your questions:
>>>>>
>>>>>> Where is the 3.13.0 server?
>>>>>
>>>>> To see that this doesn't happen with 3.13, just replace the 
>>>>> 3.17 server with 3.13 in my steps above. In my steps - I guess - the 
>>>>> source server can be 3.13 or 3.17.
>>>>>
>>>>>
>>>>>> Does it need the data pulled from another server rather than executing on
>>>>>> already loaded data?
>>>>>
>>>>> We didn't manage to combine the NGs within one server - we expected 
>>>>> that jena would try to use proxy or something like that...
>>>>>
>>>>> Br, Jaana
>>>>>
>>>>> Andy Seaborne wrote on 13.8.2021 23:02:
>>>>>> On 13/08/2021 12:03, jaanam@kolumbus.fi wrote:
>>>>>>> Andy Seaborne wrote on 11.8.2021 17:38:
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> There isn't enough information to see what's happening.
>>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> see steps to repeat the issue below.
>>>>>>
>>>>>> I've got all the parts of the example - it's not minimal though.
>>>>>>
>>>>>> What is a short amount of data, shorter update script that shows 
>>>>>> the problem?
>>>>>>
>>>>>> Does it need the data pulled from another server rather than executing on
>>>>>> already loaded data?
>>>>>>
>>>>>> There is a data file of 363,559 quads (which has warnings), a SPARQL
>>>>>> update script of 241 lines.
>>>>>>
>>>>>> To work out what is going on, someone has to reduce that large setup
>>>>>> to the part that causes the difference.
>>>>>>
>>>>>> The first thing that script does is delete all the local data and 
>>>>>> pull
>>>>>> some, not all, data from the source server. Is that step necessary?
>>>>>>
>>>>>> I don't believe it needs all the data and all the script to show a
>>>>>> difference nor that it needs to pull the data out of one server, and
>>>>>> put it in the local store in order to be different, why not just load
>>>>>> something directly?
>>>>>>
>>>>>>
>>>>>> The rest of the update does some kind of manipulation of the data - I
>>>>>> don't understand what it is trying to do - its purpose relates to the
>>>>>> data model.
>>>>>>
>>>>>> You are in a much better place to reduce that large script to a
>>>>>> minimal one that shows a difference because it's your application.
>>>>>>
>>>>>> Does it need all those steps together to show the difference or just
>>>>>> one of them?  (BTW each update step is done independently: there'll be
>>>>>> a point where the answers start diverging.)
>>>>>>
>>>>>> Looking at it though, the use of
>>>>>>
>>>>>> where{
>>>>>>   {
>>>>>>     graph ?g { }.
>>>>>>   }
>>>>>>   graph ?g {
>>>>>>     .. some pattern ..
>>>>>>     .. some BIND involving ?g ..
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> is pretty suspect.
>>>>>> Omit the first part and put the BIND after the second:
>>>>>>
>>>>>> Move the BIND to after the
>>>>>>   graph ?g {
>>>>>>     .. some pattern ..
>>>>>>  }
>>>>>>  .. some BIND involving ?g ..
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Where is the 3.13.0 server?
>>>>>>
>>>>>>> 1) start source apache-jena-fuseki-server on port 3030
>>>>>>>
>>>>>>> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
>>>>>>>
>>>>>>> 3) unzip and upload the attachment ds.zip into source 
>>>>>>> apache-jena-fuseki-server on port 3030 using command
>>>>>>>
>>>>>>>       curl -XPOST --header 'Content-Type: application/trig' 
>>>>>>> --data-binary @ds.trig http://<source 
>>>>>>> apache-jena-fuseki-server>:3030/<source dataset>
>>>>>>>
>>>>>>> 4) create one in-memory dataset and one persistent dataset on 
>>>>>>> target apache-jena-fuseki-3.17.0-server
>>>>>>>
>>>>>>> 5) update the source apache-jena-fuseki-server and source dataset 
>>>>>>> in combine_NGs.sparql-file
>>>>>>>
>>>>>>> 6) run combine_NGs.sparql-script in in-memory dataset and 
>>>>>>> persistent dataset of the target apache-jena-fuseki-3.17.0-server
>>>>>>
>>>>>> Run how?
>>>>>>
>>>>>>>
>>>>>>> 7) run query
>>>>>>>
>>>>>>>     SELECT ?subject ?predicate ?object
>>>>>>>       WHERE {
>>>>>>>        ?subject ?predicate ?object
>>>>>>>       }
>>>>>>>
>>>>>>> in in-memory dataset and persistent dataset of the target 
>>>>>>> apache-jena-fuseki-3.17.0-server and compare the results.
>>>>>>>
>>>>>>> See the attachments in_memory.png and persistent.png for my results 
>>>>>>> after the above procedure.
>>>>>>
>>>>>> That's screenshots of a count: just do
>>>>>>
>>>>>> SELECT (Count(*) AS ?C) { ?s ?p ?o }
>>>>>>

Re: difference between 3.13 and 3.17

Posted by ja...@kolumbus.fi.
> I'll look some more _sometime_ but to be fair to everyone, it has to
> fit around other reports.
As expected in previous e-mails it seems that the problem is in the 
script: just running this update

    # Alter: Variable as subject
    delete{
      graph ?g {
         ?s_before ?p ?o.
      }
    }
   insert{
     graph ?g {
         ?s_after ?p ?o.
     }
   }
   where{
     {
       graph ?g { }.
     }
     graph ?g {
       ?s_before a <http://www.example.org/rdf/ontologies/gsimsf/Variable>.
       ?s_before ?p ?o.
       bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/",
           strafter(str(?g), "http://www.example.org/tilasto/"), "/",
           strafter(str(?s_before),
               "http://www.example.org/rdf/data/gsimsf/Variable/"))) as ?s_after)
     }
   };

in the source server leads to different results in the in-memory and 
persistent datasets. I just have to understand how to change this part of 
the script according to the hints in the previous discussions!

Thanks a lot for your patience with this issue!

Jaana


Re: difference between 3.13 and 3.17

Posted by Andy Seaborne <an...@apache.org>.

On 17/08/2021 06:54, jaanam@kolumbus.fi wrote:
> Hello, you were right, there were still unnecessary graphs in my source 
> file. They have been removed in source2.zip.
> 
> The difference between datasets 'target_in_memory'(=NG combination was 
> done in in_memory dataset) and 'target_persistent' (=NG combination was 
> done in persistent dataset) is that 'target_in_memory' dataset doesn't 
> have predicate 
> <http://www.example.org/rdf/data/pxt/isPresentationOfVariable> at all.
> 
> Unfortunately my SPARQL knowledge is not good (as you must have 
> noticed), but I'm still responsible for this NG combination stuff and the 
> combineNGs.sparql script which was coded by a guy who has left the office.

This is going to be difficult.

Aside from the difference between datasets, the WHERE clauses that use 
?g inside "GRAPH ?g" are wrong - older versions of Jena executed them 
incorrectly and that is now fixed. The inner ?g is undefined in the BIND 
and that leads to no results.

So - regardless of anything else - that's going to need fixing in your 
script and that's going to need verification of the expected answers.

---

The SPARQL script is a series of SPARQL Update statements separated by 
semicolons - bisect to find the first update statement that shows a 
difference.  Chop the last half of them off, see if the modified script 
produces differences. If yes, repeat. If no, chop the last quarter off.
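
The bisection described above can be sketched as a binary search over prefixes of the script. This is an illustration under stated assumptions, not a tool that exists in Jena: the caller supplies the list of update statements (splitting the script is left out because ";" also appears inside SPARQL patterns) and a hypothetical differs() callback that runs a prefix on both datasets from a clean state and reports whether the results differ. It also assumes that once the results diverge, they stay diverged for every longer prefix.

```python
from typing import Callable, List, Optional


def first_diverging_statement(
    statements: List[str],
    differs: Callable[[List[str]], bool],
) -> Optional[int]:
    """Return the index of the first statement whose execution makes the
    two datasets differ, or None if the full script shows no difference."""
    if not statements or not differs(statements):
        return None
    # The smallest diverging prefix length lies in [low, high].
    low, high = 1, len(statements)
    while low < high:
        mid = (low + high) // 2
        if differs(statements[:mid]):
            high = mid          # difference already present: look earlier
        else:
            low = mid + 1       # no difference yet: look later
    return low - 1
```

Each call to differs() is one "chop and rerun" round, so a script with a few dozen statements needs only a handful of runs.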

Also replacing the first 20 lines with the rewrite I posted makes it 
easier - no need for servers, execute with the command line "update" or 
"tdbupdate".

Similarly bisecting the data to find a smaller data sample.

Bisecting is easier when the data and intent are understood and I don't 
know your application.

> Attached also a word document changes.zip in which the changes (data 
> missing from 'target_in_memory' dataset) are marked with yellow.

I'll look some more _sometime_ but to be fair to everyone, it has to fit 
around other reports.

     Andy

> 
> Br Jaana
> 
> 
> Andy Seaborne kirjoitti 16.8.2021 19:35:
>> It's a bit smaller but I notice it still has the  graph ?g { } and the
>> data still has a default graph.
>>
>> Does it really need all that data? I'd be surprised if it takes more
>> than one subject and it's triples to show  a difference.
>>
>> There are multiple update steps - which is the first that makes a 
>> difference?
>>
>> And which outcome is right and which is wrong?
>>
>> Not knowing the right answer makes it much harder and much more time
>> consuming to work out what is going on.
>>
>> The data load is the same as a PUT of the data without the default
>> graph - so the first 20 lines can be done with a "curl -XPUT".
>>
>>     Andy
>>
>> On 16/08/2021 11:09, jaanam@kolumbus.fi wrote:
>>> Hello,
>>>
>>> Sorry for providing too much data for reproducing the problem.
>>>
>>> Here's a much smaller data set for the source and a slightly smaller 
>>> script for combining the NGs.
>>>
>>> And the steps to reproduce:
>>>
>>> 1) start source apache-jena-fuseki-server on port 3030
>>>
>>> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
>>>
>>> 3) unzip and upload the attachment source.zip into source 
>>> apache-jena-fuseki-server on port 3030
>>>
>>> curl -XPOST --header 'Content-Type: application/trig' --data-binary 
>>> @source.trig http://localhost:3030/source
>>>
>>> 4) create one in-memory dataset (e.g. target_in_memory) and one 
>>> persistent (e.g. target_persistent) dataset on target 
>>> apache-jena-fuseki-3.17.0-server running on port 3031
>>>
>>> 5) update the source apache-jena-fuseki-server and source dataset in 
>>> combine_NGs2.sparql-script if needed
>>>
>>> 6) run combine_NGs2.sparql-script in in-memory dataset and persistent 
>>> dataset of the target apache-jena-fuseki-3.17.0-server for instance 
>>> in jena-fuseki GUI query tab
>>>
>>> or using curl:
>>>
>>> curl -i -H "Content-Type: application/sparql-update"  -X POST 
>>> http://localhost:3031/target_in_memory/update --data-binary 
>>> "@./combine_NGs2.sparql"
>>>
>>> curl -i -H "Content-Type: application/sparql-update"  -X POST 
>>> http://localhost:3031/target_persistent/update --data-binary 
>>> "@./combine_NGs2.sparql"
>>>
>>> Compare the results. Even in the jena-fuseki GUI edit page, when opening 
>>> the resulting default graphs in the editor tab, it can be seen that 
>>> the in-memory dataset has less data than the persistent one.
>>>
>>> As I've said before, this issue didn't occur with 
>>> apache-jena-fuseki-3.13.
>>>
>>> And about your questions:
>>>
>>>> Where is the 3.13.0 server?
>>>
>>> To see that this doesn't happen with 3.13, just replace the 
>>> 3.17-server with 3.17 in my above steps. In my steps - I guess - the 
>>> source server can be 3.13 or 3.17.
>>>
>>>
>>>> Does it need the data pulled from another server rather than executing on
>>>> already loaded data?
>>>
>>> We didn't manage to combine the NGs within one server - we expected 
>>> that jena would try to use proxy or something like that...
>>>
>>> Br, Jaana
>>>
>>> Andy Seaborne wrote on 13.8.2021 23:02:
>>>> On 13/08/2021 12:03, jaanam@kolumbus.fi wrote:
>>>>> Andy Seaborne wrote on 11.8.2021 17:38:
>>>>>> Hi there,
>>>>>>
>>>>>> There isn't enough information to see what's happening.
>>>>>>
>>>>> Hello,
>>>>>
>>>>> see steps to repeat the issue below.
>>>>
>>>> I've got all the parts of the example - it's not minimal though.
>>>>
>>>> What is a small amount of data and a shorter update script that shows 
>>>> the problem?
>>>>
>>>> Does it need the data pulled from another server rather than executing on
>>>> already loaded data?
>>>>
>>>> There is a data file of 363,559 quads (which has warnings), a SPARQL
>>>> update script of 241 lines.
>>>>
>>>> To work out what is going on, someone has to reduce that large setup
>>>> to the part that causes the difference.
>>>>
>>>> The first thing that script does is delete all the local data and pull
>>>> some, not all, data from the source server. Is that step necessary?
>>>>
>>>> I don't believe it needs all the data and all the script to show a
>>>> difference nor that it needs to pull the data out of one server, and
>>>> put it in the local store in order to be different, why not just load
>>>> something directly?
>>>>
>>>>
>>>> The rest of the update does some kind of manipulation of the data - I
>>>> don't understand what it is trying to do - its purpose relates to the
>>>> data model.
>>>>
>>>> You are in a much better place to reduce that large script to a
>>>> minimal one that shows a difference because it's your application.
>>>>
>>>> Does it need all those steps together to show the difference or just
>>>> one of them?  (BTW each update step is done independently: there'll be
>>>> a point where the answers start diverging.)
>>>>
>>>> Looking at it though, the use of
>>>>
>>>> where{
>>>>   {
>>>>     graph ?g { }.
>>>>   }
>>>>   graph ?g {
>>>>     .. some pattern ..
>>>>     .. some BIND involving ?g ..
>>>>  }
>>>> }
>>>>
>>>> is pretty suspect.
>>>> Omit the first part and move the BIND to after the second:
>>>>
>>>>   graph ?g {
>>>>     .. some pattern ..
>>>>  }
>>>>  .. some BIND involving ?g ..
>>>>
>>>>     Andy
>>>>
>>>>>
>>>>>
>>>>
>>>> Where is the 3.13.0 server?
>>>>
>>>>> 1) start source apache-jena-fuseki-server on port 3030
>>>>>
>>>>> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
>>>>>
>>>>> 3) unzip and upload the attachment ds.zip into source 
>>>>> apache-jena-fuseki-server on port 3030 using command
>>>>>
>>>>>       curl -XPOST --header 'Content-Type: application/trig' 
>>>>> --data-binary @ds.trig http://<source 
>>>>> apache-jena-fuseki-server>:3030/<source dataset>
>>>>>
>>>>> 4) create one in-memory dataset and one persistent dataset on 
>>>>> target apache-jena-fuseki-3.17.0-server
>>>>>
>>>>> 5) update the source apache-jena-fuseki-server and source dataset 
>>>>> in combine_NGs.sparql-file
>>>>>
>>>>> 6) run combine_NGs.sparql-script in in-memory dataset and 
>>>>> persistent dataset of the target apache-jena-fuseki-3.17.0-server
>>>>
>>>> Run how?
>>>>
>>>>>
>>>>> 7) run query
>>>>>
>>>>>     SELECT ?subject ?predicate ?object
>>>>>       WHERE {
>>>>>        ?subject ?predicate ?object
>>>>>       }
>>>>>
>>>>> in in-memory dataset and persistent dataset of the target 
>>>>> apache-jena-fuseki-3.17.0-server and compare the results.
>>>>>
>>>>> See attachments in_memory.png and persistent.png for my results 
>>>>> after the above procedure.
>>>>
>>>> That's screenshots of a count: just do
>>>>
>>>> SELECT (Count(*) AS ?C) { ?s ?p ?o }
>>>>
>>>>>
>>>>> Jaana
>>>>>
>>>>>
>>>>>> The first thing to do is dump, or Fuseki backup, the database from
>>>>>> each setup and see if they are the same.
>>>>>>
>>>>>> Then if they are, send a minimal reproducible example [1].
>>>>>> Something someone else can run.
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>> [1]
>>>>>> https://stackoverflow.com/help/minimal-reproducible-example
>>>>>>
>>>>>>
>>>>>> On 11/08/2021 13:35, jaanam@kolumbus.fi wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> My jena-fuseki database consists of several named graphs. In 
>>>>>>> order to provide users a GraphQL-like interface to jena-fuseki I 
>>>>>>> have to combine my NGs into one big default graph for 
>>>>>>> HyperGraphql (https://www.hypergraphql.org/) that provides the 
>>>>>>> interface.
>>>>>>>
>>>>>>> At some point the users started to get less data than before and 
>>>>>>> when I investigated the issue I noticed that this was after 
>>>>>>> upgrading jena-fuseki from 3.13 to 3.17!
>>>>>>>
>>>>>>> To combine the NGs I'm using the following command:
>>>>>>>
>>>>>>>    curl -i -H "Content-Type: application/sparql-update"  -X POST 
>>>>>>> http://<myTargetHost>:8061/<myDs>/update --data-binary 
>>>>>>> "@./combine-NGs.sparql"
>>>>>>>
>>>>>>> (see NGs_to_be_combined.txt for hint of my source database and 
>>>>>>> combine-NGs.sparql as the executed script).
>>>>>>>
>>>>>>> So I run the combine-NGs.sparql-script in the target jena host 
>>>>>>> and the target dataset.
>>>>>>>
>>>>>>> If I use an in-memory dataset as my target dataset I get only half 
>>>>>>> of the triples compared to the number of triples with a 
>>>>>>> persistent dataset. This happens only with jena-fuseki 3.17.
>>>>>>>
>>>>>>> In 3.13 I haven't seen this issue!
>>>>>>>
>>>>>>> Br, Jaana
>>>>>>>
>>>>>>>
>>>>>>>

Re: difference between 3.13 and 3.17

Posted by ja...@kolumbus.fi.
Hello, you were right, there were still unnecessary graphs in my source 
file. They have been removed in source2.zip.

The difference between the datasets 'target_in_memory' (= NG combination was 
done in the in-memory dataset) and 'target_persistent' (= NG combination was 
done in the persistent dataset) is that the 'target_in_memory' dataset doesn't 
have the predicate 
<http://www.example.org/rdf/data/pxt/isPresentationOfVariable> at all.
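
A difference like this can be confirmed mechanically by listing the predicates that occur in one dump but not in the other. The following is a minimal sketch, assuming both datasets have been dumped as N-Triples or N-Quads (one statement per line, so the predicate is the second whitespace-separated token); the function names are hypothetical and not part of Jena.

```python
from collections import Counter
from typing import Iterable, List


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs in N-Triples/N-Quads lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace in these formats,
        # so the predicate is always the second token.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


def missing_predicates(reference: Counter, other: Counter) -> List[str]:
    """Predicates present in `reference` but absent from `other`."""
    return sorted(p for p in reference if p not in other)
```

Calling `missing_predicates(predicate_counts(persistent_lines), predicate_counts(in_memory_lines))` would then list the predicates that only the persistent dataset contains.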

Unfortunately my SPARQL knowledge is not good (as you must have 
noticed), but I'm still responsible for this NG combination stuff and the 
combineNGs.sparql script, which was coded by a guy who has left the 
office.

Also attached is a Word document (changes.zip) in which the changes (data 
missing from the 'target_in_memory' dataset) are marked in yellow.

Br Jaana



Re: difference between 3.13 and 3.17

Posted by Andy Seaborne <an...@apache.org>.
It's a bit smaller but I notice it still has the graph ?g { } and the 
data still has a default graph.

Does it really need all that data? I'd be surprised if it takes more 
than one subject and its triples to show a difference.

There are multiple update steps - which is the first that makes a 
difference?

And which outcome is right and which is wrong?

Not knowing the right answer makes it much harder and much more time 
consuming to work out what is going on.

The data load is the same as a PUT of the data without the default graph 
- so the first 20 lines can be done with a "curl -XPUT".

     Andy


Re: difference between 3.13 and 3.17

Posted by ja...@kolumbus.fi.
Just fixing a typo in my e-mail below:

> To see that this doesn't happen with 3.13, just replace the
> 3.17-server with 3.13 in my reproduction steps. In my steps - I guess -
> the source server can be 3.13 or 3.17.

jaana


Re: difference between 3.13 and 3.17

Posted by ja...@kolumbus.fi.
Hello,

Sorry for sending too much data for reproducing the problem.

Here is a much smaller data source and a somewhat smaller script for 
combining the NGs, along with the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment source.zip into source 
apache-jena-fuseki-server on port 3030

curl -XPOST --header 'Content-Type: application/trig' --data-binary 
@source.trig http://localhost:3030/source

4) create one in-memory dataset (e.g. target_in_memory) and one 
persistent (e.g. target_persistent) dataset on target 
apache-jena-fuseki-3.17.0-server running on port 3031

5) update the source apache-jena-fuseki-server and source dataset in 
the combine_NGs2.sparql script if needed

6) run the combine_NGs2.sparql script against the in-memory dataset and 
the persistent dataset of the target apache-jena-fuseki-3.17.0-server, 
for instance in the jena-fuseki GUI query tab

or using curl:

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_in_memory/update --data-binary 
"@./combine_NGs2.sparql"

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_persistent/update --data-binary 
"@./combine_NGs2.sparql"

Compare the results. Even in the jena-fuseki GUI edit page, when opening 
the resulting default graphs in the editor tab, it can be seen that the 
in-memory dataset has less data than the persistent one.

As I've said before, this issue didn't occur with 
apache-jena-fuseki-3.13.

And about your questions:

> Where is the 3.13.0 server?

To see that this doesn't happen with 3.13, just replace the 3.17 server 
with a 3.13 server in the steps above. In my steps, I guess, the source 
server can be 3.13 or 3.17.


> Does it need the data pulled from another server rather than executing
> on already loaded data?

We didn't manage to combine the NGs within one server - we expected that 
jena would try to use proxy or something like that...

Br, Jaana

Andy Seaborne wrote on 13.8.2021 23:02:
> On 13/08/2021 12:03, jaanam@kolumbus.fi wrote:
>> Andy Seaborne wrote on 11.8.2021 17:38:
>>> Hi there,
>>> 
>>> There isn't enough information to see what's happening.
>>> 
>> Hello,
>> 
>> see steps to repeat the issue below.
> 
> I've got all the parts of the example - it's not minimal though.
> 
> What is a small amount of data and a shorter update script that shows 
> the problem?
> 
> Does it need the data pulled from another server rather than executing
> on already loaded data?
> 
> There is a data file of 363,559 quads (which has warnings), a SPARQL
> update script of 241 lines.
> 
> To work out what is going on, someone has to reduce that large setup
> to the part that causes the difference.
> 
> The first thing that script does is delete all the local data and pull
> some, not all, data from the source server. Is that step necessary?
> 
> I don't believe it needs all the data and all the script to show a
> difference nor that it needs to pull the data out of one server, and
> put it in the local store in order to be different, why not just load
> something directly?
> 
> 
> The rest of the update does some kind of manipulation of the data - I
> don't understand what it is trying to do - its purpose relates to the
> data model.
> 
> You are in a much better place to reduce that large script to a
> minimal one that shows a difference because it's your application.
> 
> Does it need all those steps together to show the difference or just
> one of them?  (BTW each update step is done independently: there'll be
> a point where the answers start diverging.)
> 
> Looking at it though, the use of
> 
> where{
>   {
>     graph ?g { }.
>   }
>   graph ?g {
>     .. some pattern ..
>     .. some BIND involving ?g ..
>  }
> }
> 
> is pretty suspect.
> Omit the first part and move the BIND to after the second:
> 
>   graph ?g {
>     .. some pattern ..
>   }
>   .. some BIND involving ?g ..
> 
>     Andy
> 
>> 
>> 
> 
> Where is the 3.13.0 server?
> 
>> 1) start source apache-jena-fuseki-server on port 3030
>> 
>> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
>> 
>> 3) unzip and upload the attachment ds.zip into source 
>> apache-jena-fuseki-server on port 3030 using command
>> 
>>       curl -XPOST --header 'Content-Type: application/trig' 
>> --data-binary @ds.trig http://<source 
>> apache-jena-fuseki-server>:3030/<source dataset>
>> 
>> 4) create one in-memory dataset and one persistent dataset on target 
>> apache-jena-fuseki-3.17.0-server
>> 
>> 5) update the source apache-jena-fuseki-server and source dataset in 
>> combine_NGs.sparql-file
>> 
>> 6) run combine_NGs.sparql-script in in-memory dataset and persistent 
>> dataset of the target apache-jena-fuseki-3.17.0-server
> 
> Run how?
> 
>> 
>> 7) run query
>> 
>>     SELECT ?subject ?predicate ?object
>>       WHERE {
>>        ?subject ?predicate ?object
>>       }
>> 
>> in in-memory dataset and persistent dataset of the target 
>> apache-jena-fuseki-3.17.0-server and compare the results.
>> 
>> See attachments in_memory.png and persistent.png for my results after 
>> the above procedure.
> 
> That's screenshots of a count: just do
> 
> SELECT (Count(*) AS ?C) { ?s ?p ?o }
> 
>> 
>> Jaana
>> 

Re: difference between 3.13 and 3.17

Posted by Andy Seaborne <an...@apache.org>.

On 13/08/2021 12:03, jaanam@kolumbus.fi wrote:
> Andy Seaborne wrote on 11.8.2021 17:38:
>> Hi there,
>>
>> There isn't enough information to see what's happening.
>>
> Hello,
> 
> see steps to repeat the issue below.

I've got all the parts of the example - it's not minimal though.

What is a small amount of data and a shorter update script that shows 
the problem?

Does it need the data pulled from another server rather than executing 
on already loaded data?

There is a data file of 363,559 quads (which has warnings), a SPARQL 
update script of 241 lines.

To work out what is going on, someone has to reduce that large setup to 
the part that causes the difference.

The first thing that script does is delete all the local data and pull 
some, not all, data from the source server. Is that step necessary?

I don't believe it needs all the data and all the script to show a 
difference nor that it needs to pull the data out of one server, and put 
it in the local store in order to be different, why not just load 
something directly?
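
A direct-load reproduction might look like the sketch below. Everything 
in it is an assumption for illustration: the ex: prefix, the graph 
names, the predicates and the single INSERT step are hypothetical 
stand-ins, not taken from combine_NGs.sparql. The idea is to load a few 
quads inline, run one step of the real script, and then count the 
result on both the in-memory and the persistent dataset.

```sparql
PREFIX ex: <http://example/>

# Step 0: start from an empty dataset.
DROP ALL ;

# Step 1: load a few quads directly, with no call to a source server.
INSERT DATA {
  GRAPH ex:g1 { ex:s1 ex:p "a" }
  GRAPH ex:g2 { ex:s2 ex:p "b" }
} ;

# Step 2: a single step standing in for one operation of the large
# script (hypothetical; replace with the real step under test).
INSERT { ?s ex:fromGraph ?g }
WHERE  { GRAPH ?g { ?s ex:p ?o } }
```

Swapping step 2 for each operation of the real script in turn should 
show which one first produces different counts.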


The rest of the update does some kind of manipulation of the data - I 
don't understand what it is trying to do - its purpose relates to the 
data model.

You are in a much better place to reduce that large script to a minimal 
one that shows a difference because it's your application.

Does it need all those steps together to show the difference or just one 
of them?  (BTW each update step is done independently: there'll be a 
point where the answers start diverging.)

Looking at it though, the use of

where{
   {
     graph ?g { }.
   }
   graph ?g {
     .. some pattern ..
     .. some BIND involving ?g ..
  }
}

is pretty suspect.
Omit the first part and move the BIND to after the second:

   graph ?g {
     .. some pattern ..
   }
   .. some BIND involving ?g ..
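
As a concrete sketch of that shape (the ex: prefix, the ex:fromGraph 
predicate and the STR(?g) expression are made up for illustration; the 
real pattern and BIND come from the update script):

```sparql
PREFIX ex: <http://example/>

INSERT { ?s ex:fromGraph ?label }
WHERE {
  GRAPH ?g {
    ?s ?p ?o
  }
  # BIND placed after the GRAPH block, so ?g is already bound here.
  BIND (STR(?g) AS ?label)
}
```

The reason this is likely to matter: in the SPARQL algebra the pattern 
inside GRAPH ?g { } is evaluated against each named graph and only then 
joined with the binding for ?g, so an expression over ?g written inside 
the block may see ?g as unbound. Whether that explains the difference 
seen here is a guess until a small example confirms it.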

     Andy

> 
> 

Where is the 3.13.0 server?

> 1) start source apache-jena-fuseki-server on port 3030
> 
> 2) start target apache-jena-fuseki-3.17.0-server on port 3031
> 
> 3) unzip and upload the attachment ds.zip into source 
> apache-jena-fuseki-server on port 3030 using command
> 
>       curl -XPOST --header 'Content-Type: application/trig' 
> --data-binary @ds.trig http://<source 
> apache-jena-fuseki-server>:3030/<source dataset>
> 
> 4) create one in-memory dataset and one persistent dataset on target 
> apache-jena-fuseki-3.17.0-server
> 
> 5) update the source apache-jena-fuseki-server and source dataset in 
> combine_NGs.sparql-file
> 
> 6) run combine_NGs.sparql-script in in-memory dataset and persistent 
> dataset of the target apache-jena-fuseki-3.17.0-server

Run how?

> 
> 7) run query
> 
>     SELECT ?subject ?predicate ?object
>       WHERE {
>        ?subject ?predicate ?object
>       }
> 
> in in-memory dataset and persistent dataset of the target 
> apache-jena-fuseki-3.17.0-server and compare the results.
> 
> See attachments in_memory.png and persistent.png for my results after 
> the above procedure.

That's screenshots of a count: just do

SELECT (Count(*) AS ?C) { ?s ?p ?o }
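
If the totals differ, a per-predicate breakdown may help narrow down 
which part of the data is missing. This is only a sketch; run it on 
both datasets and compare the rows.

```sparql
# Count default-graph triples per predicate, in a stable order so the
# two result sets can be compared line by line.
SELECT ?p (COUNT(*) AS ?c)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY ?p
```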

> 
> Jaana
> 