
Posted to dev@jena.apache.org by "Lorenz B." <co...@googlemail.com> on 2017/10/17 11:58:13 UTC

Property Paths benchmark @ ISWC2017

Hi,

I just walked through the papers for the upcoming ISWC conference and
found a paper about benchmarking of SPARQL property paths [1] .

Not sure if this is relevant, but it looks like Jena has some issues
with different types of queries using property paths. For example,

SELECT ?o WHERE {A B* ?o.} LIMIT 100

leads to an OOM error on non-cyclic data. Here is the relevant part of
the paper:

> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
> exceptions have occurred. During the benchmark process of Jena an
> OutOfMemoryError has been thrown whenever a query with the * operator
> was used. In order to identify the cause of the error, the amount of
> results the query should return has been limited to 100. The results
> that have been returned by a query of the form SELECT ?o WHERE {A B*
> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
> Due to this fact it is presumable that the query containing the *
> operator returns A recursively until the main memory was full. To
> ensure that this behaviour is not caused by cycles in the dataset a
> query of the same form but with a predicate IRI that did not exist in
> the dataset was executed. This query still returned 100 times A. This
> indicates, that the * operator is not implemented correctly.
In addition, the experiments showed that:
> Due to the problems with the * operator the queries 4, 7 and 8 could
> not be processed. Additionally query 3, 5, and 6 returned no results
> after 1 hour and thus, were aborted. Query 1 returned an empty and
> thus, incomplete result set. Only for query 2 a valid result was
> returned. Due to the lack of comparable results, Jena has been omitted
> in the comparison of triple stores.

In the discussion section, they summarize the overall performance of Jena as follows:

> Jena could not return results for any query in under 1 hour besides
> query 2. Furthermore, the * operator could not be evaluated at all and
> the inverse operator returned empty result sets.

It looks like they used version 3.0.1, so maybe this no longer holds
for all of the queries. If it still does, it could be interesting to
improve performance and/or completeness.

I hope I didn't miss an open JIRA ticket, but in general I just wanted
to highlight the existence of a published benchmark for this kind of
query.


Cheers,

Lorenz

[1] http://ceur-ws.org/Vol-1932/paper-04.pdf

Re: Property Paths benchmark @ ISWC2017

Posted by "Lorenz B." <co...@googlemail.com>.

Hi all,

just to clarify, so far I haven't contacted the authors.

Right now I'm trying to reproduce the experiments but it looks like I'd
need some more details:

* did they use the Jena in-memory engine or was it TDB?

* did they increase the Java heap space? When using Jena's CLI, JVM_ARGS
should probably be set; maybe I'm wrong, but it looks like for 3.0.1
the default value is hard-coded to -Xmx1024M
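To check which heap limit actually took effect (for example after setting JVM_ARGS before running Jena's command-line scripts), one option is to ask the running JVM directly. This is a minimal sketch; the class name `HeapCheck` is hypothetical, and `Runtime.maxMemory()` only roughly tracks the `-Xmx` value, since some garbage collectors report a little less than the configured size.

```java
public class HeapCheck {
    // Upper bound on the heap this JVM will try to use, in MiB. This roughly
    // tracks -Xmx (e.g. as passed to Jena's scripts via JVM_ARGS); some
    // collectors report slightly less than the configured value.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx4g HeapCheck` should print a value at or somewhat below 4096, which makes it easy to tell whether a heap setting was picked up at all.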


So far I have tried different versions of Apache Jena (3.0.1, 3.1.1, 3.4.0)
but could not reproduce any of the reported errors. However, I haven't used
the larger BTC dataset (~100G unzipped) yet. I'm using the mentioned Polish
DBpedia dump, but even here I'm a bit lost, as I couldn't figure out
which files they loaded to get the 1.3 million triples (even the dataset
with mapping-based properties already comprises ~3 million triples).


The type of query they reported to fail with an OOM exception was

SELECT ?o WHERE {A B* ?o.} LIMIT 100

with A and B being valid URIs in the dataset. Thus, I used

SELECT ?o {<http://dbpedia.org/resource/Nissan_Almera>
<http://dbpedia.org/ontology/successor>* ?o } LIMIT 100

and it works as expected

╔═════════════════════════════════════════════╗
║                      o                      ║
╠═════════════════════════════════════════════╣
║ <http://dbpedia.org/resource/Nissan_Almera> ║
║ <http://dbpedia.org/resource/Nissan_Tiida>  ║
╚═════════════════════════════════════════════╝

Note that dbr:Nissan_Almera has a dbo:successor relation to itself,
something that I would expect to be a corner case that could trigger the
problem.

@Andy, can you think of a special case that would lead to this weird bug
and return the subject resource 100 times? I can see that you changed
the data structure which keeps track of the visited nodes to a set, but
even with a list, the containment check would be done by an equality
check on the Node object.
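The visited-node bookkeeping discussed above can be illustrated with a minimal sketch of how a query of the form `A B* ?o` is expected to behave under SPARQL 1.1: a breadth-first walk that records visited nodes in a hash set, so each reachable node is reported once and cycles (including self-loops like the Nissan_Almera one) terminate. This is not Jena's implementation; `ZeroOrMorePath`, its `Triple` class, string node names, and `ex:noSuchPredicate` are hypothetical stand-ins. Note the zero-length path: the start node is always returned exactly once, even when the predicate never occurs in the data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ZeroOrMorePath {
    // Hypothetical minimal triple with string nodes; a stand-in for Jena's Triple/Node.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Evaluates { start pred* ?o }: all nodes reachable from start via zero or more
    // pred edges, each reported once. The visited set is hash-based, so membership is
    // an equals/hashCode check on the node, and the walk terminates on cycles.
    static List<String> zeroOrMore(List<Triple> graph, String start, String pred) {
        Set<String> visited = new LinkedHashSet<>(); // keeps results in BFS order
        Deque<String> frontier = new ArrayDeque<>();
        visited.add(start);   // zero-length path: start always matches, exactly once
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String node = frontier.poll();
            for (Triple t : graph) {
                // visited.add returns false for nodes already seen, so they are not re-expanded
                if (t.s.equals(node) && t.p.equals(pred) && visited.add(t.o)) {
                    frontier.add(t.o);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        String almera = "dbr:Nissan_Almera";
        String tiida = "dbr:Nissan_Tiida";
        String successor = "dbo:successor";
        List<Triple> g = List.of(
            new Triple(almera, successor, almera),  // self-loop, as in the DBpedia data
            new Triple(almera, successor, tiida));
        System.out.println(zeroOrMore(g, almera, successor));
        // prints [dbr:Nissan_Almera, dbr:Nissan_Tiida]
        System.out.println(zeroOrMore(g, almera, "ex:noSuchPredicate"));
        // prints [dbr:Nissan_Almera]
    }
}
```

Because this sketch builds the whole result list before returning, a LIMIT applied afterwards cannot stop evaluation early; a lazily evaluated (streaming) variant would be needed for that.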


I also tried the case with the inverse operator

SELECT ?o1 WHERE {?o1 ^P1 S1 . }

and it did return a non-empty result for me, as expected.
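For reference, SPARQL 1.1 defines `?x ^P ?y` as `?y P ?x`, so the test query `SELECT ?o1 WHERE {?o1 ^P1 S1 . }` should return the same rows as `SELECT ?o1 WHERE {S1 P1 ?o1 . }`; an empty result is only correct when S1 has no P1 values. A minimal sketch of that equivalence, with a hypothetical `InversePath` class and string nodes (not Jena code):

```java
import java.util.ArrayList;
import java.util.List;

public class InversePath {
    // Hypothetical minimal triple with string nodes; a stand-in for Jena's Triple/Node.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Evaluates { ?x ^pred node }. The inverse operator swaps subject and object,
    // so this is the same as { node pred ?x }: collect the objects of all triples
    // whose subject is `node` and whose predicate is `pred`.
    static List<String> inverse(List<Triple> graph, String pred, String node) {
        List<String> results = new ArrayList<>();
        for (Triple t : graph) {
            if (t.s.equals(node) && t.p.equals(pred)) {
                results.add(t.o);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Triple> g = List.of(
            new Triple("S1", "P1", "O1"),   // matches: S1 P1 O1
            new Triple("X", "P1", "S1"));   // no match: S1 is the object here
        System.out.println(inverse(g, "P1", "S1")); // prints [O1]
    }
}
```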

Either something forces Jena to fail on the BTC dataset or I'm doing
something wrong (which cannot be ruled out for sure :D )


In general, it would just be interesting to know whether those bugs
still occur or have been fixed by recent code changes.


Cheers,

Lorenz


On 19.10.2017 11:08, Marco Neumann wrote:
> did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
> to get a response?
>
> the findings seem to be based on work that has been published online as
> part of a bachelor’s thesis by Adrian Skubella.
>
> https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
>
>
>
> On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <co...@googlemail.com> wrote:
>> For me this is really bad practice. It also looks like they did the
>> benchmark more than one year ago. Otherwise due to JENA-1195 this error
>> wouldn't occur anymore. And the submission deadline was August 6th, 2017.
>> Their experiments contain 8 queries, rerunning those shouldn't take ages...
>>
>> I'm currently trying to reproduce the results of the paper, but the
>> whole experimental setup remains unclear. I'm wondering if they used
>> just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
>> the runtimes in the eval section are quite small, but even loading the
>> data of their benchmark takes much more time. So maybe they used the
>> RDF4J server.
>>
>> The worst thing is that they didn't contact any of the developers. Or
>> did they talk to somebody here and then Andy created the ticket
>> JENA-1195? Also for the other queries that failed, I would expect to see
>> tickets on Apache JIRA or at least a hint on the Jena mailing list...
>>
>> @Andy I'm also wondering whether JENA-1317 addresses the problem with
>> the empty result of the benchmark query containing an inverse property path.
>>
>>
>> On 18.10.2017 17:03, ajs6f@apache.org wrote:
>>> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
>>> them and give them our POV? :grin:
>>>
>>> In all seriousness, from what I can tell the results amount to "Using
>>> older versions of our comparands and without contacting the projects
>>> in question we couldn't find a store that implements every property
>>> path feature correctly and some fail entirely."
>>>
>>> I'm not really sure how useful that information is...? But I am ready
>>> to do a benchmarking paper for next year. Seems like it's a lot easier
>>> than I thought!
>>>
>>>
>>> ajs6f
>>>
>>>
>>> Andy Seaborne wrote on 10/17/17 9:28 AM:
>>>> Hi Lorenz,
>>>>
>>>> Looks like JENA-1195 which is fixed.  Does that look like it?
>>>>
>>>> I think it is a shame when papers focus on bugs rather than discussing
>>>> and even fixing them.  Bugs aren't research.
>>>>
>>>> Path evaluation could be improved to stream in more cases (that's why
>>>> LIMIT didn't help), but 1195 explains the slowness
>>>> and memory.
>>>>
>>>>     Andy

Re: Property Paths benchmark @ ISWC2017

Posted by aj...@apache.org.

Perhaps the first line of work could be to contact the authors and ask them:

Did you contact Jena (or for that matter, any of the other projects) for this work? Why did you use such an old version 
of Jena?

Would you be willing to try again with a modern version? If the results are significantly different (as they almost 
certainly will be) would you be willing to make an emendation for your workshop paper?


ajs6f

Marco Neumann wrote on 10/19/17 12:10 PM:
> just on a side note: since this is "only" a workshop contribution, it
> will not make an appearance in the conference itself and will not
> appear in the main ISWC 2017 conference proceedings published by
> Springer, but only as an independent publication of the workshop
> itself.
>
> responsibility for the workshop sits with the Organising Committee
>
> Axel-Cyrille Ngonga Ngomo, Institute for Applied Informatics, Leipzig, Germany
> Anastasia Krithara, National Center for Scientific Research
> “Demokritos”, Athens, Greece
> Irini Fundulaki, ICS-FORTH, Heraklion, Crete, Greece
>
> and for review the Program Committee
>
> Milos Jovanovik, OpenLink Software, United Kingdom
> Pavlos Fafalios, University of Hannover, Germany
> Kostas Stefanidis, University of Tampere, Finland
> Muhammad Saleem, AKSW, University of Leipzig, Germany
> Manolis Terrovitis, IMIS, RC Athena, Greece
> Ricardo Usbeck, University of Leipzig, Germany
> George Papastefanatos, IMIS RC Athena, Greece
> Stasinos Konstantopoulos, NCSR Demokritos, Greece
>
>
>
>
> On Thu, Oct 19, 2017 at 3:51 PM,  <aj...@apache.org> wrote:
>> I hadn't intended to spend time at the benchmarking sessions at ISWC, but if
>> it seems useful, I can try and raise this issue in person. I suppose partly
>> it's a question of setting the record straight, and then partly it's a
>> question of standing up for good practice, and then it's also a question of
>> protecting Jena from unmerited negative consequences.
>>
>> I don't know how widely used such benchmarks are. Except for a few
>> high-profile projects, I rarely see anyone refer to this sort of evidence as
>> a reason to adopt a system or not.
>>
>>
>> ajs6f
>>
>> Marco Neumann wrote on 10/19/17 9:26 AM:
>>
>>> Rob,
>>>
>>> unfortunately this is more common in Semantic Web research papers than
>>> one might expect. I have seen this before in particular with regards
>>> to perceived shortcomings of Jena or its components. It might be a
>>> good idea to bring this to the attention of affiliated people in the
>>> organisation (here University of Southampton and Koblenz-Landau).
>>>
>>> while I don't think this is an intentional attempt to bring Jena into
>>> disrepute, the situation could be clarified and addressed by the ISWC
>>> workshop or track chair as well. I wish your mentioned "standard
>>> Industry and research practice" would be more common than it currently
>>> is.
>>>
>>> btw the thesis report is dated July 2016
>>>
>>>
>>>
>>> On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>>
>>>> Marco
>>>>
>>>> I don’t believe anyone has tried to contact them yet
>>>>
>>>> I think the complaint here is that there doesn’t appear to have
>>>> been any attempt to report the identified issues back to the projects
>>>> studied. If this were a security flaw in the project, the standard Industry
>>>> and research practice would be to make a responsible disclosure to the
>>>> projects in advance of the public disclosure, so that the researchers and
>>>> projects can work together to resolve the problem. The implication is
>>>> that it is irresponsible for the authors to benefit from pointing out flaws
>>>> in the projects while appearing to make no effort to help report or resolve
>>>> those issues.
>>>>
>>>> As you suggest, this paper does appear to be based upon some thesis work.
>>>> That thesis indicates that the research was originally carried out in 2015,
>>>> implying that the author knew of the issue two years ago.
>>>>
>>>> The project has a relatively small core of developers, most of whom work
>>>> on Jena on the side. We very much rely upon the wider community to provide
>>>> input on bugs that need to be resolved, e.g. performance issues, and the
>>>> features we should prioritise. When someone clearly knew of a problem but
>>>> didn’t tell us, that is inevitably frustrating for the project.
>>>>
>>>> Rob

Re: Property Paths benchmark @ ISWC2017

Posted by Marco Neumann <ma...@gmail.com>.

just on a side note: since this is "only" a workshop contribution, it
will not make an appearance in the conference itself and will not
appear in the main ISWC 2017 conference proceedings published by
Springer, but only as an independent publication of the workshop
itself.

responsibility for the workshop sits with the Organising Committee

Axel-Cyrille Ngonga Ngomo, Institute for Applied Informatics, Leipzig, Germany
Anastasia Krithara, National Center for Scientific Research
“Demokritos”, Athens, Greece
Irini Fundulaki, ICS-FORTH, Heraklion, Crete, Greece

and for review the Program Committee

Milos Jovanovik, OpenLink Software, United Kingdom
Pavlos Fafalios, University of Hannover, Germany
Kostas Stefanidis, University of Tampere, Finland
Muhammad Saleem, AKSW, University of Leipzig, Germany
Manolis Terrovitis, IMIS, RC Athena, Greece
Ricardo Usbeck, University of Leipzig, Germany
George Papastefanatos, IMIS RC Athena, Greece
Stasinos Konstantopoulos, NCSR Demokritos, Greece




-- 


---
Marco Neumann
KONA

Re: Property Paths benchmark @ ISWC2017

Posted by Marco Neumann <ma...@gmail.com>.

On Sat, 21 Oct 2017 at 17:36, Andy Seaborne <an...@apache.org> wrote:

>
>
> On 21/10/17 16:14, ajs6f@apache.org wrote:
> > I think Rob's suggested message is pretty reasonable. I think what we
> > can do in this situation is to help open a larger conversation about
> > what is fair and what is desirable for this kind of research.
> >
> > ajs6f
>
> No problem with that.
>
> I did some editing on it to emphasise the Code of practice, and away
> from the incident:
>
> ---------------------------
>
>
> On the Responsible Disclosure of Benchmarking Results
>
>
> The Apache Jena PMC would like to suggest to the benchmarking community
> that they adopt a code of practice that will improve benchmarking
> semantic web systems by focusing on the contribution to the literature
> and away from transient details.
>
> The PMC was recently made aware of a paper scheduled to be presented at
> the Workshop on Benchmarking Linked Data (BLINK) at ISWC 2017. The paper
> in question provides a new benchmark for property paths.
>
> We are disappointed that the authors identified a deficiency in our
> project's implementation about which they made no attempt to contact us.
> Indeed, our public JIRA has independently reported tickets that are
> relevant.  A fix has been available for some time.
>
> We are by no means the only project affected; other correctness and
> performance issues across several projects were identified in this
> paper.
>
> We wish to raise a general issue we and others in our community perceive
> across this field of research.
>
> Investigation and analysis of algorithms and designs should not be based
> on engineering details.
>
> As an open source project maintained by volunteers we rely upon the
> wider community, both in industry and academia, to bring issues to our
> attention in a timely fashion.
>
> If this was a security flaw the expected standard practice would be to
> responsibly disclose the issue to the affected projects and work with
> those projects to address the issue.
>
> Many of us have a background in scientific research and appreciate that
> research often happens on tight timelines but simply sending a short to
> a project about an identified issue is not unreasonable.


just a typo here: "a short to" should read "a short notice to"


>
> We would like to suggest to the benchmarking community that they adopt a
> code of good practice that encourages feedback leading to high-quality
> results for the long term benefit of other researchers.
>
> If you could raise the topic of responsible disclosure of issues
> identified during the course of your workshop that would be much
> appreciated.
>
> Regards,
>
> The Apache Jena PMC
>
-- 


---
Marco Neumann
KONA

Re: Property Paths benchmark @ ISWC2017

Posted by Osma Suominen <os...@helsinki.fi>.

Andy Seaborne kirjoitti 23.10.2017 klo 17:59:
> Process : if we all agree with some text, then should we email it to 
> semantic-web@w3?  I'd prefer if it wasn't me sending it to show it's the 
> PMC.

I'm fine with any recent version of the text.

> I'm neutral to that - I didn't want the message to be too much about the 
> specific paper - it's an MSc piece of work, researcher in training.
> 
> I am annoyed by the revision of dates to July/2017 (while the work was 
> done in 2015, it's not the first thing you come across). That in itself 
> is poor.

It's standard practice to include "last retrieved" dates for URL 
references in papers, in case the web site dies or changes substantially 
after the paper was written. As already pointed out, they probably just 
bumped the date when submitting the paper to indicate that they 
re-checked the Jena site and it was still there. Unfortunately, since 
the URL was for the Jena project and they mention a specific (older) 
Jena release, the result is confusing.

The reference could have been formulated better, but it's not wrong.

> We don't say "report in public". That's for discussion.

Right.

> This work isn't new - it's 2015 reworked.  So even by the argument of 
> secret, they could have done something.

True, they could at least have contacted the project after the original 
paper/thesis was published.

-Osma

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Property Paths benchmark @ ISWC2017

Posted by Andy Seaborne <an...@apache.org>.

Process : if we all agree with some text, then should we email it to 
semantic-web@w3?  I'd prefer if it wasn't me sending it to show it's the 
PMC.

Comments inline.

On 23/10/17 13:13, Osma Suominen wrote:
> +1 for the edited text, but "A fix has been available for some time" 
> looks a bit too vague for my taste. It could be understood as "we fixed 
> this last week so you shouldn't complain about it". So maybe add a 
> version number and date. Also mentioning the JIRA ticket numbers would 
> make this statement more transparent.

I'm neutral to that - I didn't want the message to be too much about the 
specific paper - it's an MSc piece of work, researcher in training.

I am annoyed by the revision of dates to July/2017 (while the work was 
done in 2015, it's not the first thing you come across). That in itself 
is poor.

> As a former PhD student who has done some "benchmarking" style papers 
> (albeit about SKOS dataset quality, not software) I can somewhat 
> sympathize with the researchers' point of view here. If you've come up 
> with a new benchmark and spent a non-trivial amount of time testing 
> various software packages, you're likely to find a number of problems in 
> them. Reporting back all of them through various channels can seem a bit 
> of an extra burden - and maybe you don't want to explain your benchmark 
> in a public forum such as an issue tracker or mailing list before you've 
> published a paper about it, in case you're afraid of someone stealing 
> your ideas.

We don't say "report in public". That's for discussion.

This work isn't new - it's 2015 reworked.  So even by the argument of 
secret, they could have done something.

I don't see why it can't be public; that's the nature of open source (AKA
free). Researchers keeping secrets, as in the ambush games at the end of
SPARQL 1.1, is exploitation for personal publicity.

I am annoyed by the revision of dates to July/2017 (while the work was 
done in 2015, it's not the first thing you come across). That in itself 
is poor - it's marketing FUD and deserves a public correction from them.

> I think it would make sense for workshops like this to require that the 
> tested tools are recent enough - say, no more than three or six months 
> behind the latest official release. This wouldn't enforce contacting the 
> authors (which can be problematic, e.g. for the above reasons) but would 
> at least make the results more relevant for comparisons and prevent the 
> situation we had here, where the problem reported in the paper was 
> apparently fixed some time ago, independently of the research.
> 
> -Osma

Re: Property Paths benchmark @ ISWC2017

Posted by Osma Suominen <os...@helsinki.fi>.

+1 for the edited text, but "A fix has been available for some time" 
looks a bit too vague for my taste. It could be understood as "we fixed 
this last week so you shouldn't complain about it". So maybe add a 
version number and date. Also mentioning the JIRA ticket numbers would 
make this statement more transparent.


As a former PhD student who has done some "benchmarking" style papers 
(albeit about SKOS dataset quality, not software) I can somewhat 
sympathize with the researchers' point of view here. If you've come up 
with a new benchmark and spent a non-trivial amount of time testing 
various software packages, you're likely to find a number of problems in 
them. Reporting back all of them through various channels can seem a bit 
of an extra burden - and maybe you don't want to explain your benchmark 
in a public forum such as an issue tracker or mailing list before you've 
published a paper about it, in case you're afraid of someone stealing 
your ideas.

I think it would make sense for workshops like this to require that the 
tested tools are recent enough - say, no more than three or six months 
behind the latest official release. This wouldn't enforce contacting the 
authors (which can be problematic, e.g. for the above reasons) but would 
at least make the results more relevant for comparisons and prevent the 
situation we had here, where the problem reported in the paper was 
apparently fixed some time ago, independently of the research.

-Osma




-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Property Paths benchmark @ ISWC2017

Posted by Andy Seaborne <an...@apache.org>.

On 21/10/17 16:14, ajs6f@apache.org wrote:
> I think Rob's suggested message is pretty reasonable. I think what we 
> can do in this situation is to help open a larger conversation about 
> what is fair and what is desirable for this kind of research.
> 
> ajs6f

No problem with that.

I did some editing on it to emphasise the Code of practice, and away
from the incident:

---------------------------

On the Responsible Disclosure of Benchmarking Results

The Apache Jena PMC would like to suggest to the benchmarking community
that they adopt a code of practice that will improve benchmarking
semantic web systems by focusing on the contribution to the literature
and away from transient details.

The PMC was recently made aware of a paper scheduled to be presented at
the Workshop on Benchmarking Linked Data (BLINK) at ISWC 2017. The paper
in question provides a new benchmark for property paths.

We are disappointed that the authors identified a deficiency in our
project's implementation about which they made no attempt to contact us.
Indeed, our public JIRA has independently reported tickets that are
relevant.  A fix has been available for some time.

We are by no means the only project affected; other correctness and
performance issues across several projects were identified in this
paper.

We wish to raise a general issue we and others in our community perceive
across this field of research.

Investigation and analysis of algorithms and designs should not be based
on engineering details.

As an open source project maintained by volunteers we rely upon the
wider community, both in industry and academia, to bring issues to our
attention in a timely fashion.

If this was a security flaw the expected standard practice would be to
responsibly disclose the issue to the affected projects and work with
those projects to address the issue.

Many of us have a background in scientific research and appreciate that
research often happens on tight timelines but simply sending a short to
a project about an identified issue is not unreasonable.

We would like to suggest to the benchmarking community that they adopt a
code of good practice that encourages feedback leading to high-quality
results for the long term benefit of other researchers.

If you could raise the topic of responsible disclosure of issues
identified during the course of your workshop that would be much
appreciated.

Regards,

The Apache Jena PMC

Re: Property Paths benchmark @ ISWC2017

Posted by aj...@apache.org.

I think Rob's suggested message is pretty reasonable. I think what we can do in this situation is to help open a larger 
conversation about what is fair and what is desirable for this kind of research.

ajs6f


Re: Property Paths benchmark @ ISWC2017

Posted by Andy Seaborne <an...@apache.org>.


On 20/10/17 11:13, Rob Vesse wrote:
> 
> On 20/10/2017 15:56, "Andy Seaborne" <an...@apache.org> wrote:
> 
>      Given this, references to the 2015 are spurious and misleading.
> 
>   If you read the original bachelor's thesis that Marco referenced [1], the equivalent text and footnote are as follows:
> 
>         3 https://jena.apache.org/ retrieved at 13.12.2015
> 
> Which would indeed be Jena 3.0.1, so the original research was started in December 2015 and completed sometime between then and July 2016 when that thesis was submitted.

I'm not disputing that at all, but the average reader will read the
paper and that's what it claims. We can see it's wrong because we look
harder; others may take it at face value.

> 
> I would guess that when it was reformatted into a workshop paper they simply checked that all the URLs still worked and updated the footnotes accordingly
> 
>   Maybe we are just splitting hairs and expecting too much; it just frustrates me when someone discovers a problem and makes no effort to resolve it.

+1

> 
> Rob
> 
> [1] https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
> 
> 
> 
> 
> 
>

Re: Property Paths benchmark @ ISWC2017

Posted by Rob Vesse <rv...@dotnetrdf.org>.

On 20/10/2017 15:56, "Andy Seaborne" <an...@apache.org> wrote:

    Given this, references to the 2015 are spurious and misleading.

 If you read the original bachelor's thesis that Marco referenced [1], the equivalent text and footnote are as follows:

       3 https://jena.apache.org/ retrieved at 13.12.2015

Which would indeed be Jena 3.0.1, so the original research was started in December 2015 and completed sometime between then and July 2016 when that thesis was submitted.

I would guess that when it was reformatted into a workshop paper they simply checked that all the URLs still worked and updated the footnotes accordingly

 Maybe we are just splitting hairs and expecting too much; it just frustrates me when someone discovers a problem and makes no effort to resolve it.

Rob

[1] https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf

Re: Property Paths benchmark @ ISWC2017

Posted by Andy Seaborne <an...@apache.org>.

They say:
----
... Apache Jena 3.0.1 [7] ...

[7] https://jena.apache.org/ retrieved at 3.7.2017
----

but on 3/July/17 the current release was 3.3.0, which includes the fix for
JENA-1195 (fixed in Jena 3.1.1). That is careless and invalidates their figures.

Given this, references to the 2015 are spurious and misleading.



But is this research? A decent MSc project to build skills, but 
benchmarking systems as research? Algorithms analysis maybe, but not 
black-box performance.

And for Jena, after all, it is open source!

No consideration of the coverage or not of the SPARQL 1.1 test suite?

I quite like the idea of responsible benchmarking because ambush-papers 
were also a feature of SPARQL 1.1 creation.

That said, we ought not to worry that much; we should see it as good that
Jena is considered.  Much worse things are said by sales teams about open
source software, including about Jena. (Example: our SDB page rightly 
says "don't use it anymore" and that is used as "evidence" that the 
whole of Jena is suspect.)

     Andy

And, details, ...

PREFIX foaf: http://xmlns.com/foaf/0.1/
...
<foaf:knows>∗

sad face.

Re: Property Paths benchmark @ ISWC2017

Posted by Rob Vesse <rv...@dotnetrdf.org>.

I wouldn’t want this to take you away from whatever sessions are most relevant to you.

Perhaps we as a PMC could draft a general statement about responsible disclosure and submit it to the workshop organisers, asking them to raise the topic as part of their workshop. We are not the only project affected by this. Off the top of my head, how about something like the following:

---

On the Responsible Disclosure of Benchmarking Results 

The Apache Jena PMC was recently made aware of a paper scheduled to be presented at your Workshop on Benchmarking Linked Data (BLINK) at ISWC 2017. The paper in question provides a new benchmark for property paths, which is undoubtedly a valuable contribution to the field. However, we were disappointed to learn on reading the paper that the authors identified a serious deficiency in our project's implementation about which, as far as we can tell, they made no attempt to contact us. Our main disappointment stems from our discovery that the authors' own self-cited prior work suggests that this research may have been carried out in part as early as 2015, based upon software versions and retrieval dates cited in that earlier work. We are by no means the only project affected; other correctness and performance issues across several projects were identified in this paper.

We do not wish to single out a specific paper but rather to raise a general problem we and others in our community perceive across this field of research.

As an open source project maintained by a relatively small core of volunteers we rely upon the wider community, both in industry and academia, to bring issues to our attention in a timely fashion. If this was a security flaw the expected standard practice would be to responsibly disclose the issue to the affected projects and work with those projects to address the issue. This often improves the contribution of the research because it doesn’t just identify the problem but helps to resolve it. Many of us have a background in scientific research and appreciate that research often happens on tight timelines but simply sending a short email to a project about an identified issue is not an unreasonable ask.

We would like to request that the benchmarking community adopt these practices in their work. If you could raise the topic of responsible disclosure of issues identified during the course of your workshop that would be much appreciated.

Regards,

The Apache Jena PMC


How does that sound?

Note that I did a search of our email archives and JIRA and couldn’t find any obvious mention of any of the authors names other than in this thread.

Rob

On 19/10/2017 14:51, "ajs6f@apache.org" <aj...@apache.org> wrote:

    I hadn't intended to spend time at the benchmarking sessions at ISWC, but if it seems useful, I can try and raise this 
    issue in person. I suppose partly it's a question of setting the record straight, and then partly it's a question of 
    standing up for good practice, and then it's also a question of protecting Jena from unmerited negative consequences.
    
    I don't know how widely used such benchmarks are. Except for a few high-profile projects, I rarely see anyone refer to 
    this sort of evidence as a reason to or not to adopt a system.
    
    
    ajs6f
    
    Marco Neumann wrote on 10/19/17 9:26 AM:
    > Rob,
    >
    > unfortunately this is more common in Semantic Web research papers than
    > one might expect. I have seen this before in particular with regards
    > to perceived shortcomings of jena or its components. It might be a
    > good idea to bring this to the attention of affiliated people in the
    > organisation (here the University of Southampton and Koblenz-Landau).
    >
    > while I don't think this is an intentional attempt to bring Jena into
    > disrepute the situation could be clarified and addressed by the ISWC
    > workshop or track chair as well. I wish your mentioned "standard
    > Industry and research practice" would be more common than it currently
    > is.
    >
    > btw the thesis report is dated July 2016
    >
    >
    >
    > On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:
    >> Marco
    >>
    >> I don’t believe anyone has tried to contact them yet
    >>
    >> I think that the complaints here are that there doesn’t appear to have been any attempt to report the issues identified back to the projects studied. If this was a security flaw in the project the standard Industry and research practice would be to make a responsible disclosure to the projects in advance of the public disclosure such that the researchers and projects can work together to resolve the problem. The implication being that it is irresponsible for the authors to benefit from pointing out flaws in the projects while appearing to make no efforts to help report/resolve those issues.
    >>
    >> As you suggest, this paper does appear to be based upon some thesis work; that thesis indicates that the research was originally carried out in 2015, implying that the author knew of the issue two years ago.
    >>
    >> The project has a relatively small core of developers, most of whom work on Jena on the side. We very much rely upon the wider community to provide input on bugs that need to be resolved (e.g. performance issues) and on the features we should prioritise. When someone clearly knew of a problem but didn’t tell us, that is inevitably frustrating for the project.
    >>
    >> Rob
    >>
    >> On 19/10/2017 10:08, "Marco Neumann" <ma...@gmail.com> wrote:
    >>
    >>     did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
    >>     to get a response?
    >>
    >>     the findings seem to be based on work that has been published online as
    >>     part of a bachelor’s thesis by Adrian Skubella.
    >>
    >>     https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
    >>
    >>
    >>
    >>     On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <co...@googlemail.com> wrote:
    >>     > For me this is really bad practice. It also looks like they did the
    >>     > benchmark more than one year ago. Otherwise due to JENA-1195 this error
    >>     > wouldn't occur anymore. And the submission deadline was August 6th, 2017.
    >>     > Their experiments contain 8 queries, rerunning those shouldn't take ages...
    >>     >
    >>     > I'm currently trying to reproduce the results of the paper, but the
    >>     > whole experimental setup remains unclear. I'm wondering if they used
    >>     > just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
    >>     > the runtimes in the eval section are quite small, but even loading the
    >>     > data of their benchmark takes much more time. So maybe they used the
    >>     > RDF4J server.
    >>     >
    >>     > The worst thing is that they didn't contact any of the developers. Or
    >>     > did they talk to somebody here and then Andy created the ticket
    >>     > JENA-1195? Also for the other queries that failed, I would expect to see
    >>     > tickets on Apache JIRA or at least a hint on the Jena mailing list...
    >>     >
    >>     > @Andy I'm also wondering whether JENA-1317 addresses the problem with
    >>     > the empty result of benchmark query containing an inverse property path.
    >>     >
    >>     >
    >>     > On 18.10.2017 17:03, ajs6f@apache.org wrote:
    >>     >> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
    >>     >> them and give them our POV? :grin:
    >>     >>
    >>     >> In all seriousness, from what I can tell the results amount to "Using
    >>     >> older versions of our comparands and without contacting the projects
    >>     >> in question we couldn't find a store that implements every property
    >>     >> path feature correctly and some fail entirely."
    >>     >>
    >>     >> I'm not really sure how useful that information is...? But I am ready
    >>     >> to do a benchmarking paper for next year. Seems like it's a lot easier
    >>     >> than I thought!
    >>     >>
    >>     >>
    >>     >> ajs6f
    >>     >>
    >>     >>
    >>     >> Andy Seaborne wrote on 10/17/17 9:28 AM:
    >>     >>> Hi Lorenz,
    >>     >>>
    >>     >>> Looks like JENA-1195 which is fixed.  Does that look like it?
    >>     >>>
    >>     >>> I think it is a shame when papers focus on bugs rather than discussing
    >>     >>> and even fixing them.  Bugs aren't research.
    >>     >>>
    >>     >>> Path evaluation could be improved to stream in more cases (that's why
    >>     >>> LIMIT didn't help), but JENA-1195 explains the slowness
    >>     >>> and memory use.
    >>     >>>
    >>     >>>     Andy
    >>     >>>
    >>     >>> On 17/10/17 07:58, Lorenz B. wrote:
    >>     >>>> Hi,
    >>     >>>>
    >>     >>>> I just walked through the papers for the upcoming ISWC conference and
    >>     >>>> found a paper about benchmarking of SPARQL property paths [1] .
    >>     >>>>
    >>     >>>> Not sure if this is relevant, but it looks like Jena has some issues
    >>     >>>> with different types of queries using the property path. For example,
    >>     >>>>
    >>     >>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
    >>     >>>>
    >>     >>>> led to an OOM error on non-cyclic data. Here is the relevant part of
    >>     >>>> the paper:
    >>     >>>>
    >>     >>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
    >>     >>>>> exceptions have occurred. During the benchmark process of Jena an
    >>     >>>>> OutOfMemoryError has been thrown whenever a query with the * operator
    >>     >>>>> was used. In order to identify the cause of the error, the amount of
    >>     >>>>> results the query should return has been limited to 100. The results
    >>     >>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
    >>     >>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
    >>     >>>>> Due to this fact it is presumable that the query containing the *
    >>     >>>>> operator returns A recursively until the main memory was full. To
    >>     >>>>> ensure that this behaviour is not caused by cycles in the dataset a
    >>     >>>>> query of the same form but with a predicate IRI that did not exist in
    >>     >>>>> the dataset was executed. This query still returned 100 times A. This
    >>     >>>>> indicates, that the * operator is not implemented correctly.
    >>     >>>> In addition, the experiments showed that:
    >>     >>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
    >>     >>>>> not be processed. Additionally query 3, 5, and 6 returned no results
    >>     >>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
    >>     >>>>> thus, incomplete result set. Only for query 2 a valid result was
    >>     >>>>> returned. Due to the lack of comparable results, Jena has been omitted
    >>     >>>>> in the comparison of triple stores.
    >>     >>>>
    >>     >>>> In the discussion section, they summarize the overall performance of
    >>     >>>> Jena as follows:
    >>     >>>>
    >>     >>>>> Jena could not return results for any query in under 1 hour besides
    >>     >>>>> query 2. Furthermore, the * operator could not be evaluated at all and
    >>     >>>>> the inverse operator returned empty result sets.
    >>     >>>>
    >>     >>>> It looks like they used version 3.0.1, so maybe this doesn't hold
    >>     >>>> anymore for all of the queries. If not, it could be interesting to
    >>     >>>> improve performance and/or completeness.
    >>     >>>>
    >>     >>>> I hope I didn't miss an open JIRA ticket, but in general I just
    >>     >>>> wanted to highlight the presence of a published benchmark for this
    >>     >>>> kind of query.
    >>     >>>>
    >>     >>>>
    >>     >>>> Cheers,
    >>     >>>>
    >>     >>>> Lorenz
    >>     >>>>
    >>     >>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
    >>     >>>>
    >>     >
    >>
    >>
    >>
    >>     --
    >>
    >>
    >>     ---
    >>     Marco Neumann
    >>     KONA
    >>
    >>
    >>
    >>
    >>
    >
    >
    >
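
The behaviour described in the quoted paper can be checked against the zero-or-more (`*`) path semantics of SPARQL 1.1, which, as we read the specification, returns each reachable node once and always includes the zero-length path (the start node itself). The following is a hypothetical Python sketch, not Jena's implementation: the toy triple set and the helper names `objects` and `zero_or_more` are assumptions made for illustration. It uses a visited set so cycles terminate, and because it is a generator, a LIMIT-style `islice` can stop evaluation early, which is the kind of streaming Andy mentions.

```python
from collections import deque
from itertools import islice


def objects(triples, subject, predicate):
    """Yield every o such that (subject, predicate, o) is in the toy store."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            yield o


def zero_or_more(triples, start, predicate):
    """Lazily yield each node reachable from `start` via zero or more
    `predicate` edges, each node exactly once (breadth-first order)."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node  # the start node comes first: the zero-length path
        for nxt in objects(triples, node, predicate):
            if nxt not in visited:  # the visited set makes cycles terminate
                visited.add(nxt)
                queue.append(nxt)


if __name__ == "__main__":
    # Hypothetical data: a three-node cycle A -B-> C -B-> D -B-> A
    data = {("A", "B", "C"), ("C", "B", "D"), ("D", "B", "A")}
    print(list(zero_or_more(data, "A", "B")))             # ['A', 'C', 'D']
    print(list(zero_or_more(data, "A", "missing")))       # ['A']
    print(list(islice(zero_or_more(data, "A", "B"), 2)))  # ['A', 'C']
```

Under these assumptions, a predicate that does not occur in the data yields `A` exactly once rather than repeatedly, and the `islice` call stops after two results without ever discovering `D`.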

Re: Property Paths benchmark @ ISWC2017

Posted by aj...@apache.org.

I hadn't intended to spend time at the benchmarking sessions at ISWC, but if it seems useful, I can try and raise this 
issue in person. I suppose partly it's a question of setting the record straight, and then partly it's a question of 
standing up for good practice, and then it's also a question of protecting Jena from unmerited negative consequences.

I don't know how widely used such benchmarks are. Except for a few high-profile projects, I rarely see anyone refer to 
this sort of evidence as a reason to or not to adopt a system.


ajs6f

Marco Neumann wrote on 10/19/17 9:26 AM:
> Rob,
>
> unfortunately this is more common in Semantic Web research papers than
> one might expect. I have seen this before in particular with regards
> to perceived shortcomings of Jena or its components. It might be a
> good idea to bring this to the attention of affiliated people in the
> organisations (here University of Southampton and Koblenz-Landau).
>
> while I don't think this is an intentional attempt to bring Jena into
> disrepute, the situation could be clarified and addressed by the ISWC
> workshop or track chair as well. I wish the "standard
> Industry and research practice" you mentioned were more common than it
> currently is.
>
> btw the thesis report is dated July 2016

Re: Property Paths benchmark @ ISWC2017

Posted by Marco Neumann <ma...@gmail.com>.

Rob,

unfortunately this is more common in Semantic Web research papers than
one might expect. I have seen this before in particular with regards
to perceived shortcomings of Jena or its components. It might be a
good idea to bring this to the attention of affiliated people in the
organisations (here University of Southampton and Koblenz-Landau).

while I don't think this is an intentional attempt to bring Jena into
disrepute, the situation could be clarified and addressed by the ISWC
workshop or track chair as well. I wish the "standard
Industry and research practice" you mentioned were more common than it
currently is.

btw the thesis report is dated July 2016



On Thu, Oct 19, 2017 at 12:08 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:
> Marco
>
> I don’t believe anyone has tried to contact them yet.
>
> I think the complaint here is that there doesn’t appear to have been any attempt to report the identified issues back to the projects studied. If this were a security flaw in a project, the standard Industry and research practice would be to make a responsible disclosure to the projects in advance of public disclosure, so that the researchers and projects can work together to resolve the problem. The implication is that it is irresponsible for the authors to benefit from pointing out flaws in the projects while appearing to make no effort to help report or resolve those issues.
>
> As you suggest, this paper does appear to be based upon some thesis work; that thesis indicates that the research was originally carried out in 2015, implying that the author knew of the issue two years ago.
>
> The project has a relatively small core of developers, most of whom work on Jena on the side. We very much rely upon the wider community to provide input on bugs that need to be resolved (e.g. performance issues) and on the features we should prioritise. When someone clearly knew of a problem but didn’t tell us, that is inevitably frustrating for the project.
>
> Rob



-- 


---
Marco Neumann
KONA

Re: Property Paths benchmark @ ISWC2017

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Marco

I don’t believe anyone has tried to contact them yet.

I think the complaint here is that there doesn’t appear to have been any attempt to report the identified issues back to the projects studied. If this were a security flaw in a project, the standard Industry and research practice would be to make a responsible disclosure to the projects in advance of public disclosure, so that the researchers and projects can work together to resolve the problem. The implication is that it is irresponsible for the authors to benefit from pointing out flaws in the projects while appearing to make no effort to help report or resolve those issues.

As you suggest, this paper does appear to be based upon some thesis work; that thesis indicates that the research was originally carried out in 2015, implying that the author knew of the issue two years ago.

The project has a relatively small core of developers, most of whom work on Jena on the side. We very much rely upon the wider community to provide input on bugs that need to be resolved (e.g. performance issues) and on the features we should prioritise. When someone clearly knew of a problem but didn’t tell us, that is inevitably frustrating for the project.

Rob

On 19/10/2017 10:08, "Marco Neumann" <ma...@gmail.com> wrote:

    did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
    to get a response?
    
    the findings seem to be based on work that has been published online as
    part of a bachelor’s thesis by Adrian Skubella.
    
    https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf
    
    
    
    -- 
    
    
    ---
    Marco Neumann
    KONA

Re: Property Paths benchmark @ ISWC2017

Posted by Marco Neumann <ma...@gmail.com>.

did you try to contact Daniel Janke, Adrian Skubella or Steffen Staab
to get a response?

the findings seem to be based on work that has been published online as
part of a bachelor’s thesis by Adrian Skubella.

https://west.uni-koblenz.de/sites/default/files/studying/theses-files/bachelorarbeit-adrian-skubella-benchmarks-for-sparql-property-paths.pdf



On Thu, Oct 19, 2017 at 10:54 AM, Lorenz B. <co...@googlemail.com> wrote:
> For me this is really bad practice. It also looks like they did the
> benchmark more than one year ago. Otherwise, due to JENA-1195, this error
> wouldn't occur anymore. And the submission deadline was August 6th, 2017.
> Their experiments contain 8 queries, rerunning those shouldn't take ages...
>
> I'm currently trying to reproduce the results of the paper, but the
> whole experimental setup remains unclear. I'm wondering if they used
> just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
> the runtimes in the eval section are quite small, but even loading the
> data of their benchmark takes much more time. So maybe they used the
> RDF4J server.
>
> The worst thing is that they didn't contact any of the developers. Or
> did they talk to somebody here and then Andy created the ticket
> JENA-1195? Also for the other queries that failed, I would expect to see
> tickets on Apache JIRA or at least a hint on the Jena mailing list...
>
> @Andy I'm also wondering whether JENA-1317 addresses the problem with
> the empty result of the benchmark query containing an inverse property path.



-- 


---
Marco Neumann
KONA
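
Lorenz's question above about JENA-1317 concerns the inverse path operator. In SPARQL 1.1, `X ^B ?o` matches exactly when the triple `?o B X` exists, so on data that contains `B` triples pointing at `X`, an inverse query should not come back empty. The following is a minimal, hypothetical Python sketch of that equivalence; the triple set and the function name `eval_step` are assumptions for illustration, and this is not how Jena evaluates paths.

```python
def eval_step(triples, subject, predicate, inverse=False):
    """Return the sorted nodes ?o that match `subject predicate ?o`.

    With inverse=True the path is `subject ^predicate ?o`, which SPARQL 1.1
    defines as `?o predicate subject`: the same triples, read backwards.
    """
    if inverse:
        return sorted(s for s, p, o in triples if p == predicate and o == subject)
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


if __name__ == "__main__":
    # Hypothetical data: two people who both know bob.
    data = {("alice", "knows", "bob"), ("carol", "knows", "bob")}
    print(eval_step(data, "bob", "knows"))                # []
    print(eval_step(data, "bob", "knows", inverse=True))  # ['alice', 'carol']
    print(eval_step(data, "alice", "knows"))              # ['bob']
```

In this sketch the forward query from `bob` is legitimately empty, while the inverse query from `bob` returns both subjects, which is the result an inverse-path benchmark query would be expected to produce on such data.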

Re: Property Paths benchmark @ ISWC2017

Posted by "Lorenz B." <co...@googlemail.com>.

For me this is really bad practice. It also looks like they did the
benchmark more than one year ago. Otherwise, due to JENA-1195, this error
wouldn't occur anymore. And the submission deadline was August 6th, 2017.
Their experiments contain 8 queries, rerunning those shouldn't take ages...

I'm currently trying to reproduce the results of the paper, but the
whole experimental setup remains unclear. I'm wondering if they used
just the Jena CLI or TDB. The same holds for RDF4J. I'm puzzled because
the runtimes in the eval section are quite small, but even loading the
data of their benchmark takes much more time. So maybe they used the
RDF4J server.

The worst thing is that they didn't contact any of the developers. Or
did they talk to somebody here and then Andy created the ticket
JENA-1195? Also for the other queries that failed, I would expect to see
tickets on Apache JIRA or at least a hint on the Jena mailing list...

@Andy I'm also wondering whether JENA-1317 addresses the problem with
the empty result of the benchmark query containing an inverse property path.
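
On the inverse-path question: SPARQL 1.1 defines `^p` so that `?x ^p ?y` matches exactly the pairs for which `?y p ?x` matches. An empty result for an inverse path over data where the forward pattern matches therefore points to a correctness bug rather than a performance limit. A small Java sketch of that equivalence follows; the `Triple` record and the `forward`/`inverse` methods are hypothetical illustrations, not Jena's API, and the sketch says nothing about what JENA-1317 actually changed.

```java
import java.util.ArrayList;
import java.util.List;

public class InversePath {

    // Hypothetical minimal triple type (not Jena's Triple class).
    record Triple(String s, String p, String o) {}

    // Bindings [x, y] for the pattern  ?x p ?y
    static List<List<String>> forward(List<Triple> graph, String p) {
        List<List<String>> out = new ArrayList<>();
        for (Triple t : graph) {
            if (t.p().equals(p)) {
                out.add(List.of(t.s(), t.o()));
            }
        }
        return out;
    }

    // Bindings [x, y] for  ?x ^p ?y : by definition the same triples as
    // ?y p ?x , with subject and object swapped.
    static List<List<String>> inverse(List<Triple> graph, String p) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> pair : forward(graph, p)) {
            out.add(List.of(pair.get(1), pair.get(0)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Triple> g = List.of(
                new Triple("alice", "knows", "bob"),
                new Triple("bob", "knows", "carol"));
        System.out.println(inverse(g, "knows"));   // [[bob, alice], [carol, bob]]
    }
}
```

Whenever `forward` returns rows, `inverse` returns the same number of rows, so an inverse-path query should be empty only when its forward counterpart is.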


On 18.10.2017 17:03, ajs6f@apache.org wrote:
> As you know, Andy, I'm going to ISWC this year-- shall I buttonhole
> them and give them our POV? :grin:
>
> In all seriousness, from what I can tell the results amount to "Using
> older versions of our comparands and without contacting the projects
> in question we couldn't find a store that implements every property
> path feature correctly and some fail entirely."
>
> I'm not really sure how useful that information is...? But I am ready
> to do a benchmarking paper for next year. Seems like it's a lot easier
> than I thought!
>
>
> ajs6f
>
>
> Andy Seaborne wrote on 10/17/17 9:28 AM:
>> Hi Lorenz,
>>
>> Looks like JENA-1195 which is fixed.  Does that look like it?
>>
>> I think it is a shame when papers focus on bugs rather than discussing
>> and even fixing them.  Bugs aren't research.
>>
>> Path evaluation could be improved to stream in more cases (that's why
>> LIMIT didn't help), but 1195 explains the slowness
>> and memory use.
>>
>>     Andy
>>
>> On 17/10/17 07:58, Lorenz B. wrote:
>>> Hi,
>>>
>>> I just walked through the papers for the upcoming ISWC conference and
>>> found a paper about benchmarking of SPARQL property paths [1].
>>>
>>> Not sure if this is relevant, but it looks like Jena has some issues
>>> with different types of queries using the property path. For example,
>>>
>>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>>
>>> led to an OOM error on non-cyclic data. Here is the relevant part of
>>> the paper:
>>>
>>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>>> exceptions have occurred. During the benchmark process of Jena an
>>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>>> was used. In order to identify the cause of the error, the amount of
>>>> results the query should return has been limited to 100. The results
>>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>>> Due to this fact it is presumable that the query containing the *
>>>> operator returns A recursively until the main memory was full. To
>>>> ensure that this behaviour is not caused by cycles in the dataset a
>>>> query of the same form but with a predicate IRI that did not exist in
>>>> the dataset was executed. This query still returned 100 times A. This
>>>> indicates, that the * operator is not implemented correctly.
>>> In addition, the experiments showed that:
>>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>>> thus, incomplete result set. Only for query 2 a valid result was
>>>> returned. Due to the lack of comparable results, Jena has been omitted
>>>> in the comparison of triple stores.
>>>
>>> In the discussion section, they summarize the overall performance of
>>> Jena as follows:
>>>
>>>> Jena could not return results for any query in under 1 hour besides
>>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>>> the inverse operator returned empty result sets.
>>>
>>> It looks like they used version 3.0.1, so maybe this doesn't hold
>>> anymore for all of the queries. If not, it could be interesting to
>>> improve performance and/or completeness.
>>>
>>> I hope I didn't miss an open JIRA ticket, but in general I just
>>> wanted to highlight the existence of a published benchmark for this
>>> kind of query.
>>>
>>>
>>> Cheers,
>>>
>>> Lorenz
>>>
>>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>>

Re: Property Paths benchmark @ ISWC2017

Posted by aj...@apache.org.

As you know, Andy, I'm going to ISWC this year-- shall I buttonhole them and give them our POV? :grin:

In all seriousness, from what I can tell the results amount to "Using older versions of our comparands and without 
contacting the projects in question we couldn't find a store that implements every property path feature correctly and 
some fail entirely."

I'm not really sure how useful that information is...? But I am ready to do a benchmarking paper for next year. Seems 
like it's a lot easier than I thought!


ajs6f


Andy Seaborne wrote on 10/17/17 9:28 AM:
> Hi Lorenz,
>
> Looks like JENA-1195 which is fixed.  Does that look like it?
>
> I think it is a shame when papers focus on bugs rather than discussing and even fixing them.  Bugs aren't research.
>
> Path evaluation could be improved to stream in more cases (that's why LIMIT didn't help), but 1195 explains the slowness
> and memory use.
>
>     Andy
>
> On 17/10/17 07:58, Lorenz B. wrote:
>> Hi,
>>
>> I just walked through the papers for the upcoming ISWC conference and
>> found a paper about benchmarking of SPARQL property paths [1].
>>
>> Not sure if this is relevant, but it looks like Jena has some issues
>> with different types of queries using the property path. For example,
>>
>> SELECT ?o WHERE {A B* ?o.} LIMIT 100
>>
>> led to an OOM error on non-cyclic data. Here is the relevant part of
>> the paper:
>>
>>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>>> exceptions have occurred. During the benchmark process of Jena an
>>> OutOfMemoryError has been thrown whenever a query with the * operator
>>> was used. In order to identify the cause of the error, the amount of
>>> results the query should return has been limited to 100. The results
>>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>>> Due to this fact it is presumable that the query containing the *
>>> operator returns A recursively until the main memory was full. To
>>> ensure that this behaviour is not caused by cycles in the dataset a
>>> query of the same form but with a predicate IRI that did not exist in
>>> the dataset was executed. This query still returned 100 times A. This
>>> indicates, that the * operator is not implemented correctly.
>> In addition, the experiments showed that:
>>> Due to the problems with the * operator the queries 4, 7 and 8 could
>>> not be processed. Additionally query 3, 5, and 6 returned no results
>>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>>> thus, incomplete result set. Only for query 2 a valid result was
>>> returned. Due to the lack of comparable results, Jena has been omitted
>>> in the comparison of triple stores.
>>
>> In the discussion section, they summarize the overall performance of Jena as follows:
>>
>>> Jena could not return results for any query in under 1 hour besides
>>> query 2. Furthermore, the * operator could not be evaluated at all and
>>> the inverse operator returned empty result sets.
>>
>> It looks like they used version 3.0.1, so maybe this doesn't hold
>> anymore for all of the queries. If not, it could be interesting to
>> improve performance and/or completeness.
>>
>> I hope I didn't miss an open JIRA ticket, but in general I just wanted
>> to highlight the existence of a published benchmark for this kind of
>> query.
>>
>>
>> Cheers,
>>
>> Lorenz
>>
>> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>>

Re: Property Paths benchmark @ ISWC2017

Posted by Andy Seaborne <an...@apache.org>.

Hi Lorenz,

Looks like JENA-1195 which is fixed.  Does that look like it?

I think it is a shame when papers focus on bugs rather than discussing and 
even fixing them.  Bugs aren't research.

Path evaluation could be improved to stream in more cases (that's why LIMIT 
didn't help), but JENA-1195 explains the slowness and memory use.

     Andy
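
For context on the semantics at issue: under SPARQL 1.1, `A B* ?o` with a fixed subject returns each node reachable from A over zero or more B edges exactly once, and because of the zero-length step A itself is returned (once) even when B does not occur in the data. The Java sketch below illustrates that behaviour and the streaming point above; it is not Jena's code. It follows the idea of the spec's ALP procedure (a visited set guarantees termination on cycles and no duplicates) but uses a lazy breadth-first iterator, so a consumer applying LIMIT can stop early instead of materialising every result first. The adjacency map standing in for the graph and the names `ZeroOrMorePath`, `zeroOrMore` and `limit` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

public class ZeroOrMorePath {

    // Hypothetical: lazily yields each node reachable from `start` over zero or more
    // edges of one predicate (given as an adjacency map). `start` comes first
    // (the zero-length path) and every node is produced at most once.
    static Iterator<String> zeroOrMore(Map<String, List<String>> edges, String start) {
        return new Iterator<>() {
            private final Set<String> visited = new HashSet<>();
            private final Deque<String> frontier = new ArrayDeque<>();

            {
                visited.add(start);   // the zero-length path always matches start
                frontier.add(start);
            }

            @Override
            public boolean hasNext() {
                return !frontier.isEmpty();
            }

            @Override
            public String next() {
                if (frontier.isEmpty()) {
                    throw new NoSuchElementException();
                }
                String node = frontier.poll();
                // Only the node being returned is expanded, so work tracks what the consumer pulls.
                for (String succ : edges.getOrDefault(node, List.of())) {
                    if (visited.add(succ)) {   // add() is false for nodes already seen,
                        frontier.add(succ);    // so cycles cause neither repeats nor loops
                    }
                }
                return node;
            }
        };
    }

    // Hypothetical LIMIT: pulls at most n results, then stops asking for more.
    static List<String> limit(Iterator<String> it, int n) {
        List<String> out = new ArrayList<>();
        while (out.size() < n && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // A -B-> X, X -B-> Y, X -B-> A (a cycle back to A).
        Map<String, List<String>> b = Map.of(
                "A", List.of("X"),
                "X", List.of("Y", "A"));
        System.out.println(limit(zeroOrMore(b, "A"), 100));
        // Predicate absent from the data: only the zero-length match remains.
        System.out.println(limit(zeroOrMore(Map.of(), "A"), 100));
    }
}
```

Running `main` prints `[A, X, Y]` and then `[A]`: with LIMIT 100 and a predicate that does not exist, the expected answer is a single A, not 100 copies of it.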

On 17/10/17 07:58, Lorenz B. wrote:
> Hi,
> 
> I just walked through the papers for the upcoming ISWC conference and
> found a paper about benchmarking of SPARQL property paths [1].
> 
> Not sure if this is relevant, but it looks like Jena has some issues
> with different types of queries using the property path. For example,
> 
> SELECT ?o WHERE {A B* ?o.} LIMIT 100
> 
> led to an OOM error on non-cyclic data. Here is the relevant part of
> the paper:
> 
>> While benchmarking Virtuoso, RDF4J and Allegrograph no errors or
>> exceptions have occurred. During the benchmark process of Jena an
>> OutOfMemoryError has been thrown whenever a query with the * operator
>> was used. In order to identify the cause of the error, the amount of
>> results the query should return has been limited to 100. The results
>> that have been returned by a query of the form SELECT ?o WHERE {A B*
>> ?o.} LIMIT 100 where A and B are valid IRIs, consisted of 100 times A.
>> Due to this fact it is presumable that the query containing the *
>> operator returns A recursively until the main memory was full. To
>> ensure that this behaviour is not caused by cycles in the dataset a
>> query of the same form but with a predicate IRI that did not exist in
>> the dataset was executed. This query still returned 100 times A. This
>> indicates, that the * operator is not implemented correctly.
> In addition, the experiments showed that:
>> Due to the problems with the * operator the queries 4, 7 and 8 could
>> not be processed. Additionally query 3, 5, and 6 returned no results
>> after 1 hour and thus, were aborted. Query 1 returned an empty and
>> thus, incomplete result set. Only for query 2 a valid result was
>> returned. Due to the lack of comparable results, Jena has been omitted
>> in the comparison of triple stores.
> 
> In the discussion section, they summarize the overall performance of Jena as follows:
> 
>> Jena could not return results for any query in under 1 hour besides
>> query 2. Furthermore, the * operator could not be evaluated at all and
>> the inverse operator returned empty result sets.
> 
> It looks like they used version 3.0.1, so maybe this doesn't hold
> anymore for all of the queries. If not, it could be interesting to
> improve performance and/or completeness.
> 
> I hope I didn't miss an open JIRA ticket, but in general I just wanted
> to highlight the existence of a published benchmark for this kind of
> query.
> 
> 
> Cheers,
> 
> Lorenz
> 
> [1] http://ceur-ws.org/Vol-1932/paper-04.pdf
>