Posted to dev@jena.apache.org by "Lorenz B." <co...@googlemail.com.INVALID> on 2020/08/26 06:23:11 UTC

Jena SHACL Benchmark

Hi all,


As usual, here is something I saw regarding Jena in a recent publication:
"Benchmark for Performance Evaluation of SHACL Implementations in Graph
Databases" [1]

As the title indicates, it's a benchmark of SHACL validation
performance. The benchmark comprises 58 SHACL shapes tested on, I guess,
1 million triples. Jena took second place, close to Stardog, which I
think is a success.

Some other metrics like memory consumption might be something to
investigate - not sure if those numbers make sense, but according to the
paper Jena needs 14GB of RAM? RDF4J even 16GB, but Stardog only 1.2GB.

A weak point of Jena was "lack of documentation".


Anyways, good job Andy (and contributors).

Happy to see comments and thoughts from your side.


Cheers,

Lorenz


[1] https://link.springer.com/chapter/10.1007/978-3-030-57977-7_6

Re: Jena SHACL Benchmark

Posted by "Lorenz B." <co...@googlemail.com.INVALID>.

Benchmark data is here: https://doi.org/10.17632/jfrdpnb945.1

Re: Jena SHACL Benchmark

Posted by Andy Seaborne <an...@apache.org>.



On 26/08/2020 08:36, Andy Seaborne wrote:
> Thanks for the report.
> I don't have access to the paper.

https://t.co/ivcVMnPGEr

via

https://twitter.com/alphaverda/status/1276836317100421121

Unfortunately, the evaluation is a custom app for each database. Without 
seeing the app, it's opaque.

I was hoping to see which features of SHACL they used in their shapes.
While Jena covers all of SHACL Core and SHACL-SPARQL, the more obscure
features may be slow. But if they are comparing across databases, maybe
they went for a commonly implemented subset.

Where are the shapes they used?

     Andy

"18 GB of heap memory" ... on a 16G machine. Hmm.



Re: Jena SHACL Benchmark

Posted by Andy Seaborne <an...@apache.org>.

Thanks for the report.
I don't have access to the paper.

I don't see a reference to TopQuadrant SHACL in the references.

On 26/08/2020 07:23, Lorenz B. wrote:
> Hi all,
> 
> 
> As usual, here is something I saw regarding Jena in a recent
> publication: "Benchmark for Performance Evaluation of SHACL
> Implementations in Graph Databases" [1]
> 
> As the title indicates, it's a benchmark of SHACL validation
> performance. The benchmark comprises 58 SHACL shapes tested on, I guess,
> 1 million triples. Jena took second place, close to Stardog, which I
> think is a success.

Especially for something that isn't in the slightest optimized other 
than the fact it compiles the shapes to an execution tree. Incremental 
validation for transactions is "work in progress".

> Some other metrics like memory consumption might be something to
> investigate - not sure if those numbers make sense, but according to the
> paper Jena needs 14GB of RAM? RDF4J even 16GB, but Stardog only 1.2GB.

Sounds suspect to me for one million triples. A TDB database on disk is
likely less than 1 GB.

Of course, Jena uses the Java heap, and that just grows until a GC
happens, so the heap size reported is not all memory in use.

(Stardog does a lot outside the heap)
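
The difference between a heap that has grown and memory that is actually in use can be observed with the standard java.lang.Runtime API. The following is a minimal sketch, not code from the paper's benchmark apps or from Jena. The class name HeapSnapshot and its helper methods are made up for illustration, and the 1 million triple count is only the guess from the original post.

```java
// Illustrative sketch only: HeapSnapshot and its methods are hypothetical
// names, not part of Jena or of the benchmark described in the paper.
final class HeapSnapshot {

    /** Heap the JVM has currently reserved from the OS (committed size). */
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Heap occupied by objects, including garbage not yet collected. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Average bytes per triple implied by a reported memory footprint. */
    static long bytesPerTriple(long footprintBytes, long tripleCount) {
        return footprintBytes / tripleCount;
    }

    public static void main(String[] args) {
        long usedBeforeGc = usedBytes();
        System.gc(); // a request only; the JVM is free to ignore it
        long usedAfterGc = usedBytes();

        System.out.printf("committed heap:     %d bytes%n", committedBytes());
        System.out.printf("used before GC:     %d bytes%n", usedBeforeGc);
        System.out.printf("used after GC hint: %d bytes%n", usedAfterGc);

        // Sanity check on the reported figure: 14 GB for an assumed
        // 1 million triples works out to roughly 15 KB per triple.
        long perTriple = bytesPerTriple(14L * 1024 * 1024 * 1024, 1_000_000L);
        System.out.printf("implied bytes per triple: %d%n", perTriple);
    }
}
```

A number taken from the committed heap, or from used heap before a collection, can overstate what the data needs. Measuring used heap after a GC, or running with a fixed maximum heap (-Xmx), gives a figure that is easier to compare across systems.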

> 
> A weak point of Jena was "lack of documentation".

That's fixable - what were they looking for? A hands-on guide to SHACL?

	Again, thanks
	Andy

> 
> Anyways, good job Andy (and contributors).
> 
> Happy to see comments and thoughts from your side.
> 
> 
> Cheers,
> 
> Lorenz
> 
> 
> [1] https://link.springer.com/chapter/10.1007/978-3-030-57977-7_6
> 
>