Posted to dev@jena.apache.org by "Damien Obrist (Jira)" <ji...@apache.org> on 2019/10/16 11:48:00 UTC

[jira] [Commented] (JENA-1769) Dataset#listNames slow for large TDB2 datasets

    [ https://issues.apache.org/jira/browse/JENA-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952753#comment-16952753 ] 

Damien Obrist commented on JENA-1769:
-------------------------------------

h2. Analysis

Using {{git bisect}}, I identified JENA-1695 as the change that introduced this slowdown.

I have also stumbled upon this [comment|https://github.com/apache/jena/pull/598#discussion_r319529442] in the pull request for JENA-1748, from which I gather that {{listNames}} now takes so long because it iterates over all quads in the dataset.

[~andy] I wasn't sure whether to file this issue as a bug or an improvement, given that iterating over all quads (and the running time that implies) seems to be the expected behavior. I'm largely unfamiliar with the internals of TDB, but I'm wondering whether there is a way for the refactored code to achieve performance similar to that of Jena 3.12.0.
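
To illustrate why iterating over all quads is so much slower than it needs to be for this particular operation, here is a minimal, self-contained sketch. It is not Jena or TDB2 code and makes no claim about how either version is implemented. The class {{GraphNameListing}}, the string-based index keys, and both method names are hypothetical. The sketch assumes a sorted index whose keys start with the graph name, and that graph names never contain the separator character. A full scan touches every quad, whereas a skip scan performs one seek per distinct graph name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class GraphNameListing {
    // Separator between graph name and the rest of the key.
    // Assumption: it never occurs inside a graph name.
    private static final char SEP = '\u0000';

    // Builds a hypothetical index key, ordered by graph name first.
    public static String key(String graph, String triple) {
        return graph + SEP + triple;
    }

    // Extracts the graph name from an index key.
    public static String graphOf(String key) {
        return key.substring(0, key.indexOf(SEP));
    }

    // Full scan: visits every quad and de-duplicates the graph names,
    // so the cost grows with the total number of quads.
    public static List<String> namesByFullScan(NavigableSet<String> index) {
        Set<String> names = new LinkedHashSet<>();
        for (String k : index) {
            names.add(graphOf(k));
        }
        return new ArrayList<>(names);
    }

    // Skip scan: after reading one key of a graph, seek directly to the
    // first key of the next graph, so the number of seeks equals the
    // number of distinct graph names.
    public static List<String> namesBySkipScan(NavigableSet<String> index) {
        List<String> names = new ArrayList<>();
        String k = index.isEmpty() ? null : index.first();
        while (k != null) {
            String g = graphOf(k);
            names.add(g);
            // Smallest key that sorts after every key of graph g.
            k = index.ceiling(g + (char) (SEP + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (int g = 0; g < 3; g++) {
            for (int t = 0; t < 1000; t++) {
                index.add(key("http://example.org/g" + g, "s" + t + " p o"));
            }
        }
        // Both strategies return the same names; only the work differs.
        System.out.println(namesByFullScan(index));
        System.out.println(namesBySkipScan(index));
    }
}
```

With three graphs of 1000 quads each, the full scan inspects 3000 keys while the skip scan performs one seek per graph plus a final one that finds no further key. Whether a comparable shortcut is possible in the refactored TDB2 code is exactly the open question above.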

> Dataset#listNames slow for large TDB2 datasets
> ----------------------------------------------
>
>                 Key: JENA-1769
>                 URL: https://issues.apache.org/jira/browse/JENA-1769
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB2
>    Affects Versions: Jena 3.13.0
>            Reporter: Damien Obrist
>            Priority: Major
>              Labels: performance
>
> With Jena 3.13.0, the running time of {{Dataset#listNames}} has increased significantly for TDB2 datasets.
> I have compared the running times for a sample TDB2 dataset containing *1'000'000 triples*. I have observed a running time of *~270ms* with Jena 3.12.0 and *~13.5s* with Jena 3.13.0.
> We're using a dataset with many millions of triples and for our use case, the running time has increased from seconds to minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)