Posted to dev@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/11/30 12:03:00 UTC

[jira] [Updated] (JENA-2006) Dataset prefixes

     [ https://issues.apache.org/jira/browse/JENA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-2006:
--------------------------------
    Fix Version/s: Jena 3.18.0

> Dataset prefixes
> ----------------
>
>                 Key: JENA-2006
>                 URL: https://issues.apache.org/jira/browse/JENA-2006
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.18.0
>
>
> Summary:
> Add API calls:
> {{DatasetGraph.prefixes()}} -> {{PrefixMap}}
> {{Dataset.getPrefixMapping()}} -> {{PrefixMapping}}
> Rework internal implementation code to reflect this. 
> Clean up the different handling of prefixes; switch to a consistent provision of a dataset prefix map. Remove {{DatasetPrefixStorage}} (multiple prefix maps per dataset).
> My first attempt at this work was to use {{DatasetPrefixStorage}} consistently but it ended up as a lot of classes mirroring {{PrefixMap}} implementations. Because input formats only have prefixes per dataset, not per individual graph, the extra feature of multiple prefix maps can only be used through the API and it just doesn't seem worth the effort and extra code. It was quicker to do the final form, "one prefix map per dataset", than the more complicated form.
> More details:
> "TDB" means both TDB1 and TDB2.
> The main use case for prefixes is that they are set as part of data parsing and used on output to abbreviate URIs.
> For output, we know that URI->prefixed name is a performance-critical operation. It is optimized in {{PrefixMapStd}}. This does not change. The writers copy prefixes into a {{PrefixMapStd}}, which has a fast path for the common case of a split at the last "/" or "#" and a reverse map from URI to prefix.
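
The fast path described above can be sketched roughly as follows. This is a minimal illustration using only the JDK, not Jena's {{PrefixMapStd}}: the class name FastAbbrevSketch and its methods are hypothetical, splitting at whichever of "/" or "#" occurs last is an assumption, and the check that the local part is a legal prefixed-name local part is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Jena's PrefixMapStd.
class FastAbbrevSketch {
    private final Map<String, String> prefixToUri = new HashMap<>();
    // Reverse map: namespace URI -> prefix, used by the fast path.
    private final Map<String, String> uriToPrefix = new HashMap<>();

    void add(String prefix, String uri) {
        String old = prefixToUri.put(prefix, uri);
        if (old != null) {
            // Drop the stale reverse entry if it still points at this prefix.
            uriToPrefix.remove(old, prefix);
        }
        uriToPrefix.put(uri, prefix);
    }

    /** Returns "prefix:local", or null when no prefix applies. */
    String abbreviate(String uri) {
        // Fast path (assumption: split after the last '#' or '/', whichever is later),
        // then a single hash lookup in the reverse map.
        int idx = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        if (idx >= 0) {
            String prefix = uriToPrefix.get(uri.substring(0, idx + 1));
            if (prefix != null) {
                return prefix + ":" + uri.substring(idx + 1);
            }
        }
        // Slow path: scan every namespace for a leading match.
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            String ns = e.getValue();
            if (uri.startsWith(ns)) {
                return e.getKey() + ":" + uri.substring(ns.length());
            }
        }
        return null;
    }
}
```

In the common case the lookup costs one substring and one hash probe; the linear scan only runs for URIs whose namespace does not end at the last "/" or "#".
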
> Mostly, up to now, the implementation has been "store the prefixes in the default graph". While TDB stores multiple sets of prefixes for each dataset, so that there is the possibility of graphs in the same dataset having different prefixes, it used the default graph as well. Output has never made use of multiple prefixes per dataset.
> The {{PrefixMapping}} API presumes a reverse mapping and the API contract is part of the Model API (Model extends PrefixMapping). The other odd feature of {{PrefixMapping}} is that there is no direct access to the prefixes as a map, only a copy form.
> {{PrefixMap}} is simpler with the needs of parsers and storage implementation in mind.
> The idea is that {{PrefixMapping}} is considered part of the Dataset/Model/Statement/Resource APIs. There is a legacy quirk that Graph has {{getPrefixMapping}} but otherwise {{PrefixMap}} is the internal abstraction for the Model API.
> There will be adapters between the two viewpoints. Aside from the implicit contract that {{PrefixMapping}} follows XML qname rules, whereas Turtle is less restrictive, the functionality can be mapped both ways.
> Mostly the XML-rules contract has been moved into the writers themselves in previous iterations of implementation improvement. The adapters are lightweight objects, with no state other than the object they adapt, and "double adapting" actually removes wrappers and returns the underlying prefixes object.
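
One way to picture the adapters and the "double adapting" behaviour is the sketch below. It uses cut-down, hypothetical interfaces (MiniPrefixMap, MiniPrefixMapping) and a hypothetical PrefixAdapters helper rather than Jena's real types, so it shows the shape of the idea only, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, cut-down stand-ins for the two viewpoints (not Jena's interfaces).
interface MiniPrefixMap {
    void add(String prefix, String uri);
    String get(String prefix);
}

interface MiniPrefixMapping {
    MiniPrefixMapping setNsPrefix(String prefix, String uri);
    String getNsPrefixURI(String prefix);
}

// A plain in-memory prefix map to adapt.
class MiniPrefixMapStd implements MiniPrefixMap {
    private final Map<String, String> map = new HashMap<>();
    public void add(String prefix, String uri) { map.put(prefix, uri); }
    public String get(String prefix) { return map.get(prefix); }
}

// Adapter: presents a MiniPrefixMap as a MiniPrefixMapping. Its only state is the wrapped object.
final class MappingOverMap implements MiniPrefixMapping {
    final MiniPrefixMap base;
    MappingOverMap(MiniPrefixMap base) { this.base = base; }
    public MiniPrefixMapping setNsPrefix(String prefix, String uri) { base.add(prefix, uri); return this; }
    public String getNsPrefixURI(String prefix) { return base.get(prefix); }
}

// Adapter in the other direction.
final class MapOverMapping implements MiniPrefixMap {
    final MiniPrefixMapping base;
    MapOverMapping(MiniPrefixMapping base) { this.base = base; }
    public void add(String prefix, String uri) { base.setNsPrefix(prefix, uri); }
    public String get(String prefix) { return base.getNsPrefixURI(prefix); }
}

final class PrefixAdapters {
    private PrefixAdapters() {}

    static MiniPrefixMapping toMapping(MiniPrefixMap pmap) {
        // "Double adapting": unwrap instead of stacking a second wrapper.
        if (pmap instanceof MapOverMapping) return ((MapOverMapping) pmap).base;
        return new MappingOverMap(pmap);
    }

    static MiniPrefixMap toMap(MiniPrefixMapping pmapping) {
        if (pmapping instanceof MappingOverMap) return ((MappingOverMap) pmapping).base;
        return new MapOverMapping(pmapping);
    }
}
```

Because the adapters hold nothing but the wrapped object, an update made through either view is visible through the other, and a round trip hands back the original object rather than a growing chain of wrappers.
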
> The improved way:
> Basic datasets (DatasetGraphMap and DatasetGraphMapLink) - dataset prefixes are the default graph prefixes.
> TIM: All graphs in the dataset have the same prefix map. The PrefixMap is thread-safe but isn't transactional (possible future work if needed).
> TDB1, TDB2: These have their own, more general prefix storage but the additional feature is not exposed. All graphs in the dataset have the same prefix map. There is no change to the on-disk format.
> SDB: As before. There is no change to on-disk format.
> The nulls (DatasetGraphZero and DatasetGraphSink): Sink is "forget updates", Zero is "empty, no updates". Both have suitably misbehaving implementations of the {{PrefixMap}} API.
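
The difference between the two "null" prefix maps might look something like this sketch. The interface TinyPrefixMap and both classes are hypothetical, and having the Zero variant reject updates with an UnsupportedOperationException is an assumption: the description above only says "no updates".

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical minimal prefix map interface (not Jena's PrefixMap).
interface TinyPrefixMap {
    void add(String prefix, String uri);
    String get(String prefix);
    Map<String, String> getMapping();
}

// "Sink": accepts updates and forgets them; always appears empty.
final class TinyPrefixMapSink implements TinyPrefixMap {
    public void add(String prefix, String uri) { /* deliberately discarded */ }
    public String get(String prefix) { return null; }
    public Map<String, String> getMapping() { return Collections.emptyMap(); }
}

// "Zero": always empty and refuses updates.
// Assumption: refusal is signalled by an exception.
final class TinyPrefixMapZero implements TinyPrefixMap {
    public void add(String prefix, String uri) {
        throw new UnsupportedOperationException("no updates allowed");
    }
    public String get(String prefix) { return null; }
    public Map<String, String> getMapping() { return Collections.emptyMap(); }
}
```

Both read as empty; they differ only in whether a write is silently dropped or refused.
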



--
This message was sent by Atlassian Jira
(v8.3.4#803005)