Posted to dev@jena.apache.org by "Osma Suominen (JIRA)" <ji...@apache.org> on 2016/05/02 09:32:12 UTC

[jira] [Commented] (JENA-1172) blank nodes can break jena-text

    [ https://issues.apache.org/jira/browse/JENA-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266189#comment-15266189 ] 

Osma Suominen commented on JENA-1172:
-------------------------------------

There are basically two ways to address this:
1. Prevent blank nodes from being indexed with jena-text
2. Add real support for blank nodes

Option 1 is trivial, though it won't fix older indexes that have already been tainted by blank nodes.
Option 2 would require more work, since the text index would need to store not only URIs but also some kind of internal identifier for blank nodes.
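Option 1 amounts to a guard on the indexing side. The following is a minimal Java sketch of that idea. It uses plain strings in N-Triples style (`_:label` for blank nodes, `<...>` for URIs) and a hypothetical `Triple` record instead of Jena's own classes. In Jena itself the check would presumably test the subject with `Node.isURI()`.

```java
import java.util.ArrayList;
import java.util.List;

public class BlankNodeFilterSketch {

    // Hypothetical stand-in for an RDF triple; terms are N-Triples-style
    // strings, so blank nodes look like "_:b0" and URIs like "<http://...>".
    record Triple(String subject, String predicate, String object) {}

    // Only URI subjects are considered indexable, because the query side
    // later assumes every indexed subject is a URI.
    static boolean isIndexable(Triple t) {
        return t.subject().startsWith("<") && t.subject().endsWith(">");
    }

    // Returns the triples that would be sent to the text index,
    // dropping those whose subject is a blank node.
    static List<Triple> filterForIndex(List<Triple> triples) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if (isIndexable(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String label = "<http://www.w3.org/2000/01/rdf-schema#label>";
        List<Triple> input = List.of(
            new Triple("_:b0", label, "\"blank\""),
            new Triple("<http://example.org/x>", label, "\"not blank\""));
        System.out.println(filterForIndex(input).size()); // prints 1
    }
}
```

A guard like this only affects data indexed after it is in place; entries already written to an existing index stay there.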

Do blank nodes have an internal identifier that could be stored in the text index instead of a URI?
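If blank nodes do have such an identifier, option 2 could look roughly like the minimal Java sketch below. It assumes, hypothetically, that the store exposes a stable label for each blank node and that the index stores subjects as plain strings; the `Subject` record, `encode`/`decode` and the `_:` prefix convention are illustrative only, not jena-text's actual API. The key point is that the query side tags each stored entry as either a URI or a blank node instead of assuming every entry is a URI.

```java
public class SubjectKeySketch {

    // Hypothetical representation of an indexed subject: either a URI or a
    // blank node identified by a store-assigned label.
    record Subject(boolean blank, String value) {}

    // Marker prepended to blank-node labels in the stored index field. An
    // absolute IRI cannot start with "_:" because a URI scheme must begin
    // with a letter, so the two cases cannot be confused.
    static final String BLANK_PREFIX = "_:";

    // Turns a subject into the string that would be written to the index.
    static String encode(Subject s) {
        return s.blank() ? BLANK_PREFIX + s.value() : s.value();
    }

    // Turns a stored string back into a subject. Unlike a decoder that
    // assumes every entry is a URI, this never fails on a blank-node entry.
    static Subject decode(String stored) {
        if (stored.startsWith(BLANK_PREFIX)) {
            return new Subject(true, stored.substring(BLANK_PREFIX.length()));
        }
        return new Subject(false, stored);
    }

    public static void main(String[] args) {
        // Label taken from the error message in the issue description.
        String stored = encode(new Subject(true, "3ed87b7f14f612ef53788d889f6410d6"));
        System.out.println(stored);         // prints _:3ed87b7f14f612ef53788d889f6410d6
        System.out.println(decode(stored)); // prints Subject[blank=true, value=3ed87b7f14f612ef53788d889f6410d6]
        System.out.println(decode("http://example.org/x").blank()); // prints false
    }
}
```

Mapping a decoded label back to the same blank node in the dataset is the store-dependent part, which is exactly the open question above.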

> blank nodes can break jena-text
> -------------------------------
>
>                 Key: JENA-1172
>                 URL: https://issues.apache.org/jira/browse/JENA-1172
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Text
>    Affects Versions: Fuseki 2.3.1
>            Reporter: Osma Suominen
>            Assignee: Osma Suominen
>
> Data with blank node subjects can break the jena-text index.
> For this example I use a typical jena-text configuration that indexes rdfs:label, and then add this triple:
> {noformat}
> _:b0 <http://www.w3.org/2000/01/rdf-schema#label> "blank" .
> {noformat}
> There is no error (though I remember seeing WARNINGs in similar situations), and the triple gets indexed.
> When I later execute this query:
> {noformat}
> PREFIX text: <http://jena.apache.org/text#>
> SELECT ?s { ?s text:query 'blank' }
> {noformat}
> I get this error:
> {noformat}
> 10:22:38 WARN  [5] RC = 500 : java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
> org.apache.jena.ext.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.UnsupportedOperationException: 3ed87b7f14f612ef53788d889f6410d6 is not a URI node
> 	at org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
> 	at org.apache.jena.ext.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
> 	at org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
> 	at org.apache.jena.atlas.lib.cache.CacheGuava.getOrFill(CacheGuava.java:58)
> 	at org.apache.jena.query.text.TextQueryPF.query(TextQueryPF.java:291)
> 	at org.apache.jena.query.text.TextQueryPF.variableSubject(TextQueryPF.java:229)
> 	at org.apache.jena.query.text.TextQueryPF.exec(TextQueryPF.java:198)
> 	at org.apache.jena.sparql.pfunction.PropertyFunctionBase$RepeatApplyIteratorPF.nextStage(PropertyFunctionBase.java:106)
> {noformat}
> Note that this happens whenever a jena-text query matches a blank node subject, so a single triple with a blank node subject can "taint" the whole index. This is what happens with LCSH, which for some reason contains a few hundred blank nodes with a skos:prefLabel property (among almost 8M triples that otherwise generally use URIs for everything).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)