Posted to users@jena.apache.org by Sorin Gheorghiu <so...@uni-konstanz.de> on 2016/04/28 16:47:24 UTC

Error during text index

Hello,

Jena text index returned the following error:

# java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
java.lang.UnsupportedOperationException: http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent is not a literal node
         at org.apache.jena.graph.Node.getLiteral(Node.java:100)
         at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
         at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
         at jena.textindexer.exec(textindexer.java:122)
         at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
         at jena.textindexer.main(textindexer.java:51)

when I attempted to index entries like:

@prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
         gndo:variantNameForTheConferenceOrEvent "Conferentie van Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
         gndo:preferredNameForTheConferenceOrEvent "Conference of Non-Nuclear Weapon States" ;
         a gndo:SeriesOfConferenceOrEvent .

Here is the EntityMap assembler setup:

<#entMap> a text:EntityMap ;
     text:entityField      "gndUri" ;
     text:defaultField     "prefName" ; ## Must be defined in the text:map
     text:map (
          [ text:field "prefName";
            text:predicate gndo:preferredNameForTheSubjectHeading
          ]
          [ text:field "type";
            text:predicate rdf:type
          ]
          ...

The 'type' field contains a URI, but a literal node is expected instead.
It makes no difference whether 'type' is defined as 'text' or 'string' in 
the Solr schema.xml.

How can I fix this?

Thank you in advance,
Sorin

Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi Andy,

thank you for your suggestion, I will use this workaround.

Regards,
Sorin
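
For reference, a minimal Python sketch of what the suggested STRAFTER(STR(?T), "#") expression yields for the class URI from the original post. The helper name `fragment_after` is hypothetical and only mirrors the SPARQL function's behaviour for a non-empty separator (an empty string when the separator is absent); it is not part of Jena.

```python
def fragment_after(uri: str, sep: str = "#") -> str:
    """Mimic SPARQL STRAFTER(STR(uri), sep) for a non-empty separator.

    Returns the text after the first occurrence of `sep`, or the empty
    string when `sep` does not occur in `uri`.
    """
    _, found, rest = uri.partition(sep)
    return rest if found else ""


if __name__ == "__main__":
    # Class URI taken from the stack trace in the original post.
    type_uri = "http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent"
    print(fragment_after(type_uri))
```

With the workaround applied, a string of this shape is what would be stored as the skos:prefLabel literal and could then be picked up by the text index.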


On 29.04.2016 at 12:20, Andy Seaborne wrote:
> The use of rdf:type seems to mix being a displayable label and a class 
> type.
>
> Maybe adding skos:prefLabel to keep the display label is worth doing.
>
> You can extract the fragment from a URI with:
>
>     STRAFTER(STR(<http://example/foo#bar>), "#")
>
> (untested):
>
> INSERT { ?s skos:prefLabel ?label }
> WHERE {
>    ?s a ?T .
>    BIND ( STRAFTER(STR(?T), "#") AS ?label )
> }
>
>
> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>> Hi Osma,
>>
>> I do need the type in the text index to get faster results than with
>> plain SPARQL queries.
>>
>> I found an analyzer that could replace the URI with the type string,
>> but I cannot use it as long as non-literals are skipped.
>>
>>      <fieldType name="text_type_gnd" class="solr.TextField" >
>>        <analyzer>
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>          <filter class="solr.PatternReplaceFilterFactory" pattern="http://d-nb.info/standards/elementset/gnd#" replacement="" replace="all" />
>>        </analyzer>
>>      </fieldType>
>>
>> I am still looking for a workaround for this case.
>>
>> Thanks,
>> Sorin
>>
>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>> Hi Sorin!
>>>
>>> Why do you need the type in the text index? The text index is designed
>>> to store literals. It does not know how to handle URIs at all.
>>>
>>> Generally what you would do to combine text search with a restriction
>>> on rdf:type is to use separate query patterns, e.g.
>>>
>>> {
>>>    ?s text:query 'nuclear' .
>>>    ?s a gndo:SeriesOfConferenceOrEvent .
>>> }
>>>
>>> -Osma
>>>
>>>
>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>> Hi Andy,
>>>>
>>>> I just need the type of the entry; in the example, only the last part,
>>>> 'SeriesOfConferenceOrEvent'.
>>>> If possible, I would set up an analyser that trims the first part, but
>>>> I don't know how.
>>>>
>>>> Thanks
>>>> Sorin
>>>>
>>>>
>>>>
>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>> Hi Sorin,
>>>>>
>>>>> I'm curious as to why you are indexing a URI and what you see as the
>>>>> benefit of that.  You might at least want to set the analyser
>>>>> carefully.
>>>>>
>>>>>     Andy
>>>>>
>>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>>> in the sense that it now issues a warning and skips the non-literal.
>>>>> The test for being a literal or not was there ... but after calling
>>>>> getLiteral.
>>>>>

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Fwd: Re: Error during text index

Posted by Andy Seaborne <an...@apache.org>.

On 13/05/16 09:45, Sorin Gheorghiu wrote:
> Hi Andy,
>
> I found a core dump on the server reporting insufficient memory for the
> JRE (see attachment).
> It is odd: Fuseki is started with a 32 GB maximum Java heap size, but it
> uses only 16 GB:
>
> # java -Xmx32G  -jar fuseki-server.jar --update --config=/etc/default/fuseki/config.ttl

-Xmx32G will not help. In fact it will slow the system down a bit.

TDB keeps cached files outside the heap.

> Also, *ulimit -c unlimited* didn't change anything. Do you have any
> idea what could be preventing it from using more memory?

This is not a heap issue. The JVM crashed due to a lack of system memory 
while trying to map a file on a 64-bit machine.

As the size is 2,047,868,928 I'm not sure it's TDB - TDB uses 8M 
increments.  Maybe Lucene is using mmap files.

There is a list of possible reasons in the hs_err. The JVM should never 
crash.

"ulimit -c unlimited" only affects core dumps.

There may be a system limit on memory mapped areas. I'm not sure which 
ulimit flag this is. Set all the size ones to "unlimited" (but the OS 
may ignore that).

> Regarding the Fuseki server log, I added the file *log4j.properties* to
> /opt/apache-jena-fuseki-2.3.1/run, but after a Fuseki restart no logs
> were generated (I expected a log file like
> /etc/fuseki/logs/fuseki.log). Do I have to set up anything more?

The server logs by default.  You may not be enabling the right logs - 
set them all to INFO.

Logs come out to stdout ("fuseki-server") or to /logs/ ("fuseki" 
service) depending on how you run the server.

	Andy

>
> Thank you,
> Sorin

	Andy
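
On the ulimit point in the reply above, a small Python sketch that prints the resource limits most likely to bound memory mapping. It assumes a Unix-like system; the pairing with bash `ulimit` flags in the comments is the usual one. The names `LIMITS`, `current_limits`, and `max_map_count` are hypothetical, and whether any of these limits is actually the cause of the crash is an assumption, not something the thread confirms.

```python
import resource

# Limits that can bound how much memory a process may map or allocate.
# The ulimit flags are the usual bash equivalents.
LIMITS = {
    "RLIMIT_AS (ulimit -v)": resource.RLIMIT_AS,      # total address space
    "RLIMIT_DATA (ulimit -d)": resource.RLIMIT_DATA,  # data segment size
    "RLIMIT_CORE (ulimit -c)": resource.RLIMIT_CORE,  # core dump size only
}


def describe(value: int) -> str:
    """Render a raw limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for the limits above as readable strings."""
    report = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[name] = (describe(soft), describe(hard))
    return report


def max_map_count(path: str = "/proc/sys/vm/max_map_count"):
    """Linux only: per-process cap on the number of memory-mapped regions."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None  # not Linux, or the file is not readable


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
    print("vm.max_map_count:", max_map_count())
```

These are the limits of the process running the script. The Fuseki JVM inherits its limits from whatever started it (shell or service script), so the script would need to run in that same environment to be informative.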

Re: Fwd: Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi Andy,

I found a core dump on the server reporting insufficient memory for the 
JRE (see attachment).
It is odd: Fuseki is started with a 32 GB maximum Java heap size, but it 
uses only 16 GB:

# java -Xmx32G  -jar fuseki-server.jar --update --config=/etc/default/fuseki/config.ttl

Also, *ulimit -c unlimited* didn't change anything. Do you have any 
idea what could be preventing it from using more memory?


Regarding the Fuseki server log, I added the file *log4j.properties* to 
/opt/apache-jena-fuseki-2.3.1/run, but after a Fuseki restart no logs 
were generated (I expected a log file like 
/etc/fuseki/logs/fuseki.log). Do I have to set up anything more?

Thank you,
Sorin



On 12.05.2016 at 23:44, Andy Seaborne wrote:
> Hi,
>
> It's not clear to me what's happening.  The server log may offer some 
> more information.  It's as if the response is truncated somehow.
>
> You could try using curl or wget to make the request.  They can also 
> print out the HTTP headers.
>
>  Andy
>
>
>
> On 12/05/16 19:43, Sorin Gheorghiu wrote:
>> Hi,
>>
>> the attempt to perform a sparql insert using *s-update* has failed with
>> the error:
>>
>> # /opt/apache-jena-fuseki-2.3.1/bin/s-update
>> --service=http://localhost:3030/<dataset>/update --file update.ru
>>
>> /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file
>> reached (EOFError)
>>          from /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
>>          from /usr/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
>>          from /usr/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:2563:in `read_status_line'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:2552:in `read_new'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:1320:in `block in
>> transport_request'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:1317:in 
>> `transport_request'
>>          from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
>>          from /opt/apache-jena-fuseki-2.3.1/bin/s-update:221:in
>> `response_no_body'
>>          from /opt/apache-jena-fuseki-2.3.1/bin/s-update:614:in
>> `SPARQL_update'
>>          from /opt/apache-jena-fuseki-2.3.1/bin/s-update:681:in
>> `cmd_sparql_update'
>>          from /opt/apache-jena-fuseki-2.3.1/bin/s-update:708:in `<main>'
>>
>> The same error will occur with ruby > 2.0 (but no backtrace printed 
>> out):
>>
>> /opt/apache-jena-fuseki-2.3.1/bin/s-update: end of file reached 
>> (EOFError)
>>
>> Do you have any hint, please?
>>
>> Thanks
>> Sorin
>>
>> On 04.05.2016 at 14:54, Andy Seaborne wrote:
>>> Hi there,
>>>
>>> This looks like something to do with the solr setup.  I'm not very
>>> familiar with solr, is there some configuration that affects timeouts
>>> on connections? I don't think Jena does any timeouts itself.
>>>
>>>     Andy
>>>
>>> On 03/05/16 08:50, Sorin Gheorghiu wrote:
>>>> After Solr server restart, it looks like the indexes aren't corrupted.
>>>> Thus, it seems the error isn't critical and I may ignore it.
>>>>
>>>> But my expectation was that the insert command will add the new
>>>> parameter to Jena TDB and not to Solr.
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject:     Re: Error during text index
>>>> Date:     Mon, 2 May 2016 20:05:37 +0200
>>>> From:     Sorin Gheorghiu <so...@uni-konstanz.de>
>>>> To:     users@jena.apache.org
>>>>
>>>>
>>>>
>>>> Hi Andy,
>>>>
>>>> after 2 attempts to insert the new SKOS variable, I got the following
>>>> error:
>>>>
>>>> org.apache.jena.query.text.TextIndexException:
>>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>>> ............................................................................................................................... 
>>>>
>>>>
>>>>
>>>>
>>>> [2016-05-02 19:23:40] Fuseki     INFO  [4] 500
>>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>>> (30,147.934 s)
>>>>
>>>> This occurred after more than 8 hours; the update failed before
>>>> completion.
>>>>
>>>> No related Solr error was printed in the logs at that moment, but
>>>> when I refreshed the Solr page http://localhost:8983/solr/#/~cores,
>>>> I got:
>>>>
>>>> 30852656 INFO  (qtp1013423070-18) [   ] o.a.s.s.HttpSolrCall [admin]
>>>> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
>>>> status=0 QTime=1758
>>>> 30854518 ERROR (qtp1013423070-20) [   ] o.a.s.h.RequestHandlerBase
>>>> org.apache.solr.common.SolrException: Error handling 'status' action
>>>> ............................................................................................................................... 
>>>>
>>>>
>>>>
>>>> Caused by: java.nio.file.NoSuchFileException:
>>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>>>>
>>>> Indeed, there is no *segments_1* file in ../data/index/ but a different
>>>> one:
>>>>
>>>> # ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
>>>> -rw-r--r-- 1 root root 937 May  2 17:42 /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>>>>
>>>> I could provide the backtrace if needed. Could you help me to understand
>>>> the root cause please?
>>>>
>>>> Thank you
>>>> Sorin
>>>>
>>>>

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Fwd: Re: Error during text index

Posted by Andy Seaborne <an...@apache.org>.

Hi,

It's not clear to me what's happening.  The server log may offer some 
more information.  It's as if the response is truncated somehow.

You could try using curl or wget to make the request.  They can also 
print out the HTTP headers.

  Andy
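
As an alternative to curl or wget for the check suggested above, a minimal Python sketch that sends the update directly and prints the HTTP status and headers. It assumes the endpoint accepts a direct POST with Content-Type application/sparql-update, as defined by the SPARQL 1.1 Protocol. The function names are hypothetical, and the endpoint URL has to be filled in with the real dataset name.

```python
import urllib.error
import urllib.request


def build_update_request(endpoint: str, update_text: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol update request (POST, body = the update)."""
    return urllib.request.Request(
        endpoint,
        data=update_text.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )


def send_and_dump(request: urllib.request.Request) -> None:
    """Send the request and print the status line and response headers."""
    try:
        with urllib.request.urlopen(request) as response:
            print(response.status, response.reason)
            print(response.headers)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 500).
        print(err.code, err.reason)
        print(err.headers)
    except urllib.error.URLError as err:
        # No usable HTTP response at all (refused, reset, timed out).
        print("no HTTP response:", err.reason)
```

Usage would be along the lines of `send_and_dump(build_update_request(endpoint, open("update.ru").read()))`, with `endpoint` set to the dataset's update URL. Any status line and headers that come back (or their absence) can then be compared with the Fuseki log.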




Re: Fwd: Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi,

the attempt to perform a sparql insert using *s-update* has failed with 
the error:

# /opt/apache-jena-fuseki-2.3.1/bin/s-update 
--service=http://localhost:3030/<dataset>/update --file update.ru

/usr/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file 
reached (EOFError)
         from /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
         from /usr/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
         from /usr/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
         from /usr/lib/ruby/1.9.1/net/http.rb:2563:in `read_status_line'
         from /usr/lib/ruby/1.9.1/net/http.rb:2552:in `read_new'
         from /usr/lib/ruby/1.9.1/net/http.rb:1320:in `block in 
transport_request'
         from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
         from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
         from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
         from /opt/apache-jena-fuseki-2.3.1/bin/s-update:221:in 
`response_no_body'
         from /opt/apache-jena-fuseki-2.3.1/bin/s-update:614:in 
`SPARQL_update'
         from /opt/apache-jena-fuseki-2.3.1/bin/s-update:681:in 
`cmd_sparql_update'
         from /opt/apache-jena-fuseki-2.3.1/bin/s-update:708:in `<main>'

The same error occurs with Ruby > 2.0 (but no backtrace is printed):

/opt/apache-jena-fuseki-2.3.1/bin/s-update: end of file reached (EOFError)

Do you have any hint, please?

Thanks
Sorin
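
One way to narrow down an EOFError like this is to take the Ruby client out of the picture and send the same update with a different HTTP client. The following is a minimal sketch in Python, assuming the endpoint accepts a direct POST with Content-Type application/sparql-update as described in the SPARQL 1.1 Protocol. The dataset name `ds`, the helper name `build_update_request` and the example update text are hypothetical placeholders, not taken from the setup above.

```python
import urllib.request


def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Update POST request."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical endpoint and update text; substitute your own dataset and file.
request = build_update_request(
    "http://localhost:3030/ds/update",
    'INSERT DATA { <http://example/s> <http://example/p> "o" }',
)

# To actually send it, with an explicit client-side timeout in seconds:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.status)
```

If the server closes the connection here as well, the problem is unlikely to be specific to the s-update script.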

On 04.05.2016 at 14:54, Andy Seaborne wrote:
> Hi there,
>
> This looks like something to do with the solr setup.  I'm not very 
> familiar with solr, is there some configuration that affects timeouts 
> on connections? I don't think Jena does any timeouts itself.
>
>     Andy
>
> On 03/05/16 08:50, Sorin Gheorghiu wrote:
>> After Solr server restart, it looks like the indexes aren't corrupted.
>> Thus, it seems the error isn't critical and I may ignore it.
>>
>> But my expectation was that the insert command will add the new
>> parameter to Jena TDB and not to Solr.
>>
>>
>> -------- Forwarded Message --------
>> Subject:     Re: Error during text index
>> Date:     Mon, 2 May 2016 20:05:37 +0200
>> From:     Sorin Gheorghiu <so...@uni-konstanz.de>
>> To:     users@jena.apache.org
>>
>>
>>
>> Hi Andy,
>>
>> after 2 attempts to insert the new SKOS variable, I got the following
>> error:
>>
>> org.apache.jena.query.text.TextIndexException:
>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>> when talking to server at: http://localhost:8983/solr/GND100316_550
>> ............................................................................................................................... 
>>
>>
>>
>> [2016-05-02 19:23:40] Fuseki     INFO  [4] 500
>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>> when talking to server at: http://localhost:8983/solr/GND100316_550
>> (30,147.934 s)
>>
>> This occured after more than 8 hours, but it failed before the 
>> completion.
>>
>> No related Solr error was printed out in the logs in that moment, but
>> when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
>> I got:
>>
>> 30852656 INFO  (qtp1013423070-18) [   ] o.a.s.s.HttpSolrCall [admin]
>> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
>> status=0 QTime=1758
>> 30854518 ERROR (qtp1013423070-20) [   ] o.a.s.h.RequestHandlerBase
>> org.apache.solr.common.SolrException: Error handling 'status' action
>> ............................................................................................................................... 
>>
>>
>> Caused by: java.nio.file.NoSuchFileException:
>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>>
>> Indeed, there is no *segments_1* file in ../data/index/ but a different
>> one:
>>
>> # ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
>> -rw-r--r-- 1 root root 937 May  2 17:42
>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>>
>> I could provide the backtrace if needed. Could you help me to understand
>> the root cause please?
>>
>> Thank you
>> Sorin
>>
>>
>> On 29.04.2016 at 12:20, Andy Seaborne wrote:
>>> The use of rdf:type seems to mix being a displayable label and a class
>>> type.
>>>
>>> Maybe adding skos:prefLabel to keep the display label is worth doing.
>>>
>>> You can extract the fragment from a URI with:
>>>
>>>     STRAFTER(STR(<http://example/foo#bar>), "#")'
>>>
>>> (untested):
>>>
>>> INSERT { ?s skos:prefLabel ?label }
>>> WHERE {
>>>    ?s a ?T .
>>>    BIND ( ?label as STRAFTER(STR(?T), "#")
>>> }
>>>
>>>
>>> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>>>> Hi Osma,
>>>>
>>>> I do need the type in the text index to get faster results than using
>>>> sparql queries.
>>>>
>>>> I found an analyzer which could replace the URI with the string type,
>>>> but I cannot use it as long as the non-literal are skiped.
>>>>
>>>>      <fieldType name="text_type_gnd" class="solr.TextField" >
>>>>        <analyzer>
>>>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>          <filter class="solr.PatternReplaceFilterFactory" pattern="
>>>> http://d-nb.info/standards/elementset/gnd#" replacement=""
>>>> replace="all" />
>>>>        </analyzer>
>>>>      </fieldType>
>>>>
>>>> I am still looking for a workaround for this case.
>>>>
>>>> Thanks,
>>>> Sorin
>>>>
>>>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>>>> Hi Sorin!
>>>>>
>>>>> Why do you need the type in the text index? The text index is 
>>>>> designed
>>>>> to store literals. It does not know how to handle URIs at all.
>>>>>
>>>>> Generally what you would do to combine text search with a restriction
>>>>> on rdf:type is to use separate query patterns, e.g.
>>>>>
>>>>> {
>>>>>    ?s text:query 'nuclear' .
>>>>>    ?s a gndo:SeriesOfConferenceOrEvent .
>>>>> }
>>>>>
>>>>> -Osma
>>>>>
>>>>>
>>>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>>>> Hi Andy,
>>>>>>
>>>>>> I need just the type of the entry, from the example just the last 
>>>>>> part
>>>>>> 'SeriesOfConferenceOrEvent'.
>>>>>> If possible I would set an analyser which would trim the first
>>>>>> part, but
>>>>>> I don't know how.
>>>>>>
>>>>>> Thanks
>>>>>> Sorin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>>>> Hi Sorin,
>>>>>>>
>>>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>>>> benefit of that.  You might at least want to set the analyser
>>>>>>> carefully.
>>>>>>>
>>>>>>>     Andy
>>>>>>>
>>>>>>> PS I fixed the cause of the "UnsupportedOperationException" but 
>>>>>>> only
>>>>>>> in the sense that it now issues a warning and skips the 
>>>>>>> non-literal.
>>>>>>> The test for being a literal or not was there ... but after calling
>>>>>>> getLiteral.
>>>>>>>
>>>>>>>
>>>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Jena text index returned the following error:
>>>>>>>>
>>>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>>>> java.lang.UnsupportedOperationException:
>>>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent 
>>>>>>>>
>>>>>>>> is
>>>>>>>> not a literal node
>>>>>>>>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>>>          at
>>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80) 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>          at
>>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67) 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>          at jena.textindexer.exec(textindexer.java:122)
>>>>>>>>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>>>          at jena.textindexer.main(textindexer.java:51)
>>>>>>>>
>>>>>>>> when attempted to index entries like:
>>>>>>>>
>>>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>>>
>>>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>>>          gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon 
>>>>>>>> States" ;
>>>>>>>>          gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>>>> Non-Nuclear Weapon States" ;
>>>>>>>>          a gndo:SeriesOfConferenceOrEvent .
>>>>>>>>
>>>>>>>> Here is the EntityMap assembler setup:
>>>>>>>>
>>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>>      text:entityField      "gndUri" ;
>>>>>>>>      text:defaultField     "prefName" ; ## Must be defined in the
>>>>>>>> text:map
>>>>>>>>      text:map (
>>>>>>>>           [ text:field "prefName";
>>>>>>>>             text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>>>           ]
>>>>>>>>           [ text:field "type";
>>>>>>>>             text:predicate rdf:type
>>>>>>>>           ]
>>>>>>>>           ...
>>>>>>>>
>>>>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>>>>> There is no difference if 'type' is defined as 'text' or 
>>>>>>>> 'string' in
>>>>>>>> Solr schema.xml.
>>>>>>>>
>>>>>>>> How is possible to fix it?
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Sorin
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Fwd: Re: Error during text index

Posted by Andy Seaborne <an...@apache.org>.

Hi there,

This looks like something to do with the Solr setup.  I'm not very 
familiar with Solr; is there some configuration that affects timeouts on 
connections? I don't think Jena does any timeouts itself.

     Andy

On 03/05/16 08:50, Sorin Gheorghiu wrote:
> After Solr server restart, it looks like the indexes aren't corrupted.
> Thus, it seems the error isn't critical and I may ignore it.
>
> But my expectation was that the insert command will add the new
> parameter to Jena TDB and not to Solr.
>
>
> -------- Forwarded Message --------
> Subject:     Re: Error during text index
> Date:     Mon, 2 May 2016 20:05:37 +0200
> From:     Sorin Gheorghiu <so...@uni-konstanz.de>
> To:     users@jena.apache.org
>
>
>
> Hi Andy,
>
> after 2 attempts to insert the new SKOS variable, I got the following
> error:
>
> org.apache.jena.query.text.TextIndexException:
> org.apache.solr.client.solrj.SolrServerException: IOException occured
> when talking to server at: http://localhost:8983/solr/GND100316_550
> ...............................................................................................................................
>
>
> [2016-05-02 19:23:40] Fuseki     INFO  [4] 500
> org.apache.solr.client.solrj.SolrServerException: IOException occured
> when talking to server at: http://localhost:8983/solr/GND100316_550
> (30,147.934 s)
>
> This occured after more than 8 hours, but it failed before the completion.
>
> No related Solr error was printed out in the logs in that moment, but
> when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
> I got:
>
> 30852656 INFO  (qtp1013423070-18) [   ] o.a.s.s.HttpSolrCall [admin]
> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
> status=0 QTime=1758
> 30854518 ERROR (qtp1013423070-20) [   ] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: Error handling 'status' action
> ...............................................................................................................................
>
> Caused by: java.nio.file.NoSuchFileException:
> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>
> Indeed, there is no *segments_1* file in ../data/index/ but a different
> one:
>
> # ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
> -rw-r--r-- 1 root root 937 May  2 17:42
> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>
> I could provide the backtrace if needed. Could you help me to understand
> the root cause please?
>
> Thank you
> Sorin
>
>
> On 29.04.2016 at 12:20, Andy Seaborne wrote:
>> The use of rdf:type seems to mix being a displayable label and a class
>> type.
>>
>> Maybe adding skos:prefLabel to keep the display label is worth doing.
>>
>> You can extract the fragment from a URI with:
>>
>>     STRAFTER(STR(<http://example/foo#bar>), "#")'
>>
>> (untested):
>>
>> INSERT { ?s skos:prefLabel ?label }
>> WHERE {
>>    ?s a ?T .
>>    BIND ( ?label as STRAFTER(STR(?T), "#")
>> }
>>
>>
>> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>>> Hi Osma,
>>>
>>> I do need the type in the text index to get faster results than using
>>> sparql queries.
>>>
>>> I found an analyzer which could replace the URI with the string type,
>>> but I cannot use it as long as the non-literal are skiped.
>>>
>>>      <fieldType name="text_type_gnd" class="solr.TextField" >
>>>        <analyzer>
>>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>          <filter class="solr.PatternReplaceFilterFactory" pattern="
>>> http://d-nb.info/standards/elementset/gnd#" replacement=""
>>> replace="all" />
>>>        </analyzer>
>>>      </fieldType>
>>>
>>> I am still looking for a workaround for this case.
>>>
>>> Thanks,
>>> Sorin
>>>
>>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>>> Hi Sorin!
>>>>
>>>> Why do you need the type in the text index? The text index is designed
>>>> to store literals. It does not know how to handle URIs at all.
>>>>
>>>> Generally what you would do to combine text search with a restriction
>>>> on rdf:type is to use separate query patterns, e.g.
>>>>
>>>> {
>>>>    ?s text:query 'nuclear' .
>>>>    ?s a gndo:SeriesOfConferenceOrEvent .
>>>> }
>>>>
>>>> -Osma
>>>>
>>>>
>>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>>> Hi Andy,
>>>>>
>>>>> I need just the type of the entry, from the example just the last part
>>>>> 'SeriesOfConferenceOrEvent'.
>>>>> If possible I would set an analyser which would trim the first
>>>>> part, but
>>>>> I don't know how.
>>>>>
>>>>> Thanks
>>>>> Sorin
>>>>>
>>>>>
>>>>>
>>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>>> Hi Sorin,
>>>>>>
>>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>>> benefit of that.  You might at least want to set the analyser
>>>>>> carefully.
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>>>> in the sense that it now issues a warning and skips the non-literal.
>>>>>> The test for being a literal or not was there ... but after calling
>>>>>> getLiteral.
>>>>>>
>>>>>>
>>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> Jena text index returned the following error:
>>>>>>>
>>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>>> java.lang.UnsupportedOperationException:
>>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent
>>>>>>> is
>>>>>>> not a literal node
>>>>>>>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>>          at
>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>          at
>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>          at jena.textindexer.exec(textindexer.java:122)
>>>>>>>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>>          at jena.textindexer.main(textindexer.java:51)
>>>>>>>
>>>>>>> when attempted to index entries like:
>>>>>>>
>>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>>
>>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>>          gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
>>>>>>>          gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>>> Non-Nuclear Weapon States" ;
>>>>>>>          a gndo:SeriesOfConferenceOrEvent .
>>>>>>>
>>>>>>> Here is the EntityMap assembler setup:
>>>>>>>
>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>      text:entityField      "gndUri" ;
>>>>>>>      text:defaultField     "prefName" ; ## Must be defined in the
>>>>>>> text:map
>>>>>>>      text:map (
>>>>>>>           [ text:field "prefName";
>>>>>>>             text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>>           ]
>>>>>>>           [ text:field "type";
>>>>>>>             text:predicate rdf:type
>>>>>>>           ]
>>>>>>>           ...
>>>>>>>
>>>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>>>> There is no difference if 'type' is defined as 'text' or 'string' in
>>>>>>> Solr schema.xml.
>>>>>>>
>>>>>>> How is possible to fix it?
>>>>>>>
>>>>>>> Thank you in advance,
>>>>>>> Sorin
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Fwd: Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

After restarting the Solr server, it looks like the indexes aren't 
corrupted. Thus, it seems the error isn't critical and I may ignore it.

But I expected the insert command to add the new parameter to Jena TDB 
and not to Solr.


-------- Forwarded Message --------
Subject: 	Re: Error during text index
Date: 	Mon, 2 May 2016 20:05:37 +0200
From: 	Sorin Gheorghiu <so...@uni-konstanz.de>
To: 	users@jena.apache.org



Hi Andy,

after 2 attempts to insert the new SKOS variable, I got the following error:

org.apache.jena.query.text.TextIndexException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://localhost:8983/solr/GND100316_550
...............................................................................................................................

[2016-05-02 19:23:40] Fuseki     INFO  [4] 500 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://localhost:8983/solr/GND100316_550 
(30,147.934 s)

This occurred after more than 8 hours; the update failed before completion.

No related Solr error was printed in the logs at that moment, but 
when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then 
I got:

30852656 INFO  (qtp1013423070-18) [   ] o.a.s.s.HttpSolrCall [admin] 
webapp=null path=/admin/info/system params={wt=json&_=1462210386319} 
status=0 QTime=1758
30854518 ERROR (qtp1013423070-20) [   ] o.a.s.h.RequestHandlerBase 
org.apache.solr.common.SolrException: Error handling 'status' action
...............................................................................................................................
Caused by: java.nio.file.NoSuchFileException: 
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1

Indeed, there is no *segments_1* file in ../data/index/ but a different one:

# ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
-rw-r--r-- 1 root root 937 May  2 17:42 
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r

I could provide the backtrace if needed. Could you help me to understand 
the root cause please?

Thank you
Sorin


On 29.04.2016 at 12:20, Andy Seaborne wrote:
> The use of rdf:type seems to mix being a displayable label and a class 
> type.
>
> Maybe adding skos:prefLabel to keep the display label is worth doing.
>
> You can extract the fragment from a URI with:
>
>     STRAFTER(STR(<http://example/foo#bar>), "#")'
>
> (untested):
>
> INSERT { ?s skos:prefLabel ?label }
> WHERE {
>    ?s a ?T .
>    BIND ( ?label as STRAFTER(STR(?T), "#")
> }
>
>
> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>> Hi Osma,
>>
>> I do need the type in the text index to get faster results than using
>> sparql queries.
>>
>> I found an analyzer which could replace the URI with the string type,
>> but I cannot use it as long as the non-literal are skiped.
>>
>>      <fieldType name="text_type_gnd" class="solr.TextField" >
>>        <analyzer>
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>          <filter class="solr.PatternReplaceFilterFactory" pattern="
>> http://d-nb.info/standards/elementset/gnd#" replacement="" 
>> replace="all" />
>>        </analyzer>
>>      </fieldType>
>>
>> I am still looking for a workaround for this case.
>>
>> Thanks,
>> Sorin
>>
>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>> Hi Sorin!
>>>
>>> Why do you need the type in the text index? The text index is designed
>>> to store literals. It does not know how to handle URIs at all.
>>>
>>> Generally what you would do to combine text search with a restriction
>>> on rdf:type is to use separate query patterns, e.g.
>>>
>>> {
>>>    ?s text:query 'nuclear' .
>>>    ?s a gndo:SeriesOfConferenceOrEvent .
>>> }
>>>
>>> -Osma
>>>
>>>
>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>> Hi Andy,
>>>>
>>>> I need just the type of the entry, from the example just the last part
>>>> 'SeriesOfConferenceOrEvent'.
>>>> If possible I would set an analyser which would trim the first 
>>>> part, but
>>>> I don't know how.
>>>>
>>>> Thanks
>>>> Sorin
>>>>
>>>>
>>>>
>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>> Hi Sorin,
>>>>>
>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>> benefit of that.  You might at least want to set the analyser
>>>>> carefully.
>>>>>
>>>>>     Andy
>>>>>
>>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>>> in the sense that it now issues a warning and skips the non-literal.
>>>>> The test for being a literal or not was there ... but after calling
>>>>> getLiteral.
>>>>>
>>>>>
>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Jena text index returned the following error:
>>>>>>
>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>> java.lang.UnsupportedOperationException:
>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent 
>>>>>> is
>>>>>> not a literal node
>>>>>>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>          at
>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80) 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>          at
>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67) 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>          at jena.textindexer.exec(textindexer.java:122)
>>>>>>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>          at jena.textindexer.main(textindexer.java:51)
>>>>>>
>>>>>> when attempted to index entries like:
>>>>>>
>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>
>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>          gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
>>>>>>          gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>> Non-Nuclear Weapon States" ;
>>>>>>          a gndo:SeriesOfConferenceOrEvent .
>>>>>>
>>>>>> Here is the EntityMap assembler setup:
>>>>>>
>>>>>> <#entMap> a text:EntityMap ;
>>>>>>      text:entityField      "gndUri" ;
>>>>>>      text:defaultField     "prefName" ; ## Must be defined in the
>>>>>> text:map
>>>>>>      text:map (
>>>>>>           [ text:field "prefName";
>>>>>>             text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>           ]
>>>>>>           [ text:field "type";
>>>>>>             text:predicate rdf:type
>>>>>>           ]
>>>>>>           ...
>>>>>>
>>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>>> There is no difference if 'type' is defined as 'text' or 'string' in
>>>>>> Solr schema.xml.
>>>>>>
>>>>>> How is possible to fix it?
>>>>>>
>>>>>> Thank you in advance,
>>>>>> Sorin
>>>>>
>>>>
>>>
>>>
>>
>

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi Andy,

after 2 attempts to insert the new SKOS variable, I got the following error:

org.apache.jena.query.text.TextIndexException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://localhost:8983/solr/GND100316_550
...............................................................................................................................

[2016-05-02 19:23:40] Fuseki     INFO  [4] 500 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://localhost:8983/solr/GND100316_550 
(30,147.934 s)

This occurred after more than 8 hours; the update failed before completion.

No related Solr error was printed in the logs at that moment, but 
when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then 
I got:

30852656 INFO  (qtp1013423070-18) [   ] o.a.s.s.HttpSolrCall [admin] 
webapp=null path=/admin/info/system params={wt=json&_=1462210386319} 
status=0 QTime=1758
30854518 ERROR (qtp1013423070-20) [   ] o.a.s.h.RequestHandlerBase 
org.apache.solr.common.SolrException: Error handling 'status' action
...............................................................................................................................
Caused by: java.nio.file.NoSuchFileException: 
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1

Indeed, there is no *segments_1* file in ../data/index/ but a different one:

# ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
-rw-r--r-- 1 root root 937 May  2 17:42 
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r

I could provide the backtrace if needed. Could you help me to understand 
the root cause please?

Thank you
Sorin
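
For reference when reading the file names above: Lucene appears to name its commit files `segments_N`, where N is the commit generation written in base 36. Under that assumption, `segments_10r` is a much later commit than `segments_1`, not a differently named copy of it. A small Python sketch of the decoding (the helper name `segments_generation` is made up for illustration, and the base-36 convention is assumed to hold for this Solr version):

```python
def segments_generation(file_name: str) -> int:
    """Decode the commit generation from a 'segments_N' file name.

    Assumes N is the generation encoded in base 36 (digits 0-9, then a-z).
    """
    prefix = "segments_"
    if not file_name.startswith(prefix):
        raise ValueError(f"not a segments file name: {file_name!r}")
    return int(file_name[len(prefix):], 36)


print(segments_generation("segments_1"))    # 1
print(segments_generation("segments_10r"))  # 1323
```

Read this way, the index had gone through many commits by the time the status request asked for `segments_1`.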


On 29.04.2016 at 12:20, Andy Seaborne wrote:
> The use of rdf:type seems to mix being a displayable label and a class 
> type.
>
> Maybe adding skos:prefLabel to keep the display label is worth doing.
>
> You can extract the fragment from a URI with:
>
>     STRAFTER(STR(<http://example/foo#bar>), "#")'
>
> (untested):
>
> INSERT { ?s skos:prefLabel ?label }
> WHERE {
>    ?s a ?T .
>    BIND ( ?label as STRAFTER(STR(?T), "#")
> }
>
>
> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>> Hi Osma,
>>
>> I do need the type in the text index to get faster results than using
>> sparql queries.
>>
>> I found an analyzer which could replace the URI with the string type,
>> but I cannot use it as long as the non-literal are skiped.
>>
>>      <fieldType name="text_type_gnd" class="solr.TextField" >
>>        <analyzer>
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>          <filter class="solr.PatternReplaceFilterFactory" pattern="
>> http://d-nb.info/standards/elementset/gnd#" replacement="" 
>> replace="all" />
>>        </analyzer>
>>      </fieldType>
>>
>> I am still looking for a workaround for this case.
>>
>> Thanks,
>> Sorin
>>
>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>> Hi Sorin!
>>>
>>> Why do you need the type in the text index? The text index is designed
>>> to store literals. It does not know how to handle URIs at all.
>>>
>>> Generally what you would do to combine text search with a restriction
>>> on rdf:type is to use separate query patterns, e.g.
>>>
>>> {
>>>    ?s text:query 'nuclear' .
>>>    ?s a gndo:SeriesOfConferenceOrEvent .
>>> }
>>>
>>> -Osma
>>>
>>>
>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>> Hi Andy,
>>>>
>>>> I need just the type of the entry, from the example just the last part
>>>> 'SeriesOfConferenceOrEvent'.
>>>> If possible I would set an analyser which would trim the first 
>>>> part, but
>>>> I don't know how.
>>>>
>>>> Thanks
>>>> Sorin
>>>>
>>>>
>>>>
>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>> Hi Sorin,
>>>>>
>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>> benefit of that.  You might at least want to set the analyser
>>>>> carefully.
>>>>>
>>>>>     Andy
>>>>>
>>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>>> in the sense that it now issues a warning and skips the non-literal.
>>>>> The test for being a literal or not was there ... but after calling
>>>>> getLiteral.
>>>>>
>>>>>
>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Jena text index returned the following error:
>>>>>>
>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>> java.lang.UnsupportedOperationException:
>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent 
>>>>>> is
>>>>>> not a literal node
>>>>>>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>          at
>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80) 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>          at
>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67) 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>          at jena.textindexer.exec(textindexer.java:122)
>>>>>>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>          at jena.textindexer.main(textindexer.java:51)
>>>>>>
>>>>>> when attempted to index entries like:
>>>>>>
>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>
>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>          gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
>>>>>>          gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>> Non-Nuclear Weapon States" ;
>>>>>>          a gndo:SeriesOfConferenceOrEvent .
>>>>>>
>>>>>> Here is the EntityMap assembler setup:
>>>>>>
>>>>>> <#entMap> a text:EntityMap ;
>>>>>>      text:entityField      "gndUri" ;
>>>>>>      text:defaultField     "prefName" ; ## Must be defined in the
>>>>>> text:map
>>>>>>      text:map (
>>>>>>           [ text:field "prefName";
>>>>>>             text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>           ]
>>>>>>           [ text:field "type";
>>>>>>             text:predicate rdf:type
>>>>>>           ]
>>>>>>           ...
>>>>>>
>>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>>> There is no difference if 'type' is defined as 'text' or 'string' in
>>>>>> Solr schema.xml.
>>>>>>
>>>>>> How is possible to fix it?
>>>>>>
>>>>>> Thank you in advance,
>>>>>> Sorin
>>>>>
>>>>
>>>
>>>
>>
>

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Error during text index

Posted by Andy Seaborne <an...@apache.org>.

The use of rdf:type seems to mix being a displayable label and a class type.

Maybe adding skos:prefLabel to keep the display label is worth doing.

You can extract the fragment from a URI with:

     STRAFTER(STR(<http://example/foo#bar>), "#")

(untested):

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT { ?s skos:prefLabel ?label }
WHERE {
    ?s a ?T .
    BIND ( STRAFTER(STR(?T), "#") AS ?label )
}
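
To see what label the update above would produce for a given class URI, the STRAFTER step can be mimicked outside SPARQL. This is only a rough Python sketch of the plain-string case (no language tags or datatypes). The function name `strafter` is chosen here for illustration and is not part of any Jena API.

```python
def strafter(text: str, separator: str) -> str:
    """Mimic SPARQL STRAFTER for plain strings.

    Returns the part of `text` after the first occurrence of `separator`,
    the whole string if `separator` is empty, and "" if it does not occur.
    """
    if separator == "":
        return text
    _, found, tail = text.partition(separator)
    return tail if found else ""


gnd_type = "http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent"
print(strafter(gnd_type, "#"))              # SeriesOfConferenceOrEvent
print(repr(strafter("http://example/foo", "#")))  # ''
```

Note the second case: a type URI without a "#" yields an empty string, so classes from slash-style namespaces would get an empty label unless they are filtered out or handled separately.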


On 29/04/16 09:25, Sorin Gheorghiu wrote:
> Hi Osma,
>
> I do need the type in the text index to get faster results than using
> sparql queries.
>
> I found an analyzer which could replace the URI with the string type,
> but I cannot use it as long as the non-literal are skiped.
>
>      <fieldType name="text_type_gnd" class="solr.TextField" >
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.PatternReplaceFilterFactory" pattern="
> http://d-nb.info/standards/elementset/gnd#" replacement="" replace="all" />
>        </analyzer>
>      </fieldType>
>
> I am still looking for a workaround for this case.
>
> Thanks,
> Sorin
>
> On 29.04.2016 at 08:43, Osma Suominen wrote:
>> Hi Sorin!
>>
>> Why do you need the type in the text index? The text index is designed
>> to store literals. It does not know how to handle URIs at all.
>>
>> Generally what you would do to combine text search with a restriction
>> on rdf:type is to use separate query patterns, e.g.
>>
>> {
>>    ?s text:query 'nuclear' .
>>    ?s a gndo:SeriesOfConferenceOrEvent .
>> }
>>
>> -Osma
>>
>>
>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>> Hi Andy,
>>>
>>> I need just the type of the entry, from the example just the last part
>>> 'SeriesOfConferenceOrEvent'.
>>> If possible I would set an analyser which would trim the first part, but
>>> I don't know how.
>>>
>>> Thanks
>>> Sorin
>>>
>>>
>>>
>>> Am 28.04.2016 um 17:25 schrieb Andy Seaborne:
>>>> Hi Sorin,
>>>>
>>>> I'm curious as to why you are indexing a URI and what you see the
>>>> benefit of that.  You might at least want to set the analyser
>>>> carefully.
>>>>
>>>>     Andy
>>>>
>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>> in the sense that it now issues a warning and skips the non-literal.
>>>> The test for being a literal or not was there ... but after calling
>>>> getLiteral.
>>>>
>>>>
>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>> Hello,
>>>>>
>>>>> Jena text index returned the following error:
>>>>>
>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>> java.lang.UnsupportedOperationException:
>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent is
>>>>> not a literal node
>>>>>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>          at
>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>>>>>
>>>>>
>>>>>
>>>>>          at
>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>>>>>
>>>>>
>>>>>
>>>>>          at jena.textindexer.exec(textindexer.java:122)
>>>>>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>          at jena.textindexer.main(textindexer.java:51)
>>>>>
>>>>> when attempted to index entries like:
>>>>>
>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>
>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>          gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
>>>>>          gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>> Non-Nuclear Weapon States" ;
>>>>>          a gndo:SeriesOfConferenceOrEvent .
>>>>>
>>>>> Here is the EntityMap assembler setup:
>>>>>
>>>>> <#entMap> a text:EntityMap ;
>>>>>      text:entityField      "gndUri" ;
>>>>>      text:defaultField     "prefName" ; ## Must be defined in the
>>>>> text:map
>>>>>      text:map (
>>>>>           [ text:field "prefName";
>>>>>             text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>           ]
>>>>>           [ text:field "type";
>>>>>             text:predicate rdf:type
>>>>>           ]
>>>>>           ...
>>>>>
>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>> There is no difference if 'type' is defined as 'text' or 'string' in
>>>>> Solr schema.xml.
>>>>>
>>>>> How is possible to fix it?
>>>>>
>>>>> Thank you in advance,
>>>>> Sorin
>>>>
>>>
>>
>>
>

Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi Osma,

I do need the type in the text index to get faster results than with 
SPARQL queries.

I found an analyzer which could replace the URI with the type string, 
but I cannot use it as long as non-literals are skipped.

     <fieldType name="text_type_gnd" class="solr.TextField" >
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.PatternReplaceFilterFactory" pattern="http://d-nb.info/standards/elementset/gnd#" replacement="" replace="all" />
       </analyzer>
     </fieldType>
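
To show the effect such a filter would have on a single token, here is a small sketch using plain java.util.regex rather than Solr itself. The class name, constant and method are assumptions made for the example. The namespace is passed through Pattern.quote because, in a regular expression, the dots in the URI would otherwise match any character.

```java
import java.util.regex.Pattern;

public class StripNamespace {
    // Namespace taken from the gndo: prefix in the data.
    static final String GNDO_NS = "http://d-nb.info/standards/elementset/gnd#";

    // Quote the namespace so that it is matched literally.
    static final Pattern NS_PATTERN = Pattern.compile(Pattern.quote(GNDO_NS));

    // Remove every occurrence of the namespace from the token,
    // comparable to replace="all" with an empty replacement.
    public static String strip(String token) {
        return NS_PATTERN.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip(GNDO_NS + "SeriesOfConferenceOrEvent"));
    }
}
```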

I am still looking for a workaround for this case.

Thanks,
Sorin

On 29.04.2016 at 08:43, Osma Suominen wrote:
> Hi Sorin!
>
> Why do you need the type in the text index? The text index is designed 
> to store literals. It does not know how to handle URIs at all.
>
> Generally what you would do to combine text search with a restriction 
> on rdf:type is to use separate query patterns, e.g.
>
> {
>    ?s text:query 'nuclear' .
>    ?s a gndo:SeriesOfConferenceOrEvent .
> }
>
> -Osma

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Error during text index

Posted by Osma Suominen <os...@helsinki.fi>.

Hi Sorin!

Why do you need the type in the text index? The text index is designed 
to store literals. It does not know how to handle URIs at all.

Generally what you would do to combine text search with a restriction on 
rdf:type is to use separate query patterns, e.g.

{
    ?s text:query 'nuclear' .
    ?s a gndo:SeriesOfConferenceOrEvent .
}

-Osma


On 28/04/16 18:30, Sorin Gheorghiu wrote:
> Hi Andy,
>
> I need just the type of the entry, from the example just the last part
> 'SeriesOfConferenceOrEvent'.
> If possible I would set an analyser which would trim the first part, but
> I don't know how.
>
> Thanks
> Sorin


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: Error during text index

Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.

Hi Andy,

I need just the type of the entry: in the example, only the last part, 
'SeriesOfConferenceOrEvent'.
If possible, I would set an analyser that trims the first part, but 
I don't know how.

Thanks
Sorin



On 28.04.2016 at 17:25, Andy Seaborne wrote:
> Hi Sorin,
>
> I'm curious as to why you are indexing a URI and what benefit you see 
> in that.  You might at least want to set the analyser carefully.
>
>     Andy
>
> PS I fixed the cause of the "UnsupportedOperationException" but only 
> in the sense that it now issues a warning and skips the non-literal.  
> The test for being a literal or not was there ... but after calling 
> getLiteral.

-- 
Sorin Gheorghiu             Tel: +49 7531 88-3198
Universität Konstanz        Raum: B703
78464 Konstanz              sorin.gheorghiu@uni-konstanz.de

- KIM: Abteilung Contentdienste -

Re: Error during text index

Posted by Andy Seaborne <an...@apache.org>.

Hi Sorin,

I'm curious as to why you are indexing a URI and what benefit you see 
in that.  You might at least want to set the analyser carefully.

	Andy

PS I fixed the cause of the "UnsupportedOperationException" but only in 
the sense that it now issues a warning and skips the non-literal.  The 
test for being a literal or not was there ... but after calling getLiteral.
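
The ordering problem described in the PS can be sketched roughly as follows. This is an illustration only, not the actual change made in Jena: Node here is a tiny stand-in class written for the example (the real one is org.apache.jena.graph.Node), and literalOrNull is a hypothetical helper name.

```java
public class LiteralGuard {
    // Simplified stand-in for an RDF node: either a URI or a literal.
    static final class Node {
        private final String value;
        private final boolean literal;

        Node(String value, boolean literal) {
            this.value = value;
            this.literal = literal;
        }

        boolean isLiteral() {
            return literal;
        }

        // Like the real accessor, this refuses to work on non-literals.
        String getLiteral() {
            if (!literal) {
                throw new UnsupportedOperationException(value + " is not a literal node");
            }
            return value;
        }
    }

    // Test first, then access: a non-literal is reported and skipped
    // (null is returned) instead of raising an exception.
    static String literalOrNull(Node n) {
        if (!n.isLiteral()) {
            System.err.println("Warning: skipping non-literal " + n.value);
            return null;
        }
        return n.getLiteral();
    }

    public static void main(String[] args) {
        Node type = new Node("http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent", false);
        Node name = new Node("Conference of Non-Nuclear Weapon States", true);
        System.out.println(literalOrNull(type));
        System.out.println(literalOrNull(name));
    }
}
```

Calling getLiteral before the isLiteral test, as the old code effectively did, raises the exception shown in the stack trace below.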


On 28/04/16 15:47, Sorin Gheorghiu wrote:
> Hello,
>
> Jena text index returned the following error:
>
> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
> java.lang.UnsupportedOperationException: http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent is not a literal node
>          at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>          at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>          at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>          at jena.textindexer.exec(textindexer.java:122)
>          at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>          at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>          at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>          at jena.textindexer.main(textindexer.java:51)
>
> when attempting to index entries like:
>
> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>
> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>          gndo:variantNameForTheConferenceOrEvent "Conferentie van Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
>          gndo:preferredNameForTheConferenceOrEvent "Conference of Non-Nuclear Weapon States" ;
>          a gndo:SeriesOfConferenceOrEvent .
>
> Here is the EntityMap assembler setup:
>
> <#entMap> a text:EntityMap ;
>      text:entityField      "gndUri" ;
>      text:defaultField     "prefName" ; ## Must be defined in the text:map
>      text:map (
>           [ text:field "prefName";
>             text:predicate gndo:preferredNameForTheSubjectHeading
>           ]
>           [ text:field "type";
>             text:predicate rdf:type
>           ]
>           ...
>
> 'type' contains a URL, but a literal node is expected instead.
> It makes no difference whether 'type' is defined as 'text' or 'string'
> in the Solr schema.xml.
>
> How can this be fixed?
>
> Thank you in advance,
> Sorin