Posted to users@jena.apache.org by Sorin Gheorghiu <so...@uni-konstanz.de> on 2016/04/28 16:47:24 UTC
Error during text index
Hello,
Jena text index returned the following error:
# java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
java.lang.UnsupportedOperationException: http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent is not a literal node
at org.apache.jena.graph.Node.getLiteral(Node.java:100)
at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
at jena.textindexer.exec(textindexer.java:122)
at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at jena.textindexer.main(textindexer.java:51)
when attempting to index entries like:
@prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
    gndo:variantNameForTheConferenceOrEvent "Conferentie van Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
    gndo:preferredNameForTheConferenceOrEvent "Conference of Non-Nuclear Weapon States" ;
    a gndo:SeriesOfConferenceOrEvent .
Here is the EntityMap assembler setup:
<#entMap> a text:EntityMap ;
    text:entityField "gndUri" ;
    text:defaultField "prefName" ; ## Must be defined in the text:map
    text:map (
        [ text:field "prefName" ;
          text:predicate gndo:preferredNameForTheSubjectHeading
        ]
        [ text:field "type" ;
          text:predicate rdf:type
        ]
        ...
'type' contains a URL, but a literal node is expected instead.
It makes no difference whether 'type' is defined as 'text' or 'string' in
the Solr schema.xml.
How can this be fixed?
Thank you in advance,
Sorin
Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi Andy,
Thank you for your suggestion; I will use this workaround.
Regards,
Sorin
On 29.04.2016 at 12:20, Andy Seaborne wrote:
> The use of rdf:type seems to mix being a displayable label and a class
> type.
>
> Maybe adding skos:prefLabel to keep the display label is worth doing.
>
> You can extract the fragment from a URI with:
>
> STRAFTER(STR(<http://example/foo#bar>), "#")
>
> (untested):
>
> INSERT { ?s skos:prefLabel ?label }
> WHERE {
>   ?s a ?T .
>   BIND ( STRAFTER(STR(?T), "#") AS ?label )
> }
>
>
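The update suggested above can be written out as a complete request. What follows is only a sketch: the PREFIX declarations and the FILTER that limits labelling to classes in the gndo: namespace are assumptions added for illustration, and the file name update.ru is simply the one used later in this thread. In SPARQL the expression comes first in BIND, i.e. BIND(expr AS ?var).

```shell
# Write the update request to a file. The quoted 'EOF' stops the shell from
# expanding anything inside the SPARQL text.
cat > update.ru <<'EOF'
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Copy the fragment of each class URI into a literal label.
INSERT { ?s skos:prefLabel ?label }
WHERE {
  ?s a ?T .
  # Assumption: only classes in the GND element set should be labelled.
  FILTER ( STRSTARTS(STR(?T), "http://d-nb.info/standards/elementset/gnd#") )
  BIND ( STRAFTER(STR(?T), "#") AS ?label )
}
EOF
```

The file can then be passed to s-update with --file, as in the command shown elsewhere in this thread. For the label to reach the text index, skos:prefLabel would presumably also need an entry in the text:map.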
> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>> Hi Osma,
>>
>> I do need the type in the text index to get faster results than using
>> SPARQL queries.
>>
>> I found an analyzer which could replace the URI with the string type,
>> but I cannot use it as long as non-literals are skipped.
>>
>> <fieldType name="text_type_gnd" class="solr.TextField">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.PatternReplaceFilterFactory" pattern="http://d-nb.info/standards/elementset/gnd#" replacement="" replace="all"/>
>>   </analyzer>
>> </fieldType>
>>
>> I am still looking for a workaround for this case.
>>
>> Thanks,
>> Sorin
>>
>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>> Hi Sorin!
>>>
>>> Why do you need the type in the text index? The text index is designed
>>> to store literals. It does not know how to handle URIs at all.
>>>
>>> Generally what you would do to combine text search with a restriction
>>> on rdf:type is to use separate query patterns, e.g.
>>>
>>> {
>>> ?s text:query 'nuclear' .
>>> ?s a gndo:SeriesOfConferenceOrEvent .
>>> }
>>>
>>> -Osma
>>>
>>>
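As a sketch, the two-pattern approach above could be wrapped into a complete query like this. The PREFIX declarations, the SELECT clause and the file name query.rq are assumptions added for illustration; the two triple patterns are the ones from the message.

```shell
# Write the query to a file. The quoted 'EOF' keeps the shell from touching
# the quotes and variables in the SPARQL text.
cat > query.rq <<'EOF'
PREFIX text: <http://jena.apache.org/text#>
PREFIX gndo: <http://d-nb.info/standards/elementset/gnd#>

SELECT ?s
WHERE {
  # Answered by the text index (searches the default field).
  ?s text:query 'nuclear' .
  # Answered by the RDF store; narrows the hits to one class.
  ?s a gndo:SeriesOfConferenceOrEvent .
}
EOF
```

Only the first pattern touches the text index; the rdf:type restriction is an ordinary graph pattern, so the class URI never has to be stored as a literal.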
>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>> Hi Andy,
>>>>
>>>> I need just the type of the entry; in the example, just the last part,
>>>> 'SeriesOfConferenceOrEvent'.
>>>> If possible, I would set an analyser that trims the first part, but
>>>> I don't know how.
>>>>
>>>> Thanks
>>>> Sorin
>>>>
>>>>
>>>>
>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>> Hi Sorin,
>>>>>
>>>>> I'm curious as to why you are indexing a URI and what you see as the
>>>>> benefit of that. You might at least want to set the analyser
>>>>> carefully.
>>>>>
>>>>> Andy
>>>>>
>>>>> PS I fixed the cause of the "UnsupportedOperationException" but only
>>>>> in the sense that it now issues a warning and skips the non-literal.
>>>>> The test for being a literal or not was there ... but after calling
>>>>> getLiteral.
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Raum: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Fwd: Re: Error during text index
Posted by Andy Seaborne <an...@apache.org>.
On 13/05/16 09:45, Sorin Gheorghiu wrote:
> Hi Andy,
>
> I found on the server a core dump reporting insufficient memory for the
> JRE (see attachment).
> It is weird: Fuseki is allowed a 32 GB maximum Java heap size, but it uses
> only 16 GB:
>
> # java -Xmx32G -jar fuseki-server.jar --update --config=/etc/default/fuseki/config.ttl
-Xmx32G will not help. In fact it will slow the system down a bit.
TDB keeps cached files outside the heap.
> Also, *ulimit -c unlimited* didn't change anything. Do you have any
> idea what could be restricting the memory it can use?
This is not a heap issue. The JVM crashed due to lack of system memory
while trying to map a file on a 64-bit machine.
As the size is 2,047,868,928 I'm not sure it's TDB - TDB uses 8M
increments. Maybe Lucene is using mmap files.
There is a list of possible reasons in the hs_err. The JVM should never
crash.
"ulimit -c unlimited" only affects core dumps.
There may be a system limit on memory mapped areas. I'm not sure which
ulimit flag this is. Set all the size ones to "unlimited" (but the OS
may ignore that).
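A quick way to see which limits apply, as a sketch only: ulimit -v (the address-space limit) and the Linux vm.max_map_count setting are offered here as candidates worth checking, not as a confirmed cause of this crash.

```shell
# Show every per-process limit that the current shell (and anything it
# starts, such as the JVM) runs under.
ulimit -a

# Address-space limit in KiB; memory-mapped files count against it.
echo "virtual memory limit: $(ulimit -v)"

# Linux only: system-wide cap on the number of memory-mapped areas per process.
if [ -r /proc/sys/vm/max_map_count ]; then
  echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"
fi
```

Run it as the same user, and from the same kind of shell or service script, that starts Fuseki, since limits can differ between login shells and services.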
> As regards the fuseki server log, I added the file *log4j.properties* to
> /opt/apache-jena-fuseki-2.3.1/run, but after a fuseki restart no logs
> were generated (expected is a logfile like
> /etc/fuseki/logs/fuseki.log?). Do I have to set up anything more?
The server logs by default. You may not be enabling the right logs -
set them all to INFO.
Logs come out to stdout ("fuseki-server") or to /logs/ ("fuseki"
service) depending on how you run the server.
Andy
>
> Thank you,
> Sorin
Andy
Re: Fwd: Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi Andy,
I found on the server a core dump reporting insufficient memory for the
JRE (see attachment).
It is weird: Fuseki is allowed a 32 GB maximum Java heap size, but it uses
only 16 GB:
# java -Xmx32G -jar fuseki-server.jar --update --config=/etc/default/fuseki/config.ttl
Also, *ulimit -c unlimited* didn't change anything. Do you have any
idea what could be restricting the memory it can use?
As regards the fuseki server log, I added the file *log4j.properties* to
/opt/apache-jena-fuseki-2.3.1/run, but after a fuseki restart no logs
were generated (expected is a logfile like
/etc/fuseki/logs/fuseki.log?). Do I have to set up anything more?
Thank you,
Sorin
On 12.05.2016 at 23:44, Andy Seaborne wrote:
> Hi,
>
> It's not clear to me what's happening. The server log may offer some
> more information. It's as if the response is truncated somehow.
>
> You could try using curl or wget to make the request. They can also
> print out the HTTP headers.
>
> Andy
>
>
>
> On 12/05/16 19:43, Sorin Gheorghiu wrote:
>> Hi,
>>
>> An attempt to perform a SPARQL insert using *s-update* failed with
>> the error:
>>
>> # /opt/apache-jena-fuseki-2.3.1/bin/s-update --service=http://localhost:3030/<dataset>/update --file update.ru
>>
>> /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)
>> from /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
>> from /usr/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
>> from /usr/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
>> from /usr/lib/ruby/1.9.1/net/http.rb:2563:in `read_status_line'
>> from /usr/lib/ruby/1.9.1/net/http.rb:2552:in `read_new'
>> from /usr/lib/ruby/1.9.1/net/http.rb:1320:in `block in transport_request'
>> from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
>> from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
>> from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
>> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:221:in `response_no_body'
>> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:614:in `SPARQL_update'
>> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:681:in `cmd_sparql_update'
>> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:708:in `<main>'
>>
>> The same error occurs with Ruby > 2.0 (but no backtrace is printed):
>>
>> /opt/apache-jena-fuseki-2.3.1/bin/s-update: end of file reached (EOFError)
>>
>> Do you have any hint, please?
>>
>> Thanks
>> Sorin
>>
>> On 04.05.2016 at 14:54, Andy Seaborne wrote:
>>> Hi there,
>>>
>>> This looks like something to do with the solr setup. I'm not very
>>> familiar with solr, is there some configuration that affects timeouts
>>> on connections? I don't think Jena does any timeouts itself.
>>>
>>> Andy
>>>
>>> On 03/05/16 08:50, Sorin Gheorghiu wrote:
>>>> After Solr server restart, it looks like the indexes aren't corrupted.
>>>> Thus, it seems the error isn't critical and I may ignore it.
>>>>
>>>> But my expectation was that the insert command will add the new
>>>> parameter to Jena TDB and not to Solr.
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject: Re: Error during text index
>>>> Date: Mon, 2 May 2016 20:05:37 +0200
>>>> From: Sorin Gheorghiu <so...@uni-konstanz.de>
>>>> To: users@jena.apache.org
>>>>
>>>>
>>>>
>>>> Hi Andy,
>>>>
>>>> after 2 attempts to insert the new SKOS variable, I got the following
>>>> error:
>>>>
>>>> org.apache.jena.query.text.TextIndexException:
>>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>>> ...............................................................................................................................
>>>>
>>>>
>>>>
>>>>
>>>> [2016-05-02 19:23:40] Fuseki INFO [4] 500
>>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>>> (30,147.934 s)
>>>>
>>>> This occurred after more than 8 hours; the update failed before
>>>> completion.
>>>>
>>>> No related Solr error was printed in the logs at that moment, but
>>>> when I refreshed the Solr page http://localhost:8983/solr/#/~cores,
>>>> I got:
>>>>
>>>> 30852656 INFO (qtp1013423070-18) [ ] o.a.s.s.HttpSolrCall [admin]
>>>> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
>>>> status=0 QTime=1758
>>>> 30854518 ERROR (qtp1013423070-20) [ ] o.a.s.h.RequestHandlerBase
>>>> org.apache.solr.common.SolrException: Error handling 'status' action
>>>> ...............................................................................................................................
>>>>
>>>>
>>>>
>>>> Caused by: java.nio.file.NoSuchFileException:
>>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>>>>
>>>> Indeed, there is no *segments_1* file in ../data/index/ but a
>>>> different
>>>> one:
>>>>
>>>> # ls -lrt
>>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
>>>> -rw-r--r-- 1 root root 937 May 2 17:42
>>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>>>>
>>>> I could provide the backtrace if needed. Could you help me to
>>>> understand
>>>> the root cause please?
>>>>
>>>> Thank you
>>>> Sorin
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Raum: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Fwd: Re: Error during text index
Posted by Andy Seaborne <an...@apache.org>.
Hi,
It's not clear to me what's happening. The server log may offer some
more information. It's as if the response is truncated somehow.
You could try using curl or wget to make the request. They can also
print out the HTTP headers.
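For example, a small wrapper around curl along these lines would send the update and show the headers. This is a sketch: post_update is a hypothetical helper name, the dataset name "ds" in the example call is a placeholder, and it assumes the endpoint accepts a direct POST with Content-Type application/sparql-update, as the SPARQL 1.1 Protocol describes.

```shell
# post_update ENDPOINT FILE
# Sends the SPARQL Update request in FILE to ENDPOINT and prints the request
# and response headers (--verbose), so a cut-off response becomes visible.
post_update() {
  curl --verbose \
       --request POST \
       --header 'Content-Type: application/sparql-update' \
       --data-binary "@$2" \
       "$1"
}

# Example call (placeholder dataset name):
# post_update http://localhost:3030/ds/update update.ru
```

If the connection is closed before any status line arrives, curl reports that directly, which helps separate a client-side problem in s-update from a server-side one.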
Andy
On 12/05/16 19:43, Sorin Gheorghiu wrote:
> Hi,
>
> the attempt to perform a sparql insert using *s-update* has failed with
> the error:
>
> # /opt/apache-jena-fuseki-2.3.1/bin/s-update
> --service=http://localhost:3030/<dataset>/update --file update.ru
>
> /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file
> reached (EOFError)
> from /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
> from /usr/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
> from /usr/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
> from /usr/lib/ruby/1.9.1/net/http.rb:2563:in `read_status_line'
> from /usr/lib/ruby/1.9.1/net/http.rb:2552:in `read_new'
> from /usr/lib/ruby/1.9.1/net/http.rb:1320:in `block in
> transport_request'
> from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
> from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
> from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:221:in
> `response_no_body'
> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:614:in
> `SPARQL_update'
> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:681:in
> `cmd_sparql_update'
> from /opt/apache-jena-fuseki-2.3.1/bin/s-update:708:in `<main>'
>
> The same error will occur with ruby > 2.0 (but no backtrace printed out):
>
> /opt/apache-jena-fuseki-2.3.1/bin/s-update: end of file reached (EOFError)
>
> Do you have any hit, please?
>
> Thanks
> Sorin
>
> Am 04.05.2016 um 14:54 schrieb Andy Seaborne:
>> Hi there,
>>
>> This looks like something to do with the solr setup. I'm not very
>> familiar with solr, is there some configuration that affects timeouts
>> on connections? I don't think Jena does any timeouts itself.
>>
>> Andy
>>
>> On 03/05/16 08:50, Sorin Gheorghiu wrote:
>>> After Solr server restart, it looks like the indexes aren't corrupted.
>>> Thus, it seems the error isn't critical and I may ignore it.
>>>
>>> But my expectation was that the insert command will add the new
>>> parameter to Jena TDB and not to Solr.
>>>
>>>
>>> -------- Weitergeleitete Nachricht --------
>>> Betreff: Re: Error during text index
>>> Datum: Mon, 2 May 2016 20:05:37 +0200
>>> Von: Sorin Gheorghiu <so...@uni-konstanz.de>
>>> An: users@jena.apache.org
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>> after 2 attempts to insert the new SKOS variable, I got the following
>>> error:
>>>
>>> org.apache.jena.query.text.TextIndexException:
>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>> ...............................................................................................................................
>>>
>>>
>>>
>>> [2016-05-02 19:23:40] Fuseki INFO [4] 500
>>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>>> when talking to server at: http://localhost:8983/solr/GND100316_550
>>> (30,147.934 s)
>>>
>>> This occured after more than 8 hours, but it failed before the
>>> completion.
>>>
>>> No related Solr error was printed out in the logs in that moment, but
>>> when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
>>> I got:
>>>
>>> 30852656 INFO (qtp1013423070-18) [ ] o.a.s.s.HttpSolrCall [admin]
>>> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
>>> status=0 QTime=1758
>>> 30854518 ERROR (qtp1013423070-20) [ ] o.a.s.h.RequestHandlerBase
>>> org.apache.solr.common.SolrException: Error handling 'status' action
>>> ...............................................................................................................................
>>>
>>>
>>> Caused by: java.nio.file.NoSuchFileException:
>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>>>
>>> Indeed, there is no *segments_1* file in ../data/index/ but a different
>>> one:
>>>
>>> # ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
>>> -rw-r--r-- 1 root root 937 May 2 17:42
>>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>>>
>>> I could provide the backtrace if needed. Could you help me to understand
>>> the root cause please?
>>>
>>> Thank you
>>> Sorin
>>>
>>>
>>> Am 29.04.2016 um 12:20 schrieb Andy Seaborne:
>>>> The use of rdf:type seems to mix being a displayable label and a class
>>>> type.
>>>>
>>>> Maybe adding skos:prefLabel to keep the display label is worth doing.
>>>>
>>>> You can extract the fragment from a URI with:
>>>>
>>>> STRAFTER(STR(<http://example/foo#bar>), "#")'
>>>>
>>>> (untested):
>>>>
>>>> INSERT { ?s skos:prefLabel ?label }
>>>> WHERE {
>>>> ?s a ?T .
>>>> BIND ( ?label as STRAFTER(STR(?T), "#")
>>>> }
>>>>
>>>>
>>>> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>>>>> Hi Osma,
>>>>>
>>>>> I do need the type in the text index to get faster results than using
>>>>> sparql queries.
>>>>>
>>>>> I found an analyzer which could replace the URI with the string type,
>>>>> but I cannot use it as long as the non-literal are skiped.
>>>>>
>>>>> <fieldType name="text_type_gnd" class="solr.TextField" >
>>>>> <analyzer>
>>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>> <filter class="solr.PatternReplaceFilterFactory" pattern="
>>>>> http://d-nb.info/standards/elementset/gnd#" replacement=""
>>>>> replace="all" />
>>>>> </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> I am still looking for a workaround for this case.
>>>>>
>>>>> Thanks,
>>>>> Sorin
>>>>>
>>>>> Am 29.04.2016 um 08:43 schrieb Osma Suominen:
>>>>>> Hi Sorin!
>>>>>>
>>>>>> Why do you need the type in the text index? The text index is
>>>>>> designed
>>>>>> to store literals. It does not know how to handle URIs at all.
>>>>>>
>>>>>> Generally what you would do to combine text search with a restriction
>>>>>> on rdf:type is to use separate query patterns, e.g.
>>>>>>
>>>>>> {
>>>>>> ?s text:query 'nuclear' .
>>>>>> ?s a gndo:SeriesOfConferenceOrEvent .
>>>>>> }
>>>>>>
>>>>>> -Osma
>>>>>>
>>>>>>
>>>>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> I need just the type of the entry, from the example just the last
>>>>>>> part
>>>>>>> 'SeriesOfConferenceOrEvent'.
>>>>>>> If possible I would set an analyser which would trim the first
>>>>>>> part, but
>>>>>>> I don't know how.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sorin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 28.04.2016 um 17:25 schrieb Andy Seaborne:
>>>>>>>> Hi Sorin,
>>>>>>>>
>>>>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>>>>> benefit of that. You might at least want to set the analyser
>>>>>>>> carefully.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> PS I fixed the cause of the "UnsupportedOperationException" but
>>>>>>>> only
>>>>>>>> in the sense that it now issues a warning and skips the
>>>>>>>> non-literal.
>>>>>>>> The test for being a literal or not was there ... but after calling
>>>>>>>> getLiteral.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Jena text index returned the following error:
>>>>>>>>>
>>>>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>>>>> java.lang.UnsupportedOperationException:
>>>>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent
>>>>>>>>>
>>>>>>>>> is
>>>>>>>>> not a literal node
>>>>>>>>> at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>>>> at
>>>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> at
>>>>>>>>> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> at jena.textindexer.exec(textindexer.java:122)
>>>>>>>>> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>>>> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>>>> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>>>> at jena.textindexer.main(textindexer.java:51)
>>>>>>>>>
>>>>>>>>> when attempted to index entries like:
>>>>>>>>>
>>>>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>>>>
>>>>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>>>> gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon
>>>>>>>>> States" ;
>>>>>>>>> gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>>>>> Non-Nuclear Weapon States" ;
>>>>>>>>> a gndo:SeriesOfConferenceOrEvent .
>>>>>>>>>
>>>>>>>>> Here is the EntityMap assembler setup:
>>>>>>>>>
>>>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>>> text:entityField "gndUri" ;
>>>>>>>>> text:defaultField "prefName" ; ## Must be defined in the
>>>>>>>>> text:map
>>>>>>>>> text:map (
>>>>>>>>> [ text:field "prefName";
>>>>>>>>> text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>>>> ]
>>>>>>>>> [ text:field "type";
>>>>>>>>> text:predicate rdf:type
>>>>>>>>> ]
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> 'type' contains an URL, but a literal node is expected instead.
>>>>>>>>> There is no difference if 'type' is defined as 'text' or
>>>>>>>>> 'string' in
>>>>>>>>> Solr schema.xml.
>>>>>>>>>
>>>>>>>>> How is possible to fix it?
>>>>>>>>>
>>>>>>>>> Thank you in advance,
>>>>>>>>> Sorin
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Fwd: Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi,
An attempt to perform a SPARQL insert using *s-update* failed with
the error:
# /opt/apache-jena-fuseki-2.3.1/bin/s-update --service=http://localhost:3030/<dataset>/update --file update.ru
/usr/lib/ruby/1.9.1/net/protocol.rb:141:in `read_nonblock': end of file reached (EOFError)
from /usr/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
from /usr/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
from /usr/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
from /usr/lib/ruby/1.9.1/net/http.rb:2563:in `read_status_line'
from /usr/lib/ruby/1.9.1/net/http.rb:2552:in `read_new'
from /usr/lib/ruby/1.9.1/net/http.rb:1320:in `block in transport_request'
from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
from /opt/apache-jena-fuseki-2.3.1/bin/s-update:221:in `response_no_body'
from /opt/apache-jena-fuseki-2.3.1/bin/s-update:614:in `SPARQL_update'
from /opt/apache-jena-fuseki-2.3.1/bin/s-update:681:in `cmd_sparql_update'
from /opt/apache-jena-fuseki-2.3.1/bin/s-update:708:in `<main>'
The same error occurs with Ruby > 2.0 (but no backtrace is printed):
/opt/apache-jena-fuseki-2.3.1/bin/s-update: end of file reached (EOFError)
Do you have any hint, please?
Thanks
Sorin
On 04.05.2016 at 14:54, Andy Seaborne wrote:
> Hi there,
>
> This looks like something to do with the solr setup. I'm not very
> familiar with solr, is there some configuration that affects timeouts
> on connections? I don't think Jena does any timeouts itself.
>
> Andy
>
> On 03/05/16 08:50, Sorin Gheorghiu wrote:
>> After Solr server restart, it looks like the indexes aren't corrupted.
>> Thus, it seems the error isn't critical and I may ignore it.
>>
>> But my expectation was that the insert command will add the new
>> parameter to Jena TDB and not to Solr.
>>
>>
>> -------- Forwarded Message --------
>> Subject: Re: Error during text index
>> Date: Mon, 2 May 2016 20:05:37 +0200
>> From: Sorin Gheorghiu <so...@uni-konstanz.de>
>> To: users@jena.apache.org
>>
>>
>>
>> Hi Andy,
>>
>> after 2 attempts to insert the new SKOS variable, I got the following
>> error:
>>
>> org.apache.jena.query.text.TextIndexException:
>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>> when talking to server at: http://localhost:8983/solr/GND100316_550
>> ...............................................................................................................................
>>
>>
>>
>> [2016-05-02 19:23:40] Fuseki INFO [4] 500
>> org.apache.solr.client.solrj.SolrServerException: IOException occured
>> when talking to server at: http://localhost:8983/solr/GND100316_550
>> (30,147.934 s)
>>
>> This occurred after more than 8 hours; the update failed before it
>> completed.
>>
>> No related Solr error was printed in the logs at that moment, but
>> when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
>> I got:
>>
>> 30852656 INFO (qtp1013423070-18) [ ] o.a.s.s.HttpSolrCall [admin]
>> webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
>> status=0 QTime=1758
>> 30854518 ERROR (qtp1013423070-20) [ ] o.a.s.h.RequestHandlerBase
>> org.apache.solr.common.SolrException: Error handling 'status' action
>> ...............................................................................................................................
>>
>>
>> Caused by: java.nio.file.NoSuchFileException:
>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
>>
>> Indeed, there is no *segments_1* file in ../data/index/ but a different
>> one:
>>
>> # ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
>> -rw-r--r-- 1 root root 937 May 2 17:42
>> /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
>>
>> I could provide the backtrace if needed. Could you help me to understand
>> the root cause please?
>>
>> Thank you
>> Sorin
>>
>>
>> On 29.04.2016 at 12:20, Andy Seaborne wrote:
>>> The use of rdf:type seems to mix being a displayable label and a class
>>> type.
>>>
>>> Maybe adding skos:prefLabel to keep the display label is worth doing.
>>>
>>> You can extract the fragment from a URI with:
>>>
>>> STRAFTER(STR(<http://example/foo#bar>), "#")
>>>
>>> (untested):
>>>
>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>> INSERT { ?s skos:prefLabel ?label }
>>> WHERE {
>>> ?s a ?T .
>>> BIND ( STRAFTER(STR(?T), "#") AS ?label )
>>> }
>>>
>>>
>>> On 29/04/16 09:25, Sorin Gheorghiu wrote:
>>>> Hi Osma,
>>>>
>>>> I do need the type in the text index to get faster results than using
>>>> SPARQL queries.
>>>>
>>>> I found an analyzer which could replace the URI with the string type,
>>>> but I cannot use it as long as non-literals are skipped.
>>>>
>>>> <fieldType name="text_type_gnd" class="solr.TextField" >
>>>> <analyzer>
>>>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>> <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="http://d-nb.info/standards/elementset/gnd#" replacement=""
>>>> replace="all" />
>>>> </analyzer>
>>>> </fieldType>
>>>>
>>>> I am still looking for a workaround for this case.
>>>>
>>>> Thanks,
>>>> Sorin
>>>>
>>>> On 29.04.2016 at 08:43, Osma Suominen wrote:
>>>>> Hi Sorin!
>>>>>
>>>>> Why do you need the type in the text index? The text index is
>>>>> designed
>>>>> to store literals. It does not know how to handle URIs at all.
>>>>>
>>>>> Generally what you would do to combine text search with a restriction
>>>>> on rdf:type is to use separate query patterns, e.g.
>>>>>
>>>>> {
>>>>> ?s text:query 'nuclear' .
>>>>> ?s a gndo:SeriesOfConferenceOrEvent .
>>>>> }
>>>>>
>>>>> -Osma
>>>>>
>>>>>
>>>>> On 28/04/16 18:30, Sorin Gheorghiu wrote:
>>>>>> Hi Andy,
>>>>>>
>>>>>> I need just the type of the entry, from the example just the last
>>>>>> part
>>>>>> 'SeriesOfConferenceOrEvent'.
>>>>>> If possible I would set an analyser which would trim the first
>>>>>> part, but
>>>>>> I don't know how.
>>>>>>
>>>>>> Thanks
>>>>>> Sorin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 28.04.2016 at 17:25, Andy Seaborne wrote:
>>>>>>> Hi Sorin,
>>>>>>>
>>>>>>> I'm curious as to why you are indexing a URI and what you see the
>>>>>>> benefit of that. You might at least want to set the analyser
>>>>>>> carefully.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> PS I fixed the cause of the "UnsupportedOperationException" but
>>>>>>> only
>>>>>>> in the sense that it now issues a warning and skips the
>>>>>>> non-literal.
>>>>>>> The test for being a literal or not was there ... but after calling
>>>>>>> getLiteral.
>>>>>>>
>>>>>>>
>>>>>>> On 28/04/16 15:47, Sorin Gheorghiu wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Jena text index returned the following error:
>>>>>>>>
>>>>>>>> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
>>>>>>>> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
>>>>>>>> java.lang.UnsupportedOperationException:
>>>>>>>> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent
>>>>>>>> is not a literal node
>>>>>>>> at org.apache.jena.graph.Node.getLiteral(Node.java:100)
>>>>>>>> at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>>>>>>>> at org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>>>>>>>> at jena.textindexer.exec(textindexer.java:122)
>>>>>>>> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
>>>>>>>> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>>>>>> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>>>>>> at jena.textindexer.main(textindexer.java:51)
>>>>>>>>
>>>>>>>> when attempted to index entries like:
>>>>>>>>
>>>>>>>> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
>>>>>>>> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>>>>>>>>
>>>>>>>> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
>>>>>>>> gndo:variantNameForTheConferenceOrEvent "Conferentie van
>>>>>>>> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon
>>>>>>>> States" ;
>>>>>>>> gndo:preferredNameForTheConferenceOrEvent "Conference of
>>>>>>>> Non-Nuclear Weapon States" ;
>>>>>>>> a gndo:SeriesOfConferenceOrEvent .
>>>>>>>>
>>>>>>>> Here is the EntityMap assembler setup:
>>>>>>>>
>>>>>>>> <#entMap> a text:EntityMap ;
>>>>>>>> text:entityField "gndUri" ;
>>>>>>>> text:defaultField "prefName" ; ## Must be defined in the
>>>>>>>> text:map
>>>>>>>> text:map (
>>>>>>>> [ text:field "prefName";
>>>>>>>> text:predicate gndo:preferredNameForTheSubjectHeading
>>>>>>>> ]
>>>>>>>> [ text:field "type";
>>>>>>>> text:predicate rdf:type
>>>>>>>> ]
>>>>>>>> ...
>>>>>>>>
>>>>>>>> 'type' contains a URL, but a literal node is expected instead.
>>>>>>>> There is no difference if 'type' is defined as 'text' or
>>>>>>>> 'string' in
>>>>>>>> Solr schema.xml.
>>>>>>>>
>>>>>>>> How is it possible to fix this?
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Sorin
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Raum: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Fwd: Re: Error during text index
Posted by Andy Seaborne <an...@apache.org>.
Hi there,
This looks like something to do with the Solr setup. I'm not very
familiar with Solr; is there some configuration that affects timeouts on
connections? I don't think Jena does any timeouts itself.
Andy
Fwd: Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
After restarting the Solr server, it looks like the indexes aren't corrupted.
So the error doesn't seem to be critical and I may ignore it.
But I expected the insert command to add the new
parameter to Jena TDB and not to Solr.
-------- Forwarded Message --------
Subject: Re: Error during text index
Date: Mon, 2 May 2016 20:05:37 +0200
From: Sorin Gheorghiu <so...@uni-konstanz.de>
To: users@jena.apache.org
Hi Andy,
after 2 attempts to insert the new SKOS variable, I got the following error:
org.apache.jena.query.text.TextIndexException:
org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: http://localhost:8983/solr/GND100316_550
...............................................................................................................................
[2016-05-02 19:23:40] Fuseki INFO [4] 500
org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: http://localhost:8983/solr/GND100316_550
(30,147.934 s)
This occurred after more than 8 hours; the update failed before it completed.
No related Solr error was printed in the logs at that moment, but
when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
I got:
30852656 INFO (qtp1013423070-18) [ ] o.a.s.s.HttpSolrCall [admin]
webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
status=0 QTime=1758
30854518 ERROR (qtp1013423070-20) [ ] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Error handling 'status' action
...............................................................................................................................
Caused by: java.nio.file.NoSuchFileException:
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
Indeed, there is no *segments_1* file in ../data/index/ but a different one:
# ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
-rw-r--r-- 1 root root 937 May 2 17:42
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
I could provide the backtrace if needed. Could you help me to understand
the root cause please?
Thank you
Sorin
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Raum: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi Andy,
after 2 attempts to insert the new SKOS variable, I got the following error:
org.apache.jena.query.text.TextIndexException:
org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: http://localhost:8983/solr/GND100316_550
...............................................................................................................................
[2016-05-02 19:23:40] Fuseki INFO [4] 500
org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: http://localhost:8983/solr/GND100316_550
(30,147.934 s)
This occurred after more than 8 hours; the update failed before it completed.
No related Solr error was printed in the logs at that moment, but
when I refreshed the Solr page http://localhost:8983/solr/#/~cores, then
I got:
30852656 INFO (qtp1013423070-18) [ ] o.a.s.s.HttpSolrCall [admin]
webapp=null path=/admin/info/system params={wt=json&_=1462210386319}
status=0 QTime=1758
30854518 ERROR (qtp1013423070-20) [ ] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Error handling 'status' action
...............................................................................................................................
Caused by: java.nio.file.NoSuchFileException:
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_1
Indeed, there is no *segments_1* file in ../data/index/ but a different one:
# ls -lrt /opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments*
-rw-r--r-- 1 root root 937 May 2 17:42
/opt/solr-5.5.0/server/solr/GND100316_550/data/index/segments_10r
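A note on the two file names above: Lucene commit files are, to the best of my knowledge, named segments_N where N is the commit generation written in base 36. Under that assumption, segments_1 and segments_10r would be different generations of the same index rather than unrelated files. The sketch below only decodes the suffix; the helper name is made up for illustration and is not part of Solr or Lucene.

```python
def segments_generation(file_name: str) -> int:
    """Decode the generation from a Lucene-style 'segments_N' file name.

    Assumption: N is the commit generation in base 36 (0-9, then a-z).
    """
    prefix = "segments_"
    if not file_name.startswith(prefix):
        raise ValueError(f"not a segments file: {file_name!r}")
    return int(file_name[len(prefix):], 36)


if __name__ == "__main__":
    # The file Solr asked for, and the file actually present on disk.
    for name in ("segments_1", "segments_10r"):
        print(name, "->", segments_generation(name))
```

If that reading is correct, the 'status' request was looking for a much older commit point than the one currently on disk, which might fit a stale view of the core rather than a damaged index.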
I could provide the backtrace if needed. Could you help me to understand
the root cause please?
Thank you
Sorin
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Raum: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Error during text index
Posted by Andy Seaborne <an...@apache.org>.
The use of rdf:type seems to mix being a displayable label and a class type.
Maybe adding skos:prefLabel to keep the display label is worth doing.
You can extract the fragment from a URI with:
STRAFTER(STR(<http://example/foo#bar>), "#")
(untested):
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
INSERT { ?s skos:prefLabel ?label }
WHERE {
?s a ?T .
BIND ( STRAFTER(STR(?T), "#") AS ?label )
}
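To make the effect of STRAFTER on the GND type URIs concrete, here is a small Python sketch that mimics the function on plain strings. It is only an illustration of the string semantics, not Jena code, and the function name is chosen here for the example.

```python
def strafter(text: str, separator: str) -> str:
    """Mimic SPARQL STRAFTER on plain strings.

    Returns the part of `text` after the first occurrence of `separator`,
    or the empty string when `separator` does not occur in `text`.
    """
    if separator == "":
        # SPARQL returns the whole string for an empty separator.
        return text
    _, found, rest = text.partition(separator)
    return rest if found else ""


if __name__ == "__main__":
    type_uri = "http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent"
    print(strafter(type_uri, "#"))
```

One thing this makes visible: a type URI that has no "#" in it gives an empty string, so the update would probably want to filter those out rather than insert empty labels.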
Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi Osma,
I do need the type in the text index to get faster results than using
SPARQL queries.
I found an analyzer which could replace the URI with the string type,
but I cannot use it as long as non-literals are skipped.
<fieldType name="text_type_gnd" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern=" http://d-nb.info/standards/elementset/gnd#" replacement="" replace="all" />
</analyzer>
</fieldType>
I am still looking for a workaround for this case.
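For illustration, the rewrite that the PatternReplaceFilterFactory pattern above is meant to perform can be sketched in plain Java. This is only a sketch using java.util.regex; the class and method names (TypeLocalName, stripNamespace) are made up for the example and are not part of Jena or Solr.

```java
import java.util.regex.Pattern;

public class TypeLocalName {
    // Namespace taken from the gndo: prefix in the data. Pattern.quote makes
    // '.' and other regex metacharacters match literally.
    private static final Pattern GNDO_NS =
        Pattern.compile(Pattern.quote("http://d-nb.info/standards/elementset/gnd#"));

    // Remove every occurrence of the namespace, as replace="all" requests.
    public static String stripNamespace(String token) {
        return GNDO_NS.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: SeriesOfConferenceOrEvent
        System.out.println(stripNamespace(
            "http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent"));
    }
}
```

A filter of this kind can only take effect if the URI reaches the analyzer as text, which is exactly what does not happen while non-literals are skipped.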
Thanks,
Sorin
On 29.04.2016 at 08:43, Osma Suominen wrote:
> Hi Sorin!
>
> Why do you need the type in the text index? The text index is designed
> to store literals. It does not know how to handle URIs at all.
>
> Generally what you would do to combine text search with a restriction
> on rdf:type is to use separate query patterns, e.g.
>
> {
> ?s text:query 'nuclear' .
> ?s a gndo:SeriesOfConferenceOrEvent .
> }
>
> -Osma
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Room: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Error during text index
Posted by Osma Suominen <os...@helsinki.fi>.
Hi Sorin!
Why do you need the type in the text index? The text index is designed
to store literals. It does not know how to handle URIs at all.
Generally what you would do to combine text search with a restriction on
rdf:type is to use separate query patterns, e.g.
{
?s text:query 'nuclear' .
?s a gndo:SeriesOfConferenceOrEvent .
}
-Osma
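The two query patterns above can be assembled into a complete query string. The following is a minimal Java sketch, not part of Jena: the class TextTypeQuery and its build method are hypothetical, the text: prefix is the standard jena-text namespace, and the caller is assumed to pass a valid local name from the gndo: namespace.

```java
public class TextTypeQuery {
    // Builds a SELECT query that pairs a text-index lookup with an rdf:type
    // pattern. typeLocalName is assumed to be a valid gndo: local name.
    public static String build(String searchTerm, String typeLocalName) {
        // Escape backslashes and single quotes for a single-quoted SPARQL string.
        String escaped = searchTerm.replace("\\", "\\\\").replace("'", "\\'");
        return String.join("\n",
            "PREFIX text: <http://jena.apache.org/text#>",
            "PREFIX gndo: <http://d-nb.info/standards/elementset/gnd#>",
            "SELECT ?s WHERE {",
            "  ?s text:query '" + escaped + "' .",
            "  ?s a gndo:" + typeLocalName + " .",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(build("nuclear", "SeriesOfConferenceOrEvent"));
    }
}
```

The text pattern is placed first, so that the index lookup supplies the candidate subjects and the type pattern only filters them.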
On 28/04/16 18:30, Sorin Gheorghiu wrote:
> Hi Andy,
>
> I need just the type of the entry, from the example just the last part
> 'SeriesOfConferenceOrEvent'.
> If possible I would set an analyser which would trim the first part, but
> I don't know how.
>
> Thanks
> Sorin
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Re: Error during text index
Posted by Sorin Gheorghiu <so...@uni-konstanz.de>.
Hi Andy,
I need just the type of the entry; in the example, only the last part,
'SeriesOfConferenceOrEvent'.
If possible, I would set an analyser that trims the first part of the URI,
but I don't know how.
Thanks
Sorin
On 28.04.2016 at 17:25, Andy Seaborne wrote:
> Hi Sorin,
>
> I'm curious as to why you are indexing a URI and what you see the
> benefit of that. You might at least want to set the analyser carefully.
>
> Andy
>
> PS I fixed the cause of the "UnsupportedOperationException" but only
> in the sense that it now issues a warning and skips the non-literal.
> The test for being a literal or not was there ... but after calling
> getLiteral.
--
Sorin Gheorghiu Tel: +49 7531 88-3198
Universität Konstanz Room: B703
78464 Konstanz sorin.gheorghiu@uni-konstanz.de
- KIM: Abteilung Contentdienste -
Re: Error during text index
Posted by Andy Seaborne <an...@apache.org>.
Hi Sorin,
I'm curious as to why you are indexing a URI and what benefit you see
in that. You might at least want to set the analyser carefully.
Andy
PS I fixed the cause of the "UnsupportedOperationException" but only in
the sense that it now issues a warning and skips the non-literal. The
test for being a literal or not was there ... but after calling getLiteral.
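The ordering problem described in the PS can be sketched as follows. This is not the actual Jena source: SketchNode is a stand-in written for this example that mimics only the isLiteral/getLiteral behaviour seen in the stack trace, and textToIndex is a hypothetical helper name.

```java
public class LiteralCheckOrder {
    // Minimal stand-in for an RDF node; NOT Jena's Node class.
    public static final class SketchNode {
        final String value;
        final boolean literal;

        public SketchNode(String value, boolean literal) {
            this.value = value;
            this.literal = literal;
        }

        boolean isLiteral() {
            return literal;
        }

        // Throws for URIs, like the call at the top of the stack trace.
        String getLiteral() {
            if (!literal) {
                throw new UnsupportedOperationException(value + " is not a literal node");
            }
            return value;
        }
    }

    // Test first, then fetch: a non-literal is reported and skipped
    // instead of aborting the whole indexing run.
    public static String textToIndex(SketchNode object) {
        if (!object.isLiteral()) {
            System.err.println("Warning: skipping non-literal " + object.value);
            return null;
        }
        return object.getLiteral();
    }

    public static void main(String[] args) {
        SketchNode type = new SketchNode(
            "http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent", false);
        SketchNode name = new SketchNode("Conference of Non-Nuclear Weapon States", true);
        System.out.println(textToIndex(type)); // warning on stderr, then null
        System.out.println(textToIndex(name));
    }
}
```

With the test placed after the getLiteral call, the exception is thrown before the test can ever run, which matches the behaviour reported in the original message.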
On 28/04/16 15:47, Sorin Gheorghiu wrote:
> Hello,
>
> Jena text index returned the following error:
>
> # java -cp /opt/apache-jena-fuseki-2.3.1/fuseki-server.jar
> jena.textindexer --desc=/etc/default/fuseki/jena-text-config.ttl
> java.lang.UnsupportedOperationException:
> http://d-nb.info/standards/elementset/gnd#SeriesOfConferenceOrEvent is
> not a literal node
> at org.apache.jena.graph.Node.getLiteral(Node.java:100)
> at
> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:80)
>
> at
> org.apache.jena.query.text.TextQueryFuncs.entityFromQuad(TextQueryFuncs.java:67)
>
> at jena.textindexer.exec(textindexer.java:122)
> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
> at jena.textindexer.main(textindexer.java:51)
>
> when attempted to index entries like:
>
> @prefix gndo: <http://d-nb.info/standards/elementset/gnd#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
>
> <http://d-nb.info/gnd/1-2> gndo:gndIdentifier "1-2" ;
> gndo:variantNameForTheConferenceOrEvent "Conferentie van
> Niet-Kernwapenstaten" , "Conference on Non-Nuclear Weapon States" ;
> gndo:preferredNameForTheConferenceOrEvent "Conference of
> Non-Nuclear Weapon States" ;
> a gndo:SeriesOfConferenceOrEvent .
>
> Here is the EntityMap assembler setup:
>
> <#entMap> a text:EntityMap ;
> text:entityField "gndUri" ;
> text:defaultField "prefName" ; ## Must be defined in the text:map
> text:map (
> [ text:field "prefName";
> text:predicate gndo:preferredNameForTheSubjectHeading
> ]
> [ text:field "type";
> text:predicate rdf:type
> ]
> ...
>
> 'type' contains an URL, but a literal node is expected instead.
> There is no difference if 'type' is defined as 'text' or 'string' in
> Solr schema.xml.
>
> How is possible to fix it?
>
> Thank you in advance,
> Sorin