Posted to dev@stanbol.apache.org by Gintautas Sulskus <gi...@gmail.com> on 2015/10/09 01:39:00 UTC

keywordLinking engine does not work with words in quotes

Hello,

I have noticed that the keywordLinking engine does not work with words in
quotes.
For the given text:

> I suspect that the keyword linking engine does not work with "quotes".


I get the exception shown in [1].
[2] shows the SPARQL query and the Virtuoso error message.

It seems that the problem is in this line of the SPARQL query (the first
double quote inside the literal is the offending one):

>  ?v_1 bif:contains '"\"quotes"' .


Stanbol sends the query to Virtuoso with both an unescaped quote (which
should have been removed) and an escaped quote: %22%5C%22quotes%22, which
decodes to "\"quotes". With the first quote removed, so that the literal
reads '\"quotes"', the query works.
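
A small sketch of why the first quote breaks the query, written in Python
rather than Stanbol's Java and assuming standard SPARQL literal unescaping
(\" becomes "). It decodes the fragment from the request and shows what
reaches the free-text parser. The helper name sparql_unescape is made up
for this illustration.

```python
from urllib.parse import unquote


def sparql_unescape(literal_body: str) -> str:
    """Resolve \\", \\' and \\\\ inside a SPARQL string literal body.

    Only these three escapes are handled; this is an illustration,
    not a full SPARQL lexer.
    """
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        nxt = literal_body[i + 1] if i + 1 < len(literal_body) else ""
        if ch == "\\" and nxt in ('"', "'", "\\"):
            out.append(nxt)  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


encoded = "%22%5C%22quotes%22"            # fragment taken from the request
literal_body = unquote(encoded)           # text between the single quotes
free_text = sparql_unescape(literal_body)

fixed_body = literal_body[1:]             # drop the first, unescaped quote
fixed_free_text = sparql_unescape(fixed_body)

print(free_text, free_text.count('"'))
print(fixed_free_text, fixed_free_text.count('"'))
```

Under that assumption the original expression carries an odd number of
double quotes, which fits the "Unterminated double-quoted word or phrase"
error, while the variant without the first quote is balanced.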

PS. Could you please explain what parameter is expected instead of
"null" in ["quotes]@[en, null]? [1]

Refs:
[1] Exception
org.apache.stanbol.enhancer.servicesapi.EngineException: Exception while searchign for ["quotes]@[en, null]in the ReferencedSite virt
    at org.apache.stanbol.enhancer.engines.keywordextraction.impl.EntityLinker.lookupEntities(EntityLinker.java:298)
    at org.apache.stanbol.enhancer.engines.keywordextraction.impl.EntityLinker.process(EntityLinker.java:124)
    at org.apache.stanbol.enhancer.engines.keywordextraction.engine.KeywordLinkingEngine.computeEnhancements(KeywordLinkingEngine.java:392)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:279)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:197)
    at org.apache.felix.eventadmin.impl.handler.EventHandlerProxy.sendEvent(EventHandlerProxy.java:415)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:118)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:159)


[2] SPARQL Query and Virtuoso error message

Virtuoso 37000 Error XM029: Free-text expression, line 0: Unterminated double-quoted word or phrase at


SPARQL query:
define sql:big-data-const 0
#output-format:application/rdf+xml
CONSTRUCT {
  ?id <http://www.w3.org/2000/01/rdf-schema#label> ?v_1 .
  ?id <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?v_2 .
  ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?v_3 .
  <http://stanbol.apache.org/ontology/entityhub/query#QueryResultSet> <http://stanbol.apache.org/ontology/entityhub/query#queryResult> ?id .
} WHERE {
  {
    SELECT ?id
    WHERE {
      ?id <http://www.w3.org/2000/01/rdf-schema#label> ?v_1 .
        ?v_1 bif:contains '"\"quotes"' .
        FILTER(((lang(?v_1) = "en") || (lang(?v_1) = ""))) .
    }
    ORDER BY DESC ( <LONG::IRI_RANK> (?id) )
    LIMIT 10
          }
  OPTIONAL { ?id <http://www.w3.org/2000/01/rdf-schema#label> ?v_1 . }
  OPTIONAL { ?id <http://www.w3.org/2000/01/rdf-schema#seeAlso> ?v_2 . }
  OPTIONAL { ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?v_3 . }
}

Best Regards,
Gin

Re: keywordLinking engine does not work with words in quotes

Posted by Gintautas Sulskus <gi...@gmail.com>.
Hi Rupert,

Thanks!

Best Wishes,
Gin

Best Regards,
Gintautas Sulskus


Re: keywordLinking engine does not work with words in quotes

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Gintautas Sulskus

I fixed both issues in both 0.12.1-SNAPSHOT and 1.0.0-SNAPSHOT. See
STANBOL-877 for more information.

best
Rupert


-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: keywordLinking engine does not work with words in quotes

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,

finally got some time to look into that ...


First, my assumption that the patch for STANBOL-877 had not been merged
into the codebase was wrong. It is present in both 0.12.1 and trunk.
So both reported errors are indeed new issues.

> PS. Could you please explain me, what parameter is expected instead of "null" in ["quotes]@[en, null]? [1]

null means that literals without a language tag (the null language) are
searched as well. This is why

    FILTER(((lang(?v_1) = "en") || (lang(?v_1) = "")))

is in the SPARQL query.
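
To illustrate the mapping described above, here is a hedged Python sketch
(not Stanbol's code; the function name language_filter is hypothetical)
that turns a language list such as [en, null] into the FILTER shown. It
relies on the SPARQL rule that lang() returns an empty string for literals
without a language tag.

```python
from typing import Optional, Sequence


def language_filter(var: str, languages: Sequence[Optional[str]]) -> str:
    """Build a SPARQL FILTER accepting any of the given languages.

    None stands for the "null language", i.e. literals without a
    language tag, for which lang() yields the empty string.
    """
    tests = []
    for language in languages:
        tag = "" if language is None else language
        tests.append(f'(lang(?{var}) = "{tag}")')
    return "FILTER((" + " || ".join(tests) + "))"


print(language_filter("v_1", ["en", None]))
```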


Finally, the disambiguation-mlt engine will not work with a SPARQL-backed
Entityhub Site, simply because SPARQL does not support MoreLikeThis
queries. Still, it looks as if the unsupported constraint type results in
an additional '.' being added to the query. This needs to be fixed.
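
One way such a stray '.' can be avoided is to drop empty patterns before
terminating them. The Python sketch below is only an illustration under
that assumption: the thread does not show how the query builder actually
emits the '.', and join_patterns is a hypothetical helper.

```python
from typing import Iterable, Optional


def join_patterns(patterns: Iterable[Optional[str]]) -> str:
    """Terminate each non-empty graph pattern with ' .' and join them.

    A constraint that cannot be expressed in SPARQL is assumed to
    contribute None or an empty string; such entries are skipped so
    that no lone '.' ends up in the query.
    """
    kept = [p.strip() for p in patterns if p and p.strip()]
    return "\n".join(p + " ." for p in kept)


label = "?id <http://www.w3.org/2000/01/rdf-schema#label> ?tmp1"
print(join_patterns([None, label, ""]))
```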

best
Rupert






Re: keywordLinking engine does not work with words in quotes

Posted by Gintautas Sulskus <gi...@gmail.com>.
Hi,

I will re-use this thread to report a similar SPARQL query construction
issue with the "disambiguation-mlt" engine.

The engine fails when it tries to process the following sentence: "The
House Benghazi committee took its best swings at Hillary Clinton."
Virtuoso throws an invalid syntax exception [1].
However, the sentence "Committee took its best swings at Hillary Clinton."
works just fine.

Regards,
Gin

[1]

Virtuoso 37000 Error SP030: SPARQL compiler, line 6: syntax error at '.' before '?id'

SPARQL query:
define sql:big-data-const 0
#output-format:application/sparql-results+json
SELECT DISTINCT ?id
WHERE {
  {
 .
    ?id <http://www.w3.org/2000/01/rdf-schema#label> ?tmp1 .
      ?tmp1 bif:contains '("hillary" AND "clinton")' .
      FILTER(((lang(?tmp1) = "en") || (lang(?tmp1) = ""))) .
  }
}
ORDER BY DESC ( <LONG::IRI_RANK> (?id) )
LIMIT 25




Best Wishes,
Gintautas Sulskus

On Mon, Oct 12, 2015 at 1:18 PM, Gintautas Sulskus <
gintautas.sulskus@gmail.com> wrote:

> Hi Rupert,
>
> thanks.
>
> Cheers,
> Gin
>
> Best Wishes,
> Gintautas Sulskus

Re: keywordLinking engine does not work with words in quotes

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Gin,

Thanks for reporting. This looks like a bug in the generation of the
SPARQL queries by the SparqlQueryUtils.java [1] class.
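
For readers unfamiliar with the problem, a hedged Python sketch of how a
raw search term could be turned into a bif:contains literal. This is not
the SparqlQueryUtils code and not the STANBOL-877 patch; the function name
bif_contains_literal and the choice to drop double quotes from the term
are assumptions made for illustration.

```python
def bif_contains_literal(term: str) -> str:
    """Return a single-quoted SPARQL literal holding one quoted phrase.

    Double quotes delimit phrases in the free-text expression, so any
    that the term itself carries are removed instead of being passed on.
    """
    cleaned = " ".join(term.replace('"', " ").split())
    if not cleaned:
        raise ValueError("search term is empty after removing quotes")
    phrase = '"' + cleaned + '"'
    # Escape for a single-quoted SPARQL literal: backslash first,
    # then the single quote.
    escaped = phrase.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


print("?v_1 bif:contains " + bif_contains_literal('"quotes') + " .")
```

With the term from this report the literal contains a balanced pair of
double quotes, which is the form reported to work.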

A search for existing JIRA issues related to this revealed STANBOL-877
[2], which already provides a patch that seems never to have been
applied to the code base. I will try to have a look into this, but I might
not find time this week.

best
Rupert

[1] http://svn.apache.org/repos/asf/stanbol/trunk/entityhub/query/sparql/src/main/java/org/apache/stanbol/entityhub/query/sparql/SparqlQueryUtils.java
[2] https://issues.apache.org/jira/browse/STANBOL-877



