Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/05/08 11:25:51 UTC

[jira] [Created] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Rupert Westenthaler created STANBOL-607:
-------------------------------------------

             Summary: SolrYard should use quotes instead of AND for multi word TextConstraints
                 Key: STANBOL-607
                 URL: https://issues.apache.org/jira/browse/STANBOL-607
             Project: Stanbol
          Issue Type: Improvement
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Minor
             Fix For: 0.10.0-incubating


Currently, a TextConstraint for rdfs:label containing "The Book of Three" is encoded as

    (_\!@/rdfs\:label/:The) AND ((_\!@/rdfs\:label/:Book) AND  (_\!@/rdfs\:label/:of) AND(_\!@/rdfs\:label/:Three))

However, Solr/Lucene allows quotes to be used for multi-word searches, so the correct way to encode this query would be

    ((_\!@/rdfs\:label/:"The Book of Three"))

This needs to be fixed in org.apache.stanbol.entityhub.yard.solr.query.QueryUtils#encodeQueryValue(..)
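
The difference between the two encodings can be sketched as follows. This is only an illustration, not the actual Stanbol code: the class name, both method names, and the idea of passing in an already-escaped field name are assumptions made for this example. The AND variant here is flat rather than nested, and the phrase variant uses a single pair of parentheses.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch; not the real QueryUtils#encodeQueryValue(..) implementation.
final class PhraseEncodingSketch {

    /** Current style: one clause per whitespace-separated word, joined with AND. */
    static String encodeWithAnd(String escapedField, String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(word -> "(" + escapedField + ":" + word + ")")
                .collect(Collectors.joining(" AND "));
    }

    /** Proposed style: the whole value as one quoted phrase. */
    static String encodeAsPhrase(String escapedField, String text) {
        // Escape backslashes and embedded double quotes so the phrase stays well-formed.
        String escaped = text.trim().replace("\\", "\\\\").replace("\"", "\\\"");
        return "(" + escapedField + ":\"" + escaped + "\")";
    }

    public static void main(String[] args) {
        // Assumption: the field name is already escaped for the Solr query syntax.
        String field = "_\\!@/rdfs\\:label/";
        System.out.println(encodeWithAnd(field, "The Book of Three"));
        System.out.println(encodeAsPhrase(field, "The Book of Three"));
    }
}
```

Note that a quoted phrase also requires the words to appear next to each other and in order, which the AND form does not.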

NOTE: The impact of this change on wildcard queries needs to be investigated further.

E.g., take the query "Frankf* am Main", which is currently encoded as

    (_\!@/rdfs\:label/:Frankf*) AND ((_\!@/rdfs\:label/:am) AND  (_\!@/rdfs\:label/:main))

With this change it would then result in

    ((_\!@/rdfs\:label/:"Frankf* am Main"))



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271285#comment-13271285 ] 

Rupert Westenthaler edited comment on STANBOL-607 at 5/10/12 5:42 AM:
----------------------------------------------------------------------

Regarding "Frankf* am Main":

Using wildcards within quotes is not supported. So the best solution is to keep the tokenized form for TextConstraints that use wildcards (meaning separate clauses for tokens that contain '?' or '*').

That means that the above query would be best encoded as follows.

    (_\!@/rdfs\:label/:frankf*) AND (_\!@/rdfs\:label/:"am Main")

NOTE: Until we switch to Solr 3.6+, tokens that contain wildcards need to be converted to lower case (see SOLR-2438). Because of that, "Frankf*" is changed to "frankf*".
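
A minimal sketch of this mixed encoding is shown below. It is not the committed Stanbol code: the class and method names are hypothetical, the field name is assumed to be passed in already escaped, and escaping of special characters inside the tokens is left out for brevity. Wildcard tokens become separate lower-cased clauses, and runs of consecutive non-wildcard tokens are grouped into quoted phrases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the wildcard-aware encoding described above.
final class WildcardAwareEncodingSketch {

    static String encode(String escapedField, String text) {
        List<String> clauses = new ArrayList<>();
        List<String> phrase = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty input
            }
            if (token.indexOf('*') >= 0 || token.indexOf('?') >= 0) {
                // Close the phrase collected so far, then add the wildcard token
                // as its own lower-cased clause (workaround until Solr 3.6+).
                flushPhrase(escapedField, phrase, clauses);
                clauses.add("(" + escapedField + ":" + token.toLowerCase(Locale.ROOT) + ")");
            } else {
                phrase.add(token);
            }
        }
        flushPhrase(escapedField, phrase, clauses);
        return String.join(" AND ", clauses);
    }

    /** Emits the collected non-wildcard tokens as one quoted phrase clause. */
    private static void flushPhrase(String escapedField, List<String> phrase, List<String> clauses) {
        if (phrase.isEmpty()) {
            return;
        }
        clauses.add("(" + escapedField + ":\"" + String.join(" ", phrase) + "\")");
        phrase.clear();
    }

    public static void main(String[] args) {
        String field = "_\\!@/rdfs\\:label/"; // assumed to be pre-escaped
        System.out.println(encode(field, "Frankf* am Main"));
    }
}
```

In this sketch a single non-wildcard token is also wrapped in quotes, which should match the same documents as the unquoted term.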
                
      was (Author: rwesten):
    Regarding "Frankf* am Main":

Using wildcards within quotes is not supported. So the best solution is to keep the tokenized form for TextConstraints that use wildcards (meaning separate clauses for tokens that contain '?' or '*').

That means that the above query would be best encoded as follows.

    (_\!@/rdfs\:label/:Frankf*) AND (_\!@/rdfs\:label/:"am Main")
                  


        

[jira] [Updated] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Posted by "Fabian Christ (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-607:
----------------------------------

    Affects Version/s: commons.web.base-0.10.0-incubating
        Fix Version/s: commons.web.base-0.10.0-incubating
    
> SolrYard should use quotes instead of AND for multi word TextConstraints
> ------------------------------------------------------------------------
>
>                 Key: STANBOL-607
>                 URL: https://issues.apache.org/jira/browse/STANBOL-607
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Entity Hub
>    Affects Versions: 0.9.0-incubating, commons.web.base-0.10.0-incubating
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>             Fix For: entityhub-0.10.0-incubating, commons.web.base-0.10.0-incubating


        

[jira] [Resolved] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-607.
-----------------------------------------

    Resolution: Fixed

Fixed with #1337141
                


        

[jira] [Commented] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271285#comment-13271285 ] 

Rupert Westenthaler commented on STANBOL-607:
---------------------------------------------

Regarding "Frankf* am Main":

Using wildcards within quotes is not supported. So the best solution is to keep the tokenized form for TextConstraints that use wildcards (meaning separate clauses for tokens that contain '?' or '*').

That means that the above query would be best encoded as follows.

    (_\!@/rdfs\:label/:Frankf*) AND (_\!@/rdfs\:label/:"am Main")
                


        

[jira] [Updated] (STANBOL-607) SolrYard should use quotes instead of AND for multi word TextConstraints

Posted by "Fabian Christ (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-607:
----------------------------------

    Component/s: Entity Hub
    
> SolrYard should use quotes instead of AND for multi word TextConstraints
> ------------------------------------------------------------------------
>
>                 Key: STANBOL-607
>                 URL: https://issues.apache.org/jira/browse/STANBOL-607
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Entity Hub
>    Affects Versions: 0.9.0-incubating
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
