Posted to solr-user@lucene.apache.org by diyun2008 <di...@gmail.com> on 2016/01/27 11:02:19 UTC

Solr cannot return result when query with # * like title:#7654321*

Hi guys
    I have a document indexed with title:#7654321.
    When I query it with q=title:#7654321, it works.
    When I query it with q=title:#7654321*, it does not work; it returns no
results.
    When I remove the # and query with q=title:7654321*, it works again.

    I also tried q=title:76543*, and that works too!

    So I suspect there's a bug in Lucene when a query combines the # symbol
with *.

    Has anyone run into this before, or can anyone help try it?
    
Thank you for your help.
 



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-cannot-return-result-when-query-with-like-title-7654321-tp4253541.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Jack Krupansky <ja...@gmail.com>.
Thanks. This is what Yonik was referring to - that # is a special URL
syntax character which signifies that the text after the # is what is known
as a fragment identifier, which is separated from the path and query
parameters of the URL. The Solr query is simply one URL query parameter
(&name=value). You need to escape the # as %23. But if you are using
SolrJ, the escaping should be handled by the SolrJ API itself.
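
For readers who build the URL themselves, here is a minimal sketch of the
escaping in Python, using only the standard library. The base URL is the one
quoted in this thread; the variable names are made up for the example, and
this is one possible way to do it, not the only one.

```python
from urllib.parse import quote, urlencode

# Base URL as quoted in this thread; adjust host, port and core as needed.
BASE = "http://127.0.0.1:8080/solr/collection1/select"

raw_query = "title:#7654321*"

# Option 1: percent-encode the value by hand. safe=":*" leaves the colon
# and the wildcard readable; '#' is not in the safe set, so it becomes %23.
encoded_value = quote(raw_query, safe=":*")
url_manual = f"{BASE}?q={encoded_value}"

# Option 2: let urlencode() build the whole query string. It encodes more
# aggressively (':' becomes %3A, '*' becomes %2A), which decodes to the
# same value on the server side.
url_auto = f"{BASE}?{urlencode({'q': raw_query})}"

print(url_manual)
print(url_auto)
```

Either URL should deliver the full q value, including the #, to the server.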

See:
https://en.wikipedia.org/wiki/Fragment_identifier
https://tools.ietf.org/html/rfc3986

Just to be super clear, how exactly are you sending the query to Solr - if
using curl, please post the full curl command.


-- Jack Krupansky

On Thu, Jan 28, 2016 at 1:03 AM, diyun2008 <di...@gmail.com> wrote:

> The query is rather simple:
> http://127.0.0.1:8080/solr/collection1/select?q=title:#7654321*
>

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
The query is rather simple:
http://127.0.0.1:8080/solr/collection1/select?q=title:#7654321*





Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Jack Krupansky <ja...@gmail.com>.
Just to be sure, please post the lines of code or command line that you are
using to issue the query.

-- Jack Krupansky

On Wed, Jan 27, 2016 at 10:50 PM, Yonik Seeley <ys...@gmail.com> wrote:

> On Wed, Jan 27, 2016 at 10:47 PM, diyun2008 <di...@gmail.com> wrote:
> > Hi Yonik
> >
> >    I do actually encode it like q=title:%237654321* (which is:
> > q=title:#7654321*)
>
> Yes, if you *need* to encode it yourself (i.e. if you're using curl,
> or a browser URL bar).  It really depends on the client you are using.
>
> -Yonik
>

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Jan 27, 2016 at 10:47 PM, diyun2008 <di...@gmail.com> wrote:
> Hi Yonik
>
>    I do actually encode it like q=title:%237654321* (which is:
> q=title:#7654321*)

Yes, if you *need* to encode it yourself (i.e. if you're using curl,
or a browser URL bar).  It really depends on the client you are using.

-Yonik

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
Hi Yonik

   I do actually encode it like q=title:%237654321* (which is:
q=title:#7654321*)






Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Yonik Seeley <ys...@gmail.com>.
Are you sure the query is getting through unadulterated to Solr?

For example:

/opt/code/lusolr$ curl 'http://localhost:8983/solr/techproducts/query?q=id:#foo'
{
  "responseHeader":{
    "status":400,
    "QTime":2,
    "params":{
      "q":"id:"}},
  "error":{
    "msg":"org.apache.solr.search.SyntaxError: Cannot parse 'id:':
Encountered \"<EOF>\" at line 1, column 3.\nWas expecting one of:\n
<BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    <QUOTED> ...\n
<TERM> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM>
...\n    \"[\" ...\n    \"{\" ...\n    <LPARAMS> ...\n    \"filter(\"
...\n    <NUMBER> ...\n    ",
    "code":400}}


Look at the echoed params, and how they were cut off at the "#" which
is special for URLs (and needs to be URL encoded).
Whether *you* need to URL encode it depends on exactly how you are
sending the request to Solr.  With a browser or with curl, you will
need to do it yourself... but other types of clients like SolrJ should
do it for you.
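
The truncation can be reproduced without a running Solr. The sketch below
uses Python's standard urllib.parse to split the same URL as the curl example
above, once with a raw # and once with %23. The helper name split_query is
made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

def split_query(url):
    """Return (decoded query parameters, fragment) for a URL."""
    parts = urlsplit(url)
    return parse_qs(parts.query), parts.fragment

unescaped = "http://localhost:8983/solr/techproducts/query?q=id:#foo"
escaped = "http://localhost:8983/solr/techproducts/query?q=id:%23foo"

for url in (unescaped, escaped):
    params, fragment = split_query(url)
    # With a raw '#', everything after it lands in the fragment and the
    # q parameter is cut off at "id:", as in the echoed params above.
    print(url)
    print("  params  :", params)
    print("  fragment:", repr(fragment))
```

In the unescaped case the q parameter is just "id:" and "foo" ends up in the
fragment; in the escaped case q decodes to "id:#foo" and the fragment is
empty.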

-Yonik


On Wed, Jan 27, 2016 at 10:00 PM, diyun2008 <di...@gmail.com> wrote:
> Hi Eric
>
>     Thank you for your reply.
>     I saw the admin/analyze page before.
>
>     On the index side, analyzing "title:#7654321" gives "7654321" after ST
> (Standard Tokenizer), which means the # is not indexed.
>     On the query side, analyzing "title:#7654321*" also gives "7654321"
> after ST.
>
>     What confuses me is this: if both produce the same string "7654321",
> why does the query (q=#7654321*) return no hits?
>
>     BTW, the title field type is "text_general" and it uses the Standard Tokenizer.

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Diyun,

Have you read
https://lucidworks.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
and
https://wiki.apache.org/solr/MultitermQueryAnalysis
?

Ahmet


On Thursday, January 28, 2016 9:02 AM, diyun2008 <di...@gmail.com> wrote:
Hi Shawn

    Your information is very helpful. It explains the behavior I ran into.
    Do you know where I can find documentation or related material about
what you described?
    I want to understand this in depth.

    Thank you very much!

Diyun






Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
Hi Shawn

    Your information is very helpful. It explains the behavior I ran into.
    Do you know where I can find documentation or related material about
what you described?
    I want to understand this in depth.

    Thank you very much!

Diyun

 




Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/27/2016 11:31 PM, diyun2008 wrote:
>    Thank you for your reply.
>    But the strange part is that it does return results for
> (q=title:#7654321), (q=title:7654321*) and (q=title:7654321).
>
>    By your reasoning, the query (q=title:#7654321) should not return
> results either, but it does.

When the query does NOT have a wildcard, the defined query analysis on
the field is used.  The query analysis for this particular field
includes a tokenizer or a filter that strips # characters, so the query
matches.

When the query DOES have a wildcard (* or ? characters), any defined
query analysis is completely skipped, so the # character is not stripped
from the query.  It will not match what's in the index.

The example of q=title:7654321* matches because it does not contain the
# character.
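
To make the difference concrete, here is a toy model in Python of the
behavior described above. It is only a sketch: toy_analyze is a crude
stand-in for the field's analysis chain (not the real StandardTokenizer), and
toy_search is a hypothetical helper that skips analysis whenever the query
ends in *.

```python
import re

def toy_analyze(text):
    """Stand-in for the analysis chain: lowercase the text and keep only
    runs of letters and digits, so '#' is dropped. An approximation for
    illustration, not Lucene's actual tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Terms that end up in the toy index for the title "#7654321".
indexed_terms = set(toy_analyze("#7654321"))

def toy_search(query):
    if query.endswith("*"):
        # Prefix query: the raw text is used as-is, with no analysis,
        # so a leading '#' stays in the prefix.
        prefix = query[:-1]
        return sorted(t for t in indexed_terms if t.startswith(prefix))
    # Plain term query: analyzed the same way as the indexed text.
    query_terms = toy_analyze(query)
    return sorted(t for t in indexed_terms if t in query_terms)

for q in ("#7654321", "#7654321*", "7654321*", "76543*"):
    print(q, "->", toy_search(q))
```

In this model only "#7654321*" comes back empty, which matches the results
reported at the start of the thread. One possible workaround on the client
side is to strip characters the analyzer would drop before appending the
wildcard.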

Thanks,
Shawn


Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
Hi Shawn

   Thank you for your reply.
   But the strange part is that it does return results for
(q=title:#7654321), (q=title:7654321*) and (q=title:7654321).

   By your reasoning, the query (q=title:#7654321) should not return
results either, but it does.

Thanks





Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/27/2016 8:00 PM, diyun2008 wrote:
>     What confuses me is this: if both produce the same string "7654321",
> why does the query (q=#7654321*) return no hits?

When a wildcard is present in the query, the terms in the query are NOT
analyzed.  I'm not sure exactly why this is the case, but it is how
Lucene behaves.

If the # character is correctly encoded (which it appears to be, based
on subsequent messages), it will still be in the query when wildcards
are present, and will not match what is indexed.

Thanks,
Shawn


Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
Hi Eric

    Thank you for your reply. 
    I saw the admin/analyze page before.

    On the index side, analyzing "title:#7654321" gives "7654321" after ST
(Standard Tokenizer), which means the # is not indexed.
    On the query side, analyzing "title:#7654321*" also gives "7654321"
after ST.

    What confuses me is this: if both produce the same string "7654321",
why does the query (q=#7654321*) return no hits?

    BTW, the title field type is "text_general" and it uses the Standard Tokenizer.




Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Erick Erickson <er...@gmail.com>.
The admin/analysis page is your friend here, it'll show you
exactly what happens to your terms as they go through various
stages of your analysis chain. And Ahmet's comment is also
spot on, wildcards are tricky....

On Wed, Jan 27, 2016 at 2:25 AM, Ahmet Arslan <io...@yahoo.com.invalid> wrote:
> Hi Diyun,
>
> Wildcard queries are not analysed. Your index-time analyzer is probably stripping the # character.
> Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis
>
> Ahmet
>
>
>
> On Wednesday, January 27, 2016 12:02 PM, diyun2008 <di...@gmail.com> wrote:
> Hi guys
>     I have a document indexed with title:#7654321.
>     When I query it with q=title:#7654321, it works.
>     When I query it with q=title:#7654321*, it does not work; it returns no
> results.
>     When I remove the # and query with q=title:7654321*, it works again.
>
>      I also tried q=title:76543*, and that works too!
>
>      So I suspect there's a bug in Lucene when a query combines the # symbol
> with *.
>
>      Has anyone run into this before, or can anyone help try it?
>
> Thank you for your help.
>
>
>
>

Re: Solr cannot return result when query with # * like title:#7654321*

Posted by diyun2008 <di...@gmail.com>.
Hi Ahmet

    Thank you for your reply.
    The title type is "text_general" and it uses Standard Tokenizer.

    On the index side, analyzing "title:#7654321" gives "7654321" after ST
(Standard Tokenizer), which means the # is not indexed.
    On the query side, analyzing "title:#7654321*" also gives "7654321"
after ST.

    What confuses me is this: if both produce the same string "7654321",
why does the query (q=#7654321*) return no hits?

   




Re: Solr cannot return result when query with # * like title:#7654321*

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Diyun,

Wildcard queries are not analysed. Your index-time analyzer is probably stripping the # character.
Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Wednesday, January 27, 2016 12:02 PM, diyun2008 <di...@gmail.com> wrote:
Hi guys
    I have a document indexed with title:#7654321.
    When I query it with q=title:#7654321, it works.
    When I query it with q=title:#7654321*, it does not work; it returns no
results.
    When I remove the # and query with q=title:7654321*, it works again.

    I also tried q=title:76543*, and that works too!

    So I suspect there's a bug in Lucene when a query combines the # symbol
with *.

    Has anyone run into this before, or can anyone help try it?
    
Thank you for your help.



