Posted to solr-user@lucene.apache.org by Joel Nylund <jn...@yahoo.com> on 2009/12/07 21:16:44 UTC

# in query

Hi,

How can I put a # sign in a query? Do I need to escape it?

For example, I want to query for books whose title contains #.

None of these have worked so far:
http://localhost:8983/solr/select?q=textTitle:"#"
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:"\#"

I get:
org.apache.lucene.queryParser.ParseException: Cannot parse 'textTitle:\':
Lexical error at line 1, column 12.  Encountered: <EOF> after : ""

and sometimes just no response.


thanks
Joel


Re: # in query

Posted by Erick Erickson <er...@gmail.com>.
Sorry, I usually think of things in Lucene land and reflexively think of the
fat client.

Anyway, here's your problem I think...

WordDelimiterFilterFactory. See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

It's losing the ##### altogether, as indicated by the
tokens you saw:
s|s,  ecapsym|myspace,  golb|blog
Not a # in sight.

It's kind of subtle, but in the above page entry, this phrase implies that
all non-alphanumeric characters are dropped:
"(by default, all non alpha-numeric characters)"

title is: "#######'s myspace blog"

I'm assuming that the title (if you're looking at it in Luke)
is giving back your stored value. The tokens are what count
during searching; storing and indexing are orthogonal.
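
To make the point about tokens concrete, here is a rough Python imitation of the text_rev analyzers from the posted schema. It is only a sketch, not Solr code: the function names are made up for illustration, the marker character for reversed tokens is an assumption (it appears to be what shows up as "#1;" in the Luke handler output), and it ignores the stop filter, synonyms, catenation and the finer WordDelimiterFilterFactory options.

```python
import re

# Assumed marker that precedes reversed tokens in the index.
REVERSE_MARKER = "\u0001"


def approx_index_tokens(text):
    """Very rough imitation of the text_rev index analyzer."""
    tokens = []
    for chunk in text.split():  # WhitespaceTokenizerFactory
        # WordDelimiterFilterFactory: keep only the alphanumeric runs
        for part in re.findall(r"[A-Za-z0-9]+", chunk):
            word = part.lower()  # LowerCaseFilterFactory
            tokens.append(word)  # original token (withOriginal="true")
            tokens.append(REVERSE_MARKER + word[::-1])  # reversed copy
    return tokens


def approx_query_tokens(text):
    """Same idea for the query analyzer, which has no reversing filter."""
    return [part.lower()
            for chunk in text.split()
            for part in re.findall(r"[A-Za-z0-9]+", chunk)]


if __name__ == "__main__":
    print(approx_index_tokens("#######'s myspace blog"))
    print(approx_query_tokens("###"))
```

Under these assumptions the title produces only s, myspace and blog (plus their reversed copies), and a query of ### produces no tokens at all, so there is nothing left to match.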

HTH
Erick

On Tue, Dec 8, 2009 at 2:25 PM, Joel Nylund <jn...@yahoo.com> wrote:


Re: # in query

Posted by Joel Nylund <jn...@yahoo.com>.
OK, I just realized I was using the Luke handler. I didn't know there was
a fat client; I assume that's what you are talking about.

I downloaded lukeall.jar, ran it, pointed it to my index, and found the
document in question. I didn't see how it was tokenized, but I clicked
the "reconstruct & edit" button.

This gives me a tab that shows the tokens per field. For this field it
shows:


"s|s, ecapsym|myspace, golb|blog"

title is: "#######'s myspace blog"

schema is:

  <!-- A general unstemmed text field that indexes tokens normally and also
       reversed (via ReversedWildcardFilterFactory), to enable more efficient
       leading wildcard queries. -->
  <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="textTitle" type="text_rev" indexed="true" stored="true"
         required="false" multiValued="false"/>



thanks
Joel




On Dec 8, 2009, at 11:14 AM, Erick Erickson wrote:



Re: # in query

Posted by Erick Erickson <er...@gmail.com>.
In Luke, there's a tab that will let you go to a document ID. From there
you can see all the fields in a particular document, and examine what
the actual stored tokens are. Until and unless you know what tokens
are being indexed, you simply can't know what your queries should look
like.

*Assuming* that the ### are getting indexed, and *assuming* your tokenizer
tokenizes on whitespace, and *assuming* that by text_rev you
are talking about ReversedWildcardFilterFactory, I
wouldn't expect a search to match if it wasn't exactly:
s'#######. But as you see, there's a long chain of assumptions there, any
one of which may be violated by your schema. So please post the
relevant portions of your schema to make it easier to help.
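
For what it's worth, the reversed form above can be reproduced with a couple of lines of Python. This is only an illustration of the chain of assumptions (whitespace tokenization, then plain reversal, nothing else touching the token); the helper name is made up and no Solr code is involved.

```python
def reversed_form(token):
    """Reverse a token character by character (illustrative helper only)."""
    return token[::-1]


if __name__ == "__main__":
    title = "#######'s test blog"
    first_token = title.split()[0]     # whitespace tokenization
    print(reversed_form(first_token))  # the reversed form of that token
```

If any filter in the real chain strips or splits on the # characters, this reversed form never reaches the index, which is why the schema matters.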

Best
Erick


On Tue, Dec 8, 2009 at 9:54 AM, Joel Nylund <jn...@yahoo.com> wrote:


Re: # in query

Posted by Joel Nylund <jn...@yahoo.com>.
Thanks Erick,

I looked into this more, but I'm still stuck.

I have this field indexed using text_rev.

I looked at the Luke analysis for this field, but I'm unsure how to
read it.

When I query the field by the id, I get:

<result name="response" numFound="1" start="0">
<doc>
<str name="id">5405255</str>
<str name="textTitle">#######'s test blog</str>
</doc>
</result>

Even if I query for multiple # characters, I get nothing.

Here is what the Luke handler says (by the way, when I used id instead of
docid on Luke I got a NullPointerException: /admin/luke?docid=5405255 vs
/admin/luke?id=5405255):

<lst name="textTitle">
<str name="type">text_rev</str>
<str name="schema">ITS-----------</str>
<str name="index">ITS----------</str>
<int name="docs">290329</int>
<int name="distinct">401016</int>
<lst name="topTerms">
<int name="#1;golb">49362</int>
<int name="blog">49362</int>
<int name="#1;ecapsym">29426</int>
<int name="myspace">29426</int>
<int name="#1;s">8773</int>
<int name="s">8773</int>
<int name="#1;ed">8033</int>
<int name="de">8033</int>
<int name="com">6884</int>
<int name="#1;moc">6884</int>
</lst>
<lst name="histogram">
<int name="1">308908</int>
<int name="2">34340</int>
<int name="4">21916</int>
<int name="8">14474</int>
<int name="16">9122</int>
<int name="32">5578</int>
<int name="64">3162</int>
<int name="128">1844</int>
<int name="256">910</int>
<int name="512">464</int>
<int name="1024">182</int>
<int name="2048">72</int>
<int name="4096">26</int>
<int name="8192">12</int>
<int name="16384">2</int>
<int name="32768">2</int>
<int name="65536">2</int>
</lst>
</lst>


solr/select?q=textTitle:%23%23%23  - gets no results.

I also have the same field indexed as an alphaOnlySort, and it gives me
lots of results, but not the ones I want.

Any other ideas?

thanks
Joel


On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote:



Re: # in query

Posted by Erick Erickson <er...@gmail.com>.
Well, the very first thing I would do is examine the field definition in
your schema file. I suspect that the tokenizers and/or
filters you're using for indexing and/or querying are doing something
to the # symbol, most likely stripping it. If you're just searching
for the single-character term "#", I *think* the query parser silently
drops that part of the clause, but check on that.

The second thing would be to get a copy of Luke and examine your
index to see if what you *think* is in your index actually is there.
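
One way to check what the query parser did with the clause is to add Solr's debugQuery parameter to the request and read the parsed query in the response. The sketch below only builds such a URL in Python; the host and path are the ones from the original post, the helper name is made up, and nothing is actually sent to a server.

```python
from urllib.parse import urlencode

# Base URL taken from the original post; adjust for your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_query_url(query):
    """Build a select URL that asks Solr to include debugging output."""
    params = {"q": query, "debugQuery": "true"}
    # urlencode percent-encodes ':' and '#', so the '#' reaches the server
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(debug_query_url("textTitle:#"))
```

If the parsed query in the debug section comes back empty for textTitle, the analysis chain removed the # before any search took place.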

HTH
Erick

On Mon, Dec 7, 2009 at 3:28 PM, Joel Nylund <jn...@yahoo.com> wrote:


Re: # in query

Posted by Joel Nylund <jn...@yahoo.com>.
OK, thanks. Sorry, my brain wasn't working. But even when I URL-encode
it, I don't get any results. Is there something special I have to do
for Solr?

thanks
Joel

On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:



Re: # in query

Posted by Paul Libbrecht <pa...@activemath.org>.
Sure, you have to escape it: %23

Otherwise the browser treats it as the separator between the URL for
the server (on the left) and the fragment identifier (on the right),
which is not sent to the server.

You might want to read about "URL-encoding". Escaping with a backslash
is a shell thing, not a thing for URLs!
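
A small Python sketch of both points, using only the standard library. The URLs are the ones from the original post; the helper name is made up for illustration. The first part shows how a URL parser splits an unescaped #, the second builds a percent-encoded version.

```python
from urllib.parse import quote, urlsplit

# One of the URLs from the original post, with the '#' left unescaped.
raw_url = 'http://localhost:8983/solr/select?q=textTitle:"#"'
parts = urlsplit(raw_url)


def encoded_select_url(query):
    """Percent-encode the whole query value so '#' survives as %23."""
    return "http://localhost:8983/solr/select?q=" + quote(query, safe="")


if __name__ == "__main__":
    print(parts.query)     # the part that would be sent to the server
    print(parts.fragment)  # the part that stays on the client side
    print(encoded_select_url('textTitle:"#"'))
```

Everything after the first # ends up in the fragment, so the server only sees a truncated q parameter unless the # is sent as %23.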

paul

