Posted to java-user@lucene.apache.org by Robinson Raju <ro...@gmail.com> on 2005/01/27 07:42:16 UTC

Searching with words that contain % , / and the like

Hi,

Is there a way to search for words that contain "/" or "%"?
If my query is "test/s", it is just taken as "test".
If my query is "test/p", it is just taken as "test p".
Has anyone done this or faced such an issue?
 
Regards
Robin

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Searching with words that contain % , / and the like

Posted by Robinson Raju <ro...@gmail.com>.
Hi,
  Yes, the analyzer was the culprit eating some of the letters
in the search string. StandardAnalyzer has 'a' and 's' as stop words
(amongst others).
Since I want to search on these (specifically, on words like "a/s",
"e/p", "15%", "15'", etc.), I commented out the following lines in
StandardAnalyzer (the filtering of standard tokens and stop words):

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    // StandardFilter disabled
//    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    // stop-word filtering disabled
//    result = new StopFilter(result, stopSet);
    return result;
  }

Now stop words are no longer filtered, but "/" is still dropped,
so "a/s" is read as "a s".
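
One way to keep "/" and friends might be a tokenizer with its own idea of what counts as a token character. Below is a rough sketch of that idea in plain Java with no Lucene dependency. The class name, EXTRA_TOKEN_CHARS and tokenize are made up for illustration; in Lucene the same rule would presumably live in a CharTokenizer subclass that overrides isTokenChar, but that part is an assumption and is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a stand-alone tokenizer, not Lucene code.
class KeepCharsTokenizerDemo {
    // Hypothetical set of punctuation that should stay inside a token.
    static final String EXTRA_TOKEN_CHARS = "/%'";

    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || EXTRA_TOKEN_CHARS.indexOf(c) >= 0;
    }

    // Collects runs of token characters, lower-casing as it goes.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [a/s, e/p, and, 15%, or, 15']
        System.out.println(tokenize("A/S, e/p and 15% (or 15')"));
    }
}
```

The same tokenization would have to be used at index time and at query time, otherwise "a/s" in the query will not match what is stored in the index.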

Regards
Robin

On Thu, 27 Jan 2005 02:50:13 -0600, Chris Lamprecht
<cl...@gmail.com> wrote:
> Without looking at the source, my guess is that StandardAnalyzer (and
> StandardTokenizer) is the culprit.  The StandardAnalyzer grammar (in
> StandardTokenizer.jj) is probably defined so "x/y" parses into two
> tokens, "x" and "y".  "s" is a default stopword (see
> StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p"
> does not.
> 
> To get what you want, you can use a WhitespaceAnalyzer, write your own
> custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj
> grammar to suit your needs.  WhitespaceAnalyzer is much simpler than
> StandardAnalyzer, so you may see some other things being tokenized
> differently.
> 
> -Chris
> 
> On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju
> <ro...@gmail.com> wrote:
> > Hi ,
> >
> > Is there a way to search for words that contain "/" or "%" .
> > if my query is "test/s" , it is just taken as "test"
> > if my query is "test/p" , it is just taken as "test p"
> > has anyone done this / faced such an issue ?
> >
> > Regards
> > Robin





Re: Searching with words that contain % , / and the like

Posted by Chris Lamprecht <cl...@gmail.com>.
Without looking at the source, my guess is that StandardAnalyzer (and
StandardTokenizer) is the culprit.  The StandardAnalyzer grammar (in
StandardTokenizer.jj) is probably defined so "x/y" parses into two
tokens, "x" and "y".  "s" is a default stopword (see
StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p"
does not.
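
To make that guess concrete, here is a small plain-Java sketch (no Lucene involved) that mimics the suspected behaviour: split on anything that is not a letter or digit, lower-case, then drop stop words. The class name, the split rule and the shortened stop list are assumptions for illustration, not the actual StandardTokenizer grammar or the full ENGLISH_STOP_WORDS list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for the behaviour described above; not Lucene's grammar.
class SplitAndStopDemo {
    // Hypothetical, abbreviated stop list for the demo.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "s", "the", "and", "of"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Assumption: every non-letter, non-digit character separates tokens.
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty() && !STOP_WORDS.contains(part)) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("test/s")); // prints [test]
        System.out.println(analyze("test/p")); // prints [test, p]
    }
}
```

This reproduces both symptoms from the original question: "s" is split off and then removed as a stop word, while "p" is split off and kept.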

To get what you want, you can use a WhitespaceAnalyzer, write your own
custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj
grammar to suit your needs.  WhitespaceAnalyzer is much simpler than
StandardAnalyzer, so you may see some other things being tokenized
differently.
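
As a rough illustration of what whitespace-only tokenization means, here is a plain-Java sketch that just splits on whitespace. It is an approximation written for this thread, not WhitespaceAnalyzer itself; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Approximation of whitespace-only tokenization; not Lucene code.
class WhitespaceSplitDemo {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "/" and "%" survive because only whitespace separates tokens:
        // prints [price, of, a/s, rose, 15%]
        System.out.println(tokenize("price of a/s rose 15%"));
        // Side effect: case and trailing punctuation are kept as well:
        // prints [Test/S,, done.]
        System.out.println(tokenize("Test/S, done."));
    }
}
```

The second call shows the kind of difference to expect: nothing is lower-cased and punctuation stays glued to the word, so "Test/S," would not match a query for "test/s" unless further filtering is added.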

-Chris

On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju
<ro...@gmail.com> wrote:
> Hi ,
> 
> Is there a way to search for words that contain "/" or "%" .
> if my query is "test/s" , it is just taken as "test"
> if my query is "test/p" , it is just taken as "test p"
> has anyone done this / faced such an issue ?
> 
> Regards
> Robin


Re: Searching with words that contain % , / and the like

Posted by Robinson Raju <ro...@gmail.com>.
Hi Jason,
Yes, the documentation does mention escaping, but that's only for special
characters used in queries, right?
I've tried escaping too.
To answer your question, I am sure it is not the HTTP request that is eating it up.

        Query query = MultiFieldQueryParser.parse("test/s",
                 "value", analyzer);

  The resulting query is "value:test".

  I am using StandardAnalyzer.


On Thu, 27 Jan 2005 17:53:39 +1100, Jason Polites
<ja...@tpg.com.au> wrote:
> Lucene doco mentions escaping, but doesn't include the "/" char...
> 
> ------
> Lucene supports escaping special characters that are part of the query
> syntax. The current list special characters are
> 
> + - && || ! ( ) { } [ ] ^ " ~ * ? : \
> 
> To escape these character use the \ before the character. For example to
> search for (1+1):2 use the query:
> 
> \(1\+1\)\:2
> ------
> 
> You could try escaping it anyway?
> 
> Are you sure it's not an HTTP request which is screwing with the parameter?
> 
> 
> ----- Original Message -----
> From: "Robinson Raju" <ro...@gmail.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, January 27, 2005 5:42 PM
> Subject: Searching with words that contain % , / and the like
> 
> > Hi ,
> >
> > Is there a way to search for words that contain "/" or "%" .
> > if my query is "test/s" , it is just taken as "test"
> > if my query is "test/p" , it is just taken as "test p"
> > has anyone done this / faced such an issue ?
> >
> > Regards
> > Robin





Re: Searching with words that contain % , / and the like

Posted by Jason Polites <ja...@tpg.com.au>.
Lucene doco mentions escaping, but doesn't include the "/" char...


------
Lucene supports escaping special characters that are part of the query 
syntax. The current list special characters are

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these character use the \ before the character. For example to 
search for (1+1):2 use the query:


 \(1\+1\)\:2
------
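
For what it's worth, the escaping rule quoted above can be sketched in a few lines of plain Java. The helper below is hypothetical (it is not a Lucene API); it puts a backslash in front of every character from the quoted list, treating && and || as single & and | characters, which is a simplifying assumption.

```java
// Hypothetical helper illustrating the quoted escaping rule; not a Lucene API.
class QueryEscapeDemo {
    // Characters taken from the list quoted above.
    static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2")); // prints \(1\+1\)\:2
        System.out.println(escape("test/s"));  // prints test/s
    }
}
```

Note that "test/s" comes back unchanged: "/" is not in the list, so escaping by this rule does nothing for it.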

You could try escaping it anyway?

Are you sure it's not an HTTP request which is screwing with the parameter?


----- Original Message ----- 
From: "Robinson Raju" <ro...@gmail.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, January 27, 2005 5:42 PM
Subject: Searching with words that contain % , / and the like


> Hi ,
>
> Is there a way to search for words that contain "/" or "%" .
> if my query is "test/s" , it is just taken as "test"
> if my query is "test/p" , it is just taken as "test p"
> has anyone done this / faced such an issue ?
>
> Regards
> Robin

