Posted to java-user@lucene.apache.org by janwen <to...@163.com> on 2011/10/24 11:12:32 UTC
custom index rule
Hi,
I want to implement a custom indexing rule.
Take a sentence like the following (note the commas):
I am in China,I am in USA,I am in UK
I would like Lucene to index the above sentence according to this rule:
1) Split the sentence on commas (,), giving (I am in China) (I am in USA) (I am in UK).
2) Then have Lucene store just the short sentences from step 1, NOT_ANALYZED.
P.S. How many characters does Lucene not support, and what are they?
I entered a^b and got this exception:
org.apache.lucene.queryParser.ParseException: Cannot parse 'a^b: Lexical error at line 1, column 4. Encountered: "\u671d" (26397), after : ""
Thanks
2011-10-24
janwen | China
website : http://www.qianpin.com/
Re: Re: custom index rule
Posted by janwen <to...@163.com>.
Thanks, Ian. I will try your idea.
2011-10-24
From: Ian Lea
Date: 2011-10-24 18:01
Subject: Re: custom index rule
To: java-user
Cc:
You can achieve pretty much anything by customizing parsers and
tokenizers, but for your simple case I'd just use String.split() and
add the phrases one by one. Something like:
Document d = ...
String[] phrases = sentence.split(",");
for (String phrase : phrases) {
    // one Field per phrase, all under the same field name "phrase"
    d.add(new Field("phrase", phrase, ...));
}
I think that would achieve what you want.
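A minimal, JDK-only sketch of the splitting step, assuming the phrases should be trimmed and empty pieces dropped (neither is stated in the thread). The class name PhraseSplitter is hypothetical. The Lucene 3.x call that would consume each phrase appears only as a comment, using the Field(String, String, Field.Store, Field.Index) constructor with Field.Store.YES and Field.Index.NOT_ANALYZED to match the "store ... NOT_ANALYZED" requirement in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): turns one input sentence into the
// comma-separated phrases that should each be indexed as a single term.
final class PhraseSplitter {

    private PhraseSplitter() {}

    static List<String> splitPhrases(String sentence) {
        List<String> phrases = new ArrayList<>();
        // String.split takes a regex; "," has no special meaning there,
        // so this splits on literal commas.
        for (String part : sentence.split(",")) {
            // Assumption: trim surrounding spaces, because a NOT_ANALYZED term
            // is matched exactly and " I am in USA" would differ from "I am in USA".
            String phrase = part.trim();
            // Skip empty pieces produced by ",," or a leading comma.
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    // How the phrases would be added with the Lucene 3.x API (not compiled here):
    //
    //   Document d = new Document();
    //   for (String phrase : PhraseSplitter.splitPhrases(sentence)) {
    //       d.add(new Field("phrase", phrase,
    //                       Field.Store.YES, Field.Index.NOT_ANALYZED));
    //   }
}
```

Because each phrase becomes one untokenized term, searching such a field typically means querying the exact phrase, e.g. new TermQuery(new Term("phrase", "I am in China")). Passing the same text through QueryParser would run it through the analyzer and split it into separate words, which would normally not match the single stored term.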
On special characters, see "Escaping Special Characters" at
http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Escaping
and QueryParser.escape(String s).
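To show roughly what QueryParser.escape(String s) does without pulling in a Lucene dependency, here is an illustrative re-implementation. The class QuerySyntaxEscaper is hypothetical, and its character set is taken from the special characters listed on the 3.4 query syntax page (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \). In real code, call QueryParser.escape itself, since the exact list can differ between Lucene versions.

```java
// Illustrative, JDK-only sketch of query-syntax escaping; not Lucene's own code.
final class QuerySyntaxEscaper {

    // Special characters from the Lucene 3.x query syntax documentation.
    // '&' and '|' are escaped one at a time, which also neutralizes "&&" and "||".
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QuerySyntaxEscaper() {}

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Prefix every special character with a backslash so the parser
            // treats it as ordinary text instead of an operator.
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

In the query syntax, ^ is the boost operator (as in term^2) and is expected to be followed by a number, which is why a non-numeric character right after it triggers a lexical error. Escaping the user's input first, e.g. QuerySyntaxEscaper.escape("a^b") giving a\^b, makes the ^ plain text before the string reaches the parser.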
--
Ian.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org