Posted to solr-user@lucene.apache.org by Smitha Rajiv <sm...@gmail.com> on 2016/01/15 10:58:52 UTC
Issue in custom filter
Hi,
I have a requirement that, while indexing, any numbers in a token must be
converted into the corresponding words.
e.g.: term1 part 2 assignments -> termone part two assignments.
I have created a custom filter with the following code:
@Override
public boolean incrementToken() throws IOException {
    if (!input.incrementToken())
        return false;
    char[] buffer = charTermAttr.buffer();
    String newTerm = new String(buffer);
    convertedTerm = Converter.convert(newTerm);
    charTermAttr.setEmpty();
    charTermAttr.copyBuffer(convertedTerm.toCharArray(), 0,
            convertedTerm.length());
    return true;
}
But it gives weird results when I analyze.
After applying the custom filter, I get this result:
termone partone twoartone assignments.
It looks like the buffer length I set for the first token is not being
reset when the next token is picked up. I have a feeling that I am
messing up the offsets somewhere.
Could you please help me with this?
Thanks & Regards,
Smitha
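The thread never shows the Converter class, so the following is only a hypothetical sketch of what a digit-to-word conversion might look like. The class name DigitWordConverter, the WORDS table, and the digit-by-digit behavior (so "12" would become "onetwo") are all assumptions made for illustration; the thread only confirms the mappings 1 -> one and 2 -> two.

```java
import java.util.Objects;

// Hypothetical stand-in for the Converter class, which the thread does not show.
public class DigitWordConverter {
    private static final String[] WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Replaces each ASCII digit with its English word; other characters pass through.
    // Assumption: digits are converted one at a time, so "12" becomes "onetwo".
    public static String convert(String term) {
        Objects.requireNonNull(term, "term");
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append(WORDS[c - '0']);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Convert each whitespace-separated token of the example input.
        for (String token : "term1 part 2 assignments".split(" ")) {
            System.out.print(convert(token) + " ");
        }
        System.out.println();
    }
}
```

Given clean input tokens, a converter like this produces the expected "termone part two assignments", which suggests the odd output comes from how the term is read inside the filter rather than from the conversion itself.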
Re: Issue in custom filter
Posted by Smitha Rajiv <sm...@gmail.com>.
Thanks Ahmet. It worked.
As per your suggestion, I have changed the code as below.
final String term=charTermAttr.toString();
final String convertedTerm = Converter.convert(term);
charTermAttr.setEmpty().append(convertedTerm);
return true;
Now the input stream "term1 part 2 assessment" gives the result
"termone part two assessment".
Thanks for your support.
Regards,
Smitha
On Fri, Jan 15, 2016 at 3:40 PM, Ahmet Arslan <io...@yahoo.com.invalid>
wrote:
> Hi Smitha,
>
> Please try the code below:
>
> final String term = charTermAttr.toString();
> final String s = convertedTerm = Converter.convert(term);
>
> // If not changed, don't waste the time adjusting the token.
> if ((s != null) && !s.equals(term))
>     charTermAttr.setEmpty().append(s);
>
>
>
> Ahmet
Re: Issue in custom filter
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Smitha,
Please try the code below:
final String term = charTermAttr.toString();
final String s = convertedTerm = Converter.convert(term);
// If not changed, don't waste the time adjusting the token.
if ((s != null) && !s.equals(term))
    charTermAttr.setEmpty().append(s);
Ahmet
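A likely reason the toString() version works, sketched in plain Java rather than against Lucene: buffer() returns the whole reusable backing array, of which only the first length() characters belong to the current token. new String(buffer) therefore also picks up leftovers from earlier tokens, while toString() reads only the valid part. The class StaleBufferDemo, its fields, and its two-digit convert helper are invented for this illustration, and the buffer growth shown here is a simplification, not Lucene's actual strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of a reused term buffer. This is not Lucene code.
public class StaleBufferDemo {
    private char[] buffer = new char[16]; // backing array, reused for every token
    private int length = 0;               // number of valid chars in buffer

    // Overwrite the buffer from index 0, growing it if needed.
    private void setTerm(String s) {
        if (s.length() > buffer.length) {
            buffer = Arrays.copyOf(buffer, s.length());
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    // Hypothetical converter covering only the digits in the example.
    private static String convert(String t) {
        return t.replace("1", "one").replace("2", "two");
    }

    // Runs the tokens through the simulated filter.
    // readWholeBuffer=true mimics new String(buffer); false mimics toString().
    static List<String> run(String[] tokens, boolean readWholeBuffer) {
        StaleBufferDemo attr = new StaleBufferDemo();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            attr.setTerm(token); // upstream tokenizer writes the next token
            String term = readWholeBuffer
                    ? new String(attr.buffer)
                    : new String(attr.buffer, 0, attr.length);
            attr.setTerm(convert(term));
            // Drop NUL padding so the result is printable.
            out.add(new String(attr.buffer, 0, attr.length).replace("\0", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"term1", "part", "2", "assignments"};
        System.out.println(run(tokens, true));  // [termone, partone, twoartone, assignments]
        System.out.println(run(tokens, false)); // [termone, part, two, assignments]
    }
}
```

The first line reproduces the output reported in the original message: "part" overwrites only the first four characters of "termone", leaving "one" behind, and "2" overwrites only the first character of "partone". If direct buffer access is preferred, pairing buffer() with length(), as in new String(buffer, 0, length), should avoid the problem as well.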