Posted to solr-user@lucene.apache.org by Smitha Rajiv <sm...@gmail.com> on 2016/01/15 10:58:52 UTC

Issue in custom filter

Hi

I have a requirement that, while indexing, any digits in a token are
converted into the corresponding words.

e.g.: term1 part 2 assignments -> termone part two assignments

I have created a custom filter with the following code:

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken())
            return false;
        char[] buffer = charTermAttr.buffer();
        String newTerm = new String(buffer);
        convertedTerm = Converter.convert(newTerm);
        charTermAttr.setEmpty();
        charTermAttr.copyBuffer(convertedTerm.toCharArray(), 0,
                convertedTerm.length());
        return true;
    }

But it gives weird results when I analyze.

After applying the custom filter, I get the result
termone partone twoartone assignments.

It looks like the buffer length that I set for the first token is
not being reset when the next token is picked up. I have a feeling that
I am messing up the offsets somewhere.

Could you please help me with this?

Thanks & Regards,
Smitha
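
The symptom described above is consistent with reading the whole backing array of the term attribute rather than only its valid characters: in Lucene, CharTermAttribute.buffer() returns the internal array, and length() says how many of its characters belong to the current term. The sketch below reproduces the effect in plain Java without Lucene. StaleBufferDemo, its fixed 16-char buffer, and its method names are hypothetical stand-ins for illustration, not Lucene classes.

```java
// Plain-Java stand-in for a reusable term buffer; NOT Lucene's actual class.
class StaleBufferDemo {
    // Hypothetical shared buffer, reused for every token.
    private final char[] buffer = new char[16];
    private int length = 0;

    // Overwrites the start of the buffer and records how many chars are valid.
    void setTerm(String term) {
        term.getChars(0, term.length(), buffer, 0);
        length = term.length();
    }

    // Buggy read: takes the whole array, including chars left by earlier tokens.
    String readWholeBuffer() {
        return new String(buffer);
    }

    // Correct read: takes only the first `length` chars.
    String readValidChars() {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        StaleBufferDemo demo = new StaleBufferDemo();
        demo.setTerm("termone"); // first token, already converted
        demo.setTerm("part");    // second token overwrites only 4 chars
        System.out.println(demo.readWholeBuffer().trim()); // prints "partone"
        System.out.println(demo.readValidChars());         // prints "part"
    }
}
```

The leftover "one" from the first token survives in the array, which matches the "partone" and "twoartone" output reported in the question.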

Re: Issue in custom filter

Posted by Smitha Rajiv <sm...@gmail.com>.
Thanks Ahmet. It worked.

As per your suggestion, I have changed the code as below.

    final String term = charTermAttr.toString();
    final String convertedTerm = Converter.convert(term);
    charTermAttr.setEmpty().append(convertedTerm);
    return true;

Now the input stream "term1 part 2 assessment" gives the result
"termone part two assessment".

Thanks for your support.

Regards,
Smitha
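
The thread never shows the Converter class itself, so the following is only a guess at what it might do, written to match the input and output quoted above. DigitWordConverter and its digit-by-digit rule are assumptions for illustration; a real converter may treat multi-digit numbers differently (this one would turn "12" into "onetwo").

```java
// Hypothetical stand-in for the Converter class mentioned in the thread.
class DigitWordConverter {
    private static final String[] WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Replaces each ASCII digit with its English word; other chars are kept as-is.
    static String convert(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append(WORDS[c - '0']);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "term1 part 2 assessment".split(" ");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = convert(tokens[i]);
        }
        System.out.println(String.join(" ", tokens)); // termone part two assessment
    }
}
```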



Re: Issue in custom filter

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Smitha,

Please try the below:

  final String term = charTermAttr.toString();
  final String s = convertedTerm = Converter.convert(term);

  // If not changed, don't waste the time adjusting the token.
  if ((s != null) && !s.equals(term))
      charTermAttr.setEmpty().append(s);



Ahmet
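
The "only rewrite when changed" guard suggested above can be tried outside Lucene with a StringBuilder playing the role of the term attribute (CharTermAttribute is likewise a CharSequence and an Appendable). This is a rough sketch under that assumption: GuardedRewriteDemo, rewriteIfChanged, and the toy convert method are hypothetical names, and setLength(0) followed by append stands in for setEmpty().append(s).

```java
// Plain-Java sketch of the guarded rewrite; StringBuilder stands in for the term attribute.
class GuardedRewriteDemo {
    // Toy converter for illustration only: rewrites just the digits 1 and 2.
    static String convert(String term) {
        return term.replace("1", "one").replace("2", "two");
    }

    // Rewrites `term` in place only when conversion changes it.
    // Returns true if the term was rewritten, false if it was left alone.
    static boolean rewriteIfChanged(StringBuilder term) {
        String original = term.toString();   // reads only the valid chars
        String converted = convert(original);
        if (converted != null && !converted.equals(original)) {
            term.setLength(0);               // plays the role of setEmpty()
            term.append(converted);          // plays the role of append(s)
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder changed = new StringBuilder("term1");
        StringBuilder unchanged = new StringBuilder("part");
        System.out.println(rewriteIfChanged(changed) + " " + changed);     // true termone
        System.out.println(rewriteIfChanged(unchanged) + " " + unchanged); // false part
    }
}
```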
