Posted to solr-user@lucene.apache.org by abhishek jain <ab...@gmail.com> on 2014/03/09 17:06:33 UTC

Which Tokenizer to use at searching

Hi Friends,

I have a question about tokenizers; my scenario is:

During indexing I want to tokenize on all punctuation, so I can use
StandardTokenizer, but at search time I want to treat punctuation as part of
the text.

I don't store the contents, only the index.

What should I use?

Any advice?


-- 
Thanks and kind Regards,
Abhishek jain

Re: Which Tokenizer to use at searching

Posted by ab...@gmail.com.
Hi,
I meant that when searching, A and B should each return the result individually, and also when combined with AND.

But the phrase "A B" should not return a result, even though A,B is indexed with StandardTokenizer.

Thanks 
Abhishek
  Original Message  
From: Furkan KAMACI
Sent: Monday, 10 March 2014 06:11
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Which Tokenizer to use at searching

Hi;

What do you mean at here:

"While indexing A,B
A and B should give result "

Thanks;
Furkan KAMACI


2014-03-09 22:36 GMT+02:00 <ab...@gmail.com>:

> Hi
> Oops my bad. I actually meant
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also I will look at analyser.
>
> Thanks
> Abhishek
>
> Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
> <ab...@gmail.com> wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without space with comma) as separate words
> and
> > also want to return results when A and B searched individually and also
> > "A,B" .
> >
> > Please let me know your views.
> > Let me know if i still havent explained correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <erickerickson@gmail.com
> >
> > wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing i want to token on all punctuations, so i can use
> >> StandardTokenizer, but at search time i want to consider punctuations as
> >> part of text,
> >>
> >> and in your second message:
> >>
> >> bq: when i search for "A,B" it should return result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time i want to consider punctuations as
> >> part of text" then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend, I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >> <ab...@gmail.com> wrote:
> >> > hi,
> >> >
> >> > Thanks for replying promptly,
> >> > an example:
> >> >
> >> > I want to index for A,B
> >> > but when i search A AND B, it should return result,
> >> > when i search for "A,B" it should return result.
> >> >
> >> > Also Ideally when i search for "A , B" (with space) it should return
> >> > result.
> >> >
> >> >
> >> > please advice
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
> >> > <fu...@gmail.com>wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly you have to keep in mind that if you don't index punctuation
> >> >> they
> >> >> will not be visible for search. On the other hand you can have
> >> >> different
> >> >> analyzer for index and search. You have to give more detail about
> your
> >> >> situation. What will be your tokenizer at search time,
> >> >> WhiteSpaceTokenizer?
> >> >> You can have a look at here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples what you want for indexing and
> searching
> >> >> I
> >> >> can help you to combine index and search analyzer/tokenizer/token
> >> >> filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <abhishek.netjain@gmail.com
> >:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned on Tokenizer, my scenario is:
> >> >> >
> >> >> > During indexing i want to token on all punctuations, so i can use
> >> >> > StandardTokenizer, but at search time i want to consider
> punctuations
> >> >> > as
> >> >> > part of text,
> >> >> >
> >> >> > I dont store contents but only indexes.
> >> >> >
> >> >> > What should i use.
> >> >> >
> >> >> > Any advices ?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>

Re: Which Tokenizer to use at searching

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi;

What do you mean here:

"While indexing A,B
A and B should give result "

Thanks;
Furkan KAMACI


2014-03-09 22:36 GMT+02:00 <ab...@gmail.com>:

> Hi
> Oops my bad. I actually meant
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also I will look at analyser.
>
> Thanks
> Abhishek
>
>   Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
> <ab...@gmail.com> wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without space with comma) as separate words
> and
> > also want to return results when A and B searched individually and also
> > "A,B" .
> >
> > Please let me know your views.
> > Let me know if i still havent explained correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <erickerickson@gmail.com
> >
> > wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing i want to token on all punctuations, so i can use
> >> StandardTokenizer, but at search time i want to consider punctuations as
> >> part of text,
> >>
> >> and in your second message:
> >>
> >> bq: when i search for "A,B" it should return result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time i want to consider punctuations as
> >> part of text" then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend, I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >> <ab...@gmail.com> wrote:
> >> > hi,
> >> >
> >> > Thanks for replying promptly,
> >> > an example:
> >> >
> >> > I want to index for A,B
> >> > but when i search A AND B, it should return result,
> >> > when i search for "A,B" it should return result.
> >> >
> >> > Also Ideally when i search for "A , B" (with space) it should return
> >> > result.
> >> >
> >> >
> >> > please advice
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
> >> > <fu...@gmail.com>wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly you have to keep in mind that if you don't index punctuation
> >> >> they
> >> >> will not be visible for search. On the other hand you can have
> >> >> different
> >> >> analyzer for index and search. You have to give more detail about
> your
> >> >> situation. What will be your tokenizer at search time,
> >> >> WhiteSpaceTokenizer?
> >> >> You can have a look at here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples what you want for indexing and
> searching
> >> >> I
> >> >> can help you to combine index and search analyzer/tokenizer/token
> >> >> filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <abhishek.netjain@gmail.com
> >:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned on Tokenizer, my scenario is:
> >> >> >
> >> >> > During indexing i want to token on all punctuations, so i can use
> >> >> > StandardTokenizer, but at search time i want to consider
> punctuations
> >> >> > as
> >> >> > part of text,
> >> >> >
> >> >> > I dont store contents but only indexes.
> >> >> >
> >> >> > What should i use.
> >> >> >
> >> >> > Any advices ?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>

Re: Which Tokenizer to use at searching

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/10/2014 6:20 AM, abhishek jain wrote:
> <tokenizer class="solr.PatternTokenizerFactory" pattern="\s+" />
> <filter class="solr.PatternReplaceFilterFactory" pattern="([^-\w]+)"
> replacement=" punct " replace="all"/>

<snip>

> Is there a way i can tokenize after application of filter, please suggest i
> know i am missing something basic.

Use PatternReplaceCharFilterFactory instead.  CharFilters are performed
before tokenizers, regardless of where they are defined in the analysis
chain.
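
As a sketch only (untested here), the pattern from your config could be moved
into a charFilter, adjusted to leave whitespace alone so the tokenizer still
has spaces to split on:

<!-- charFilters always run before the tokenizer, wherever they appear -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="([^-\w\s]+)" replacement=" punct "/>
<tokenizer class="solr.PatternTokenizerFactory" pattern="\s+" />

With that order, "A,B" becomes "A punct B" before tokenization, and the
tokenizer then emits A, punct, and B as separate tokens.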

Thanks,
Shawn


Re: Which Tokenizer to use at searching

Posted by abhishek jain <ab...@gmail.com>.
Hi,
As a solution, I have tried a combination of PatternTokenizerFactory and
PatternReplaceFilterFactory.

In both the query and index analyzers I have written:

<tokenizer class="solr.PatternTokenizerFactory" pattern="\s+" />
<filter class="solr.PatternReplaceFilterFactory" pattern="([^-\w]+)"
replacement=" punct " replace="all"/>

What I am trying to do is tokenize on whitespace and then rewrite every
special character as " punct ".

So, A,B becomes A punct B.

But the problem is that A punct B remains a single token; it is not tokenized
again after the filter is applied.

Is there a way I can tokenize after the filter has been applied? Please
suggest; I know I am missing something basic.

thanks
abhishek


On Mon, Mar 10, 2014 at 2:06 AM, <ab...@gmail.com> wrote:

> Hi
> Oops my bad. I actually meant
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also I will look at analyser.
>
> Thanks
> Abhishek
>
>   Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
> <ab...@gmail.com> wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without space with comma) as separate words
> and
> > also want to return results when A and B searched individually and also
> > "A,B" .
> >
> > Please let me know your views.
> > Let me know if i still havent explained correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <erickerickson@gmail.com
> >
> > wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing i want to token on all punctuations, so i can use
> >> StandardTokenizer, but at search time i want to consider punctuations as
> >> part of text,
> >>
> >> and in your second message:
> >>
> >> bq: when i search for "A,B" it should return result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time i want to consider punctuations as
> >> part of text" then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend, I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >> <ab...@gmail.com> wrote:
> >> > hi,
> >> >
> >> > Thanks for replying promptly,
> >> > an example:
> >> >
> >> > I want to index for A,B
> >> > but when i search A AND B, it should return result,
> >> > when i search for "A,B" it should return result.
> >> >
> >> > Also Ideally when i search for "A , B" (with space) it should return
> >> > result.
> >> >
> >> >
> >> > please advice
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
> >> > <fu...@gmail.com>wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly you have to keep in mind that if you don't index punctuation
> >> >> they
> >> >> will not be visible for search. On the other hand you can have
> >> >> different
> >> >> analyzer for index and search. You have to give more detail about
> your
> >> >> situation. What will be your tokenizer at search time,
> >> >> WhiteSpaceTokenizer?
> >> >> You can have a look at here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples what you want for indexing and
> searching
> >> >> I
> >> >> can help you to combine index and search analyzer/tokenizer/token
> >> >> filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <abhishek.netjain@gmail.com
> >:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned on Tokenizer, my scenario is:
> >> >> >
> >> >> > During indexing i want to token on all punctuations, so i can use
> >> >> > StandardTokenizer, but at search time i want to consider
> punctuations
> >> >> > as
> >> >> > part of text,
> >> >> >
> >> >> > I dont store contents but only indexes.
> >> >> >
> >> >> > What should i use.
> >> >> >
> >> >> > Any advices ?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767

Re: Which Tokenizer to use at searching

Posted by ab...@gmail.com.
Hi
Oops, my bad. I actually meant:
While indexing A,B,
A and B should give a result, but
"A B" should not give a result.

Also I will look at analyser.

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:38
To: abhishek jain
Subject: Re: Which Tokenizer to use at searching

Then I don't see the problem. StandardTokenizer
(see the "text_general" fieldType) should do all this
for you automatically.

Did you look at the analysis page? I really recommend it.

Best,
Erick

On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
<ab...@gmail.com> wrote:
> Hi Erick,
> Thanks for replying,
>
> I want to index A,B (with or without space with comma) as separate words and
> also want to return results when A and B searched individually and also
> "A,B" .
>
> Please let me know your views.
> Let me know if i still havent explained correctly. I will try again.
>
> Thanks
> abhishek
>
>
> On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <er...@gmail.com>
> wrote:
>>
>> You've contradicted yourself, so it's hard to say. Or
>> I'm mis-reading your messages.
>>
>> bq: During indexing i want to token on all punctuations, so i can use
>> StandardTokenizer, but at search time i want to consider punctuations as
>> part of text,
>>
>> and in your second message:
>>
>> bq: when i search for "A,B" it should return result. [for input "A,B"]
>>
>> If, indeed, you "... at search time i want to consider punctuations as
>> part of text" then "A,B" should NOT match the document.
>>
>> The admin/analysis page is your friend, I strongly suggest you spend
>> some time looking at the various transformations performed by
>> the various analyzers and tokenizers.
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
>> <ab...@gmail.com> wrote:
>> > hi,
>> >
>> > Thanks for replying promptly,
>> > an example:
>> >
>> > I want to index for A,B
>> > but when i search A AND B, it should return result,
>> > when i search for "A,B" it should return result.
>> >
>> > Also Ideally when i search for "A , B" (with space) it should return
>> > result.
>> >
>> >
>> > please advice
>> > thanks
>> > abhishek
>> >
>> >
>> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
>> > <fu...@gmail.com>wrote:
>> >
>> >> Hi;
>> >>
>> >> Firstly you have to keep in mind that if you don't index punctuation
>> >> they
>> >> will not be visible for search. On the other hand you can have
>> >> different
>> >> analyzer for index and search. You have to give more detail about your
>> >> situation. What will be your tokenizer at search time,
>> >> WhiteSpaceTokenizer?
>> >> You can have a look at here:
>> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> >>
>> >> If you can give some examples what you want for indexing and searching
>> >> I
>> >> can help you to combine index and search analyzer/tokenizer/token
>> >> filters.
>> >>
>> >> Thanks;
>> >> Furkan KAMACI
>> >>
>> >>
>> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <ab...@gmail.com>:
>> >>
>> >> > Hi Friends,
>> >> >
>> >> > I am concerned on Tokenizer, my scenario is:
>> >> >
>> >> > During indexing i want to token on all punctuations, so i can use
>> >> > StandardTokenizer, but at search time i want to consider punctuations
>> >> > as
>> >> > part of text,
>> >> >
>> >> > I dont store contents but only indexes.
>> >> >
>> >> > What should i use.
>> >> >
>> >> > Any advices ?
>> >> >
>> >> >
>> >> > --
>> >> > Thanks and kind Regards,
>> >> > Abhishek jain
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks and kind Regards,
>> > Abhishek jain
>> > +91 9971376767
>
>
>
>
> --
> Thanks and kind Regards,
> Abhishek jain
> +91 9971376767

Re: Which Tokenizer to use at searching

Posted by abhishek jain <ab...@gmail.com>.
Hi Erick,
Thanks for replying.

I want to index A,B (with or without a space around the comma) as separate
words, and I also want results to be returned when A and B are searched
individually, as well as for "A,B".

Please let me know your views.
Let me know if I still haven't explained it correctly; I will try again.

Thanks
abhishek


On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson <er...@gmail.com> wrote:

> You've contradicted yourself, so it's hard to say. Or
> I'm  mis-reading your messages.
>
> bq: During indexing i want to token on all punctuations, so i can use
> StandardTokenizer, but at search time i want to consider punctuations as
> part of text,
>
> and in your second message:
>
> bq: when i search for "A,B" it should return result. [for input "A,B"]
>
> If, indeed, you "... at search time i want to consider punctuations as
> part of text" then "A,B" should NOT match the document.
>
> The admin/analysis page is your friend, I strongly suggest you spend
> some time looking at the various transformations performed by
> the various analyzers and tokenizers.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> <ab...@gmail.com> wrote:
> > hi,
> >
> > Thanks for replying promptly,
> > an example:
> >
> > I want to index for     A,B
> > but when i search A AND B, it should return result,
> > when i search for "A,B" it should return result.
> >
> > Also Ideally when i search for "A , B" (with space) it should return
> result.
> >
> >
> > please advice
> > thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI <furkankamaci@gmail.com
> >wrote:
> >
> >> Hi;
> >>
> >> Firstly you have to keep in mind that if you don't index punctuation
> they
> >> will not be visible for search. On the other hand you can have different
> >> analyzer for index and search. You have to give more detail about your
> >> situation. What will be your tokenizer at search time,
> WhiteSpaceTokenizer?
> >> You can have a look at here:
> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >>
> >> If you can give some examples what you want for indexing and searching I
> >> can help you to combine index and search analyzer/tokenizer/token
> filters.
> >>
> >> Thanks;
> >> Furkan KAMACI
> >>
> >>
> >> 2014-03-09 18:06 GMT+02:00 abhishek jain <ab...@gmail.com>:
> >>
> >> > Hi Friends,
> >> >
> >> > I am concerned on Tokenizer, my scenario is:
> >> >
> >> > During indexing i want to token on all punctuations, so i can use
> >> > StandardTokenizer, but at search time i want to consider punctuations
> as
> >> > part of text,
> >> >
> >> > I dont store contents but only indexes.
> >> >
> >> > What should i use.
> >> >
> >> > Any advices ?
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767

Re: Which Tokenizer to use at searching

Posted by Erick Erickson <er...@gmail.com>.
You've contradicted yourself, so it's hard to say. Or
I'm misreading your messages.

bq: During indexing i want to token on all punctuations, so i can use
StandardTokenizer, but at search time i want to consider punctuations as
part of text,

and in your second message:

bq: when i search for "A,B" it should return result. [for input "A,B"]

If, indeed, you "... at search time i want to consider punctuations as
part of text" then "A,B" should NOT match the document.

The admin/analysis page is your friend; I strongly suggest you spend
some time looking at the various transformations performed by
the various analyzers and tokenizers.

Best,
Erick

On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
<ab...@gmail.com> wrote:
> hi,
>
> Thanks for replying promptly,
> an example:
>
> I want to index for     A,B
> but when i search A AND B, it should return result,
> when i search for "A,B" it should return result.
>
> Also Ideally when i search for "A , B" (with space) it should return result.
>
>
> please advice
> thanks
> abhishek
>
>
On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI <fu...@gmail.com> wrote:
>
>> Hi;
>>
>> Firstly you have to keep in mind that if you don't index punctuation they
>> will not be visible for search. On the other hand you can have different
>> analyzer for index and search. You have to give more detail about your
>> situation. What will be your tokenizer at search time, WhiteSpaceTokenizer?
>> You can have a look at here:
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>>
>> If you can give some examples what you want for indexing and searching I
>> can help you to combine index and search analyzer/tokenizer/token filters.
>>
>> Thanks;
>> Furkan KAMACI
>>
>>
>> 2014-03-09 18:06 GMT+02:00 abhishek jain <ab...@gmail.com>:
>>
>> > Hi Friends,
>> >
>> > I am concerned on Tokenizer, my scenario is:
>> >
>> > During indexing i want to token on all punctuations, so i can use
>> > StandardTokenizer, but at search time i want to consider punctuations as
>> > part of text,
>> >
>> > I dont store contents but only indexes.
>> >
>> > What should i use.
>> >
>> > Any advices ?
>> >
>> >
>> > --
>> > Thanks and kind Regards,
>> > Abhishek jain
>> >
>>
>
>
>
> --
> Thanks and kind Regards,
> Abhishek jain
> +91 9971376767

Re: Which Tokenizer to use at searching

Posted by abhishek jain <ab...@gmail.com>.
Hi,

Thanks for replying promptly. An example:

I want to index A,B.
When I search for A AND B, it should return a result,
and when I search for "A,B" it should also return a result.

Ideally, when I search for "A , B" (with spaces) it should also return a result.

Please advise.
thanks
abhishek


On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI <fu...@gmail.com> wrote:

> Hi;
>
> Firstly you have to keep in mind that if you don't index punctuation they
> will not be visible for search. On the other hand you can have different
> analyzer for index and search. You have to give more detail about your
> situation. What will be your tokenizer at search time, WhiteSpaceTokenizer?
> You can have a look at here:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> If you can give some examples what you want for indexing and searching I
> can help you to combine index and search analyzer/tokenizer/token filters.
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-09 18:06 GMT+02:00 abhishek jain <ab...@gmail.com>:
>
> > Hi Friends,
> >
> > I am concerned on Tokenizer, my scenario is:
> >
> > During indexing i want to token on all punctuations, so i can use
> > StandardTokenizer, but at search time i want to consider punctuations as
> > part of text,
> >
> > I dont store contents but only indexes.
> >
> > What should i use.
> >
> > Any advices ?
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> >
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767

Re: Which Tokenizer to use at searching

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi;

First, keep in mind that if you don't index punctuation, it will not be
visible to search. On the other hand, you can have different analyzers for
index and search. You need to give more detail about your situation: what
will your tokenizer be at search time, WhitespaceTokenizer?
You can have a look here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

If you can give some examples of what you want for indexing and searching, I
can help you combine index and search analyzers/tokenizers/token filters.
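
For illustration only, a fieldType with separate index-time and query-time
analyzers might look like the sketch below (the name "text_example" and the
lowercase filter are placeholders for this sketch, not something from your
schema):

<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- StandardTokenizer splits on punctuation, so "A,B" is indexed as A and B -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- WhitespaceTokenizer splits only on whitespace, so "A,B" stays one query token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that with exactly this combination, a query for A,B produces the single
token a,b, which will not match the indexed tokens a and b; that is the
mismatch Erick describes elsewhere in this thread.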

Thanks;
Furkan KAMACI


2014-03-09 18:06 GMT+02:00 abhishek jain <ab...@gmail.com>:

> Hi Friends,
>
> I am concerned on Tokenizer, my scenario is:
>
> During indexing i want to token on all punctuations, so i can use
> StandardTokenizer, but at search time i want to consider punctuations as
> part of text,
>
> I dont store contents but only indexes.
>
> What should i use.
>
> Any advices ?
>
>
> --
> Thanks and kind Regards,
> Abhishek jain
>