Posted to solr-user@lucene.apache.org by leostro <le...@gmail.com> on 2014/12/21 19:31:54 UTC

set keepword file to be used based on a field value

Hi all,

I ran some tests and I'm now able to use keepwords to search for some common
brand names in the docs I have in my index.
I have docs with only two fields:
- a title
- a categoryId
The tests I have run so far were based on videogame-related rows, so I have a
keepwords.txt containing words like "nintendo", "playstation" and so on.

Now I want to instruct Solr to use a different keepword file depending on the
categoryId value specified.

So, for docs with categoryId=1 (videogames) I'd like to use keepwords1.txt
(the one with nintendo, playstation, etc.), but if categoryId=2 (cars) I'd
like to use keepwords2.txt (another file containing bmw, audi, etc.).

Can someone help me?
Regards

Leo




--
View this message in context: http://lucene.472066.n3.nabble.com/set-keepword-file-to-be-used-based-on-a-field-value-tp4175474.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: set keepword file to be used based on a field value

Posted by Tomoko Uchida <to...@gmail.com>.
Hi Leo,

Yes, what I have in mind is similar to what you describe:
> If the value ends with "_CAT1" ==> use
> as keepword file "keepwords1.txt" and so on?

But my second option is not about configuration; it is about customizing Solr.

Lucene/Solr is designed to be extended, so you can write your own
TokenFilter class.
Subclassing org.apache.lucene.analysis.util.FilteringTokenFilter
may be enough to satisfy your requirement.

The custom filter class would take multiple keepword files, build one word
set per file (KeepWordFilter holds only a single word set), and switch
between the word sets based on a prefix on the field value (or some other
information). This is only a rough idea; there may well be a more
sophisticated way.
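
The switching logic described above can be sketched in plain Java. This is
only an illustration under stated assumptions, not a Lucene TokenFilter: the
class name, the "CAT1"/"CAT2" markers, and the convention that the marker is
the first whitespace-separated token of the field value are all hypothetical.
In a real filter, the keep/drop decision would presumably live in the
accept() method of a FilteringTokenFilter subclass, and the word sets would be
loaded from the keepword files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only: no Lucene classes are used here.
// The class name and the "CAT1"/"CAT2" markers are hypothetical.
final class PrefixSwitchedKeepWords {
    // One keep-word set per marker, e.g. "CAT1" -> {nintendo, playstation}.
    private final Map<String, Set<String>> wordSets = new HashMap<>();

    // Registers the keep words for one marker. In Solr these words would
    // come from a file such as keepwords1.txt.
    void addWordSet(String marker, String... words) {
        Set<String> set = new HashSet<>();
        for (String word : words) {
            set.add(word.toLowerCase(Locale.ROOT));
        }
        wordSets.put(marker, set);
    }

    // Expects "<marker> <text...>" and returns the lowercased tokens of
    // <text> that appear in the marker's word set. An unknown marker
    // keeps nothing.
    List<String> filter(String fieldValue) {
        String[] tokens = fieldValue.trim().split("\\s+");
        Set<String> active =
            wordSets.getOrDefault(tokens[0], Collections.<String>emptySet());
        List<String> kept = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            String token = tokens[i].toLowerCase(Locale.ROOT);
            if (active.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With "CAT1" registered for the videogame words and "CAT2" for the car words,
filter("CAT1 New Nintendo console") would keep only "nintendo". The same
marker would have to be added at query time too, which is the extra
complexity mentioned earlier in this thread.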

If you are interested (and familiar with Java programming, of course),
you may want to check out the Solr source code from SVN and browse the
KeepWordFilter / KeepWordFilterFactory classes to get an idea of the
implementation.

Thanks,
Tomoko




Re: set keepword file to be used based on a field value

Posted by leostro <le...@gmail.com>.
Hi Tomoko,

I understand your first reply and the first hint (one field for each
categoryId).
I thought this was a relatively "common" scenario.

I'm interested in understanding the option you are talking about in the
second reply.

> You can tell the custom filter "which keepword set (file) should be used" by
> adding a special prefix (or something similar) to the target field value.
> Of course, this makes the indexing/querying process slightly more complicated.

Are you talking about adding a suffix (like _CAT1) to the value of the field
I'm going to analyze with keepwords? If the value ends with "_CAT1" ==> use
as keepword file "keepwords1.txt" and so on?

I don't understand how to achieve this. Have you seen any configuration
examples?
I couldn't find anything :(

Thanks
Leo




--
View this message in context: http://lucene.472066.n3.nabble.com/different-keepword-files-for-differents-field-values-tp4175474p4175528.html

Re: set keepword file to be used based on a field value

Posted by Tomoko Uchida <to...@gmail.com>.
Sorry, this part was confusing:
> have your own custom KeepWordFilter that holds multiple keepword sets and
> switches between them based on a parameter,

You can tell the custom filter "which keepword set (file) should be used" by
adding a special prefix (or something similar) to the target field value.
Of course, this makes the indexing/querying process slightly more complicated.



Re: set keepword file to be used based on a field value

Posted by Tomoko Uchida <to...@gmail.com>.
Hi,

I'm not sure I fully understand your point, but do you want to apply
KeepWordFilter to the "title" field and switch keepword files based on the
"categoryId" field value?

I think that is inherently difficult, because an Analyzer (including its
Filters) cannot take into account the value of any field other than the one
it is analyzing.

If you have relatively few categories, you can handle this by using
multiple KeepWordFilters/Analyzers:
one filter/analyzer for categoryId=1, another filter/analyzer for
categoryId=2, and so on.
You would also need a separate title field for each category
(title_for_category1, title_for_category2, etc.).
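
On the indexing side, the one-field-per-category approach means the client
has to route each title into the field for its category. A minimal sketch,
assuming the field naming above; the helper class is hypothetical and a plain
Map stands in for whatever document object the indexing client uses. Each
title_for_categoryN field is assumed to be declared in the schema with its
own analyzer pointing at the matching keepword file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a plain Map stands in for a Solr input document.
final class CategoryTitleRouter {
    // Name of the title field analyzed with the given category's keepwords.
    static String titleFieldFor(int categoryId) {
        return "title_for_category" + categoryId;
    }

    // Builds the fields to index: the title goes only into the field that
    // belongs to the document's category.
    static Map<String, Object> toFields(String title, int categoryId) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("categoryId", categoryId);
        fields.put(titleFieldFor(categoryId), title);
        return fields;
    }
}
```

At query time the same titleFieldFor(categoryId) lookup would pick the field
to search, so the category has to be known when the query is built.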

If you need a smarter way (this is just an idea),
you might be able to have your own custom KeepWordFilter that holds
multiple keepword sets and switches between them based on a parameter, not
another field value.


Regards,
Tomoko
