Posted to java-user@lucene.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/03/23 07:59:04 UTC

What is the best practice of using synonymy ?

Hi all,

I'd like to use synonyms in my project, and I think there are two
candidate solutions:
1. Apply synonyms at indexing time, enriching the index with synonyms.
2. Apply synonyms at search time, expanding the search query with synonyms.

I'd like to know which one is better. Any help is appreciated.
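A minimal sketch of the two options follows. It is a toy inverted index in plain Java, not Lucene code: the class `SynonymStrategies`, its methods, the tv/television synonym map and the sample documents are all hypothetical. Option 1 posts every token under all of its synonyms while indexing; option 2 keeps a plain index and rewrites each query term into an OR of its synonyms.

```java
import java.util.*;

// Toy model (not Lucene): an inverted index mapping term -> sorted doc ids.
class SynonymStrategies {
    // Hypothetical synonym group: each member maps to the whole group.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "tv", List.of("tv", "television"),
        "television", List.of("tv", "television"));

    // Hypothetical sample documents (doc id = list index).
    static final List<String> SAMPLE_DOCS = List.of(
        "I bought a new TV", "The television is broken", "Radio news");

    static List<String> expand(String term) {
        return SYNONYMS.getOrDefault(term, List.of(term));
    }

    // Option 1: index-time expansion. Each token is posted under every synonym.
    static Map<String, Set<Integer>> indexWithSynonyms(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\s+")) {
                for (String syn : expand(token)) {
                    index.computeIfAbsent(syn, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Option 2 starts from a plain index: each token is posted only as itself.
    static Map<String, Set<Integer>> indexPlain(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Search against an index-time-expanded index: a single term lookup.
    static Set<Integer> searchIndexTime(Map<String, Set<Integer>> index, String term) {
        return new TreeSet<>(index.getOrDefault(term, Set.of()));
    }

    // Query-time expansion: OR together the postings of every synonym.
    static Set<Integer> searchQueryTime(Map<String, Set<Integer>> index, String term) {
        Set<Integer> hits = new TreeSet<>();
        for (String syn : expand(term)) {
            hits.addAll(index.getOrDefault(syn, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> expanded = indexWithSynonyms(SAMPLE_DOCS);
        Map<String, Set<Integer>> plain = indexPlain(SAMPLE_DOCS);
        System.out.println("index time, 'television': " + searchIndexTime(expanded, "television"));
        System.out.println("query time, 'tv': " + searchQueryTime(plain, "tv"));
    }
}
```

In this toy, both strategies find the same documents for single-word synonyms, so the choice comes down to index size, query cost and scoring rather than which documents match.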



-- 
Best Regards

Jeff Zhang

Re: Optimising the lucene search

Posted by Anshum <an...@gmail.com>.
Hi,

I don't quite see the point here. Are you sure you will never need to
search the fields separately? Concatenating the fields would mean a lot of
information loss, and you would not be able to search the individual fields
with a query like (Field1:X AND Field2:Y). If that's the case, you could
combine the fields at run time.
As far as the relational aspect is concerned, I'd say Lucene's model is quite
different from what you're taking it to be. Lucene documents are just
collections of field/value pairs.
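The information loss Anshum describes can be illustrated with a small toy in plain Java. It is not Lucene code; the class name, field names and documents are hypothetical, and concatenation is assumed to produce one tokenized combined value. With separate fields, (field1:X AND field2:Y) matches only the document with X in field1 and Y in field2. Once the values are merged, the best available query is "both words somewhere", which also matches the document where the values are swapped.

```java
import java.util.*;

// Toy model (not Lucene): what is lost when two fields are concatenated.
class FieldConcatenation {
    // Hypothetical documents, each a map of field name -> value.
    static final List<Map<String, String>> DOCS = List.of(
        Map.of("field1", "x", "field2", "y"),   // doc 0: the intended match
        Map.of("field1", "y", "field2", "x"));  // doc 1: same words, swapped fields

    // Separate fields: (field1:X AND field2:Y) checks each field on its own.
    static List<Integer> searchSeparate(String x, String y) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < DOCS.size(); id++) {
            Map<String, String> doc = DOCS.get(id);
            if (doc.get("field1").equals(x) && doc.get("field2").equals(y)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Concatenated field: only the combined tokens survive, so the query
    // degrades to (combined:X AND combined:Y) and field identity is gone.
    static List<Integer> searchConcatenated(String x, String y) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < DOCS.size(); id++) {
            Map<String, String> doc = DOCS.get(id);
            Set<String> combined = new HashSet<>(Arrays.asList(
                (doc.get("field1") + " " + doc.get("field2")).split(" ")));
            if (combined.contains(x) && combined.contains(y)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```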

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Tue, Mar 23, 2010 at 12:31 PM, <su...@zapak.co.in> wrote:

>
>
> Hello,
>
> Optimising the Lucene search
>
> Use a combined search field for all text fields instead of (or on top of)
> indexing them separately and searching with a complex query like
> field1:query OR field2:query ... OR fieldN:query
>
> Reducing the number of fields makes indexing and search much faster. Use
> the combined field instead of, or on top of, separate fields if needed.
>
>
> Does that mean that, when defining the structure of a Lucene document,
> the fields (key-value pairs) should take the form
>
> field1_field2:val1_val2
> (combining the fields at indexing time)
>
> instead of
> field1:val1 and field2:val2
>
> to make the search faster?
>
>
> Please advise for both cases:
> CASE 1: field1 and field2 are not related to each other in any way.
> CASE 2: field1 and field2 have a 1:n relation,
> e.g. student : friends (where student ABC can have 5 different friends).
>
>
> thanks,
> Suman
>
> Ps:
> http://it-stream.blogspot.com/2007/12/full-text-search-for-database-using.html
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Optimising the lucene search

Posted by su...@zapak.co.in.

Hello,

Optimising the Lucene search

Use a combined search field for all text fields instead of (or on top of)
indexing them separately and searching with a complex query like
field1:query OR field2:query ... OR fieldN:query

Reducing the number of fields makes indexing and search much faster. Use the
combined field instead of, or on top of, separate fields if needed.
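The tip quoted above (a catch-all field indexed on top of the separate fields) can be sketched as a toy in plain Java. This is not Lucene code: `CatchAllField`, the field name `all` and the sample documents are hypothetical, and the lookup counter is only a rough proxy for query cost. Each value is posted under its own field and also under `all`, so a query can hit one field instead of OR-ing N clauses, while field-specific queries keep working.

```java
import java.util.*;

// Toy model (not Lucene): every value is indexed under its own field AND
// copied into a hypothetical catch-all field called "all".
class CatchAllField {
    // field name -> term -> sorted doc ids
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
    // Number of term lookups performed, a rough stand-in for query cost.
    int lookups = 0;

    void add(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\s+")) {
                post(e.getKey(), token, docId); // separate field stays searchable
                post("all", token, docId);      // combined field "on top"
            }
        }
    }

    private void post(String field, String term, int docId) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(term, t -> new TreeSet<>())
             .add(docId);
    }

    private Set<Integer> lookup(String field, String term) {
        lookups++;
        return index.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    // field1:term OR field2:term ... OR fieldN:term: one lookup per field.
    Set<Integer> searchEachField(List<String> fields, String term) {
        Set<Integer> hits = new TreeSet<>();
        for (String field : fields) {
            hits.addAll(lookup(field, term));
        }
        return hits;
    }

    // all:term: a single lookup, whatever the number of source fields.
    Set<Integer> searchAll(String term) {
        return new TreeSet<>(lookup("all", term));
    }

    // Hypothetical sample documents with two text fields each.
    static CatchAllField sample() {
        CatchAllField idx = new CatchAllField();
        idx.add(0, Map.of("title", "Lucene in Action", "body", "search library"));
        idx.add(1, Map.of("title", "Databases", "body", "full text search with Lucene"));
        idx.add(2, Map.of("title", "Cooking", "body", "pasta recipes"));
        return idx;
    }
}
```

Here `searchEachField` over N fields adds N to `lookups`, while `searchAll` adds 1; the price is that every token is stored twice.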


Does that mean that, when defining the structure of a Lucene document, the
fields (key-value pairs) should take the form

field1_field2:val1_val2
(combining the fields at indexing time)

instead of
field1:val1 and field2:val2

to make the search faster?


Please advise for both cases:
CASE 1: field1 and field2 are not related to each other in any way.
CASE 2: field1 and field2 have a 1:n relation,
e.g. student : friends (where student ABC can have 5 different friends).
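For CASE 2, one option is a single document per student with a multi-valued `friend` field, instead of a concatenated key. A Lucene document may contain several fields with the same name; the plain-Java toy below only imitates that with a list of values per field. It is a sketch, not Lucene's API, and `MultiValuedField`, the field names and the data are hypothetical.

```java
import java.util.*;

// Toy model (not Lucene): a document is field name -> list of values, so one
// field name can hold several values (here, one student with many friends).
class MultiValuedField {
    // Hypothetical documents: one per student.
    static final List<Map<String, List<String>>> DOCS = List.of(
        Map.of("student", List.of("ABC"),
               "friend", List.of("DEF", "GHI", "JKL", "MNO", "PQR")),
        Map.of("student", List.of("XYZ"),
               "friend", List.of("DEF")));

    // Returns the students whose document has `value` among the values of `field`.
    static List<String> studentsWhere(String field, String value) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> doc : DOCS) {
            if (doc.getOrDefault(field, List.of()).contains(value)) {
                result.add(doc.get("student").get(0));
            }
        }
        return result;
    }
}
```

Because `student` and `friend` stay distinct fields, queries on either one, or on both together, remain possible.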


thanks,
Suman

PS: http://it-stream.blogspot.com/2007/12/full-text-search-for-database-using.html






Re: What is the best practice of using synonymy ?

Posted by Anshum <an...@gmail.com>.
Index time is a much better approach. The only downside is the increase in
index size. I've used it on a fairly large dataset, and even the indexing time
doesn't seem to go up considerably.
Searching on multiple terms is generally less efficient than searching on one,
when a single term would do.
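The remark about multiple terms can be illustrated with a toy cost model in plain Java. It only counts postings lists and entries read; it does not model Lucene's actual scoring or skipping, and the class name and postings data are hypothetical.

```java
import java.util.*;

// Toy cost model (not Lucene's scorer): counts the postings entries a
// synonym query has to read under each strategy.
class ExpansionCost {
    // Hypothetical postings lists (sorted doc ids) in an index WITHOUT synonyms.
    static final Map<String, int[]> PLAIN = Map.of(
        "car", new int[] {1, 4, 7, 9},
        "automobile", new int[] {2, 4, 8},
        "auto", new int[] {4, 9});

    // Query-time expansion: (car OR automobile OR auto) reads one list per clause.
    static int entriesReadQueryTime(List<String> synonyms) {
        int read = 0;
        for (String term : synonyms) {
            read += PLAIN.getOrDefault(term, new int[0]).length;
        }
        return read;
    }

    // Index-time expansion: each term's list already holds every document of
    // the group. The merged list is rebuilt here only to measure its length;
    // in a real index it would be written once, at indexing time.
    static int entriesReadIndexTime(List<String> synonyms) {
        Set<Integer> merged = new TreeSet<>();
        for (String term : synonyms) {
            for (int doc : PLAIN.getOrDefault(term, new int[0])) {
                merged.add(doc);
            }
        }
        return merged.size();
    }
}
```

For the group car/automobile/auto, query-time expansion reads 9 entries from 3 lists, while index-time expansion reads 6 entries from 1 list. The cost moves into the index: each of the 3 terms now carries the 6-entry merged list (18 entries instead of 9), which is the index-size increase mentioned above.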

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com



On Tue, Mar 23, 2010 at 4:03 PM, Ahmet Arslan <io...@yahoo.com> wrote:

>
>
> > I'd like to use synonyms in my project, and I think there are two
> > candidate solutions:
> > 1. Apply synonyms at indexing time, enriching the index with synonyms.
> > 2. Apply synonyms at search time, expanding the search query with
> > synonyms.
> >
> > I'd like to know which one is better. Any help is appreciated.
>
> It is advised to use synonyms at index time for various reasons (idf
> differences, multi-word synonyms) [1].
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>

Re: What is the best practice of using synonymy ?

Posted by Jeff Zhang <zj...@gmail.com>.
Ahmet,

Thanks for your suggestion. Could you explain this in more detail, or point me
to an article that explains the reasons?

Thanks

On Tue, Mar 23, 2010 at 6:33 PM, Ahmet Arslan <io...@yahoo.com> wrote:

>
>
> > I'd like to use synonyms in my project, and I think there are two
> > candidate solutions:
> > 1. Apply synonyms at indexing time, enriching the index with synonyms.
> > 2. Apply synonyms at search time, expanding the search query with
> > synonyms.
> >
> > I'd like to know which one is better. Any help is appreciated.
>
> It is advised to use synonyms at index time for various reasons (idf
> differences, multi-word synonyms) [1].
> [1]
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>


-- 
Best Regards

Jeff Zhang

Re: What is the best practice of using synonymy ?

Posted by Ahmet Arslan <io...@yahoo.com>.

> I'd like to use synonyms in my project, and I think there are two
> candidate solutions:
> 1. Apply synonyms at indexing time, enriching the index with synonyms.
> 2. Apply synonyms at search time, expanding the search query with
> synonyms.
>
> I'd like to know which one is better. Any help is appreciated.

It is advised to use synonyms at index time for various reasons (idf differences, multi-word synonyms) [1].
[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
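To make the "idf differences" concrete, here is a worked example using the idf formula of Lucene's classic DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)) (Lucene returns a float; the sketch uses double). The document counts are hypothetical, and the two synonyms are assumed to occur in disjoint sets of documents. With query-time expansion each synonym keeps its own docFreq, so the rarer word gets the larger weight. With index-time expansion every matching document is posted under both terms, so they share one docFreq and therefore one idf.

```java
// Worked example (hypothetical counts) of idf under Lucene's classic
// DefaultSimilarity formula: idf = 1 + ln(numDocs / (docFreq + 1)).
class IdfDifference {
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        int dfTv = 99;          // hypothetical: docs containing "tv"
        int dfTelevision = 9;   // hypothetical: docs containing "television"

        // Query-time expansion: each synonym keeps its own docFreq, so the
        // rarer word gets the larger idf.
        System.out.printf("query time: idf(tv) = %.2f, idf(television) = %.2f%n",
            idf(dfTv, numDocs), idf(dfTelevision, numDocs));

        // Index-time expansion: every matching doc is posted under both terms,
        // so both share the docFreq of the union (disjoint sets assumed here).
        int dfUnion = dfTv + dfTelevision;
        System.out.printf("index time: idf(tv) = idf(television) = %.2f%n",
            idf(dfUnion, numDocs));
    }
}
```

With these numbers, query-time expansion gives "tv" an idf of 1 + ln 10 ≈ 3.30 and "television" 1 + ln 100 ≈ 5.61. All else being equal, a document that says "television" therefore scores higher than one that says "tv" for the same concept. At index time, both terms share 1 + ln(1000/109) ≈ 3.22.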


      
