You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2005/06/06 08:40:57 UTC
RE REQUEST: SPECIFIC HIT
Hi
Guys.
Apologies.....
with refrence to my last main dted Mon, 14 Mar 2005
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3COBE
LINLGKPEMCIEIJJNKEEIACCAA.karthik@controlnet.co.in%3E
I would like to again request some Help in the search Concepts.
I have Indexed documents sucessfully and they would be
Document 1 contains = ELECTRONICS DIGITAL CAMERA
Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES
Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
ACCESSORIES
Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
On Search "Digital Camera Optics" , the hit has to return me 3rd Document
ONLY
instead of other Documents [ The word DIGITAL CAMERA is common word in all
cases and could be in any order of sequence].
To Solve this Problem I creating a new Field called 'IGNORE WORD' and this
field would be as shown below
Document 1 contains = ELECTRONICS DIGITAL CAMERA
'IGNORE WORD = BATTERY,ACCESSORIES,0PTICS,CABEL,APPERAL
Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
'IGNORE WORD = ACCESSORIES,0PTICS,CABEL,APPERAL
Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
'IGNORE WORD = BATTERY,ACCESSORIES,CABEL,APPERAL
Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
'IGNORE WORD = BATTERY,0PTICS,CABEL,APPERAL
Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIE
'IGNORE WORD = BATTERY,0PTICS,APPERAL
Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
ACCESSORIES
'IGNORE WORD = BATTERY,APPERAL
Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
'IGNORE WORD = BATTERY,0PTICS,CABEL
For Every search I feed the 'IGNORE WORD' to the query such as
Search = DIGITAL CAMERA 0PTICS
Query = +KEYSRC:Digital +KEYSRC:Camera +KEYSRC:Cabel -KEYSRC:(BATTERY
ACCESSORIES CABEL APPERAL)
The resultant hit would be the 3rd doc instead of 3rd and 5th..
The Problem here is of 2 conditions
1) Search could be DIGITAL CAMERA 0PTICS or OPTICS CAMERAS DIIGTAL or
CAMERA OPTICS should retrieve same hit results.
2) The process of creation of 'IGNORE WORD' list is very time
consuming...[ Document is in very large numbers ]
and also permutation /combination for the same is very expensive.
Does anybody in here have some idea on how to process.
Thx in advance
Karthik
RE: RE REQUEST: SPECIFIC HIT
Posted by Karthik N S <ka...@controlnet.co.in>.
Hi
Apologies.....
The problem is not with Optics or 'O' ,
Since the 3rd and 6th Document is Indexed as
Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
A search Criteria for 'digital camera Optics' should return only 3rd
document Only and
NOT 6th Document.
For a Query in either 'DIGITAL CAMERA 0PTICS' or 'OPTICS CAMERAS DIIGTAL'
or
'CAMERA OPTICS' to return similar hits, we have to Inject the 'IGNOREWORDS'
to the query as ' -cabel '
for Natural Language processing.
Now the main problem is to find the permutation /combination of all such
words to be fed into the field 'IGNOREWORDS'
for the query to be processed.
Any Ideas Please
Karthik
-----Original Message-----
From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
Sent: Monday, June 06, 2005 10:33 PM
To: java-user@lucene.apache.org
Subject: Re: RE REQUEST: SPECIFIC HIT
On Monday 06 June 2005 08:40, Karthik N S wrote:
> Hi
>
> Guys.
>
> Apologies.....
>
> with refrence to my last main dted Mon, 14 Mar 2005
>
>
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3COBE
> LINLGKPEMCIEIJJNKEEIACCAA.karthik@controlnet.co.in%3E
>
> I would like to again request some Help in the search Concepts.
>
> I have Indexed documents sucessfully and they would be
>
> Document 1 contains = ELECTRONICS DIGITAL CAMERA
> Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
> Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
> Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
> Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES
> Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
> ACCESSORIES
> Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
>
> On Search "Digital Camera Optics" , the hit has to return me 3rd Document
> ONLY
> instead of other Documents [ The word DIGITAL CAMERA is common word in
all
> cases and could be in any order of sequence].
>
> To Solve this Problem I creating a new Field called 'IGNORE WORD' and this
> field would be as shown below
>
> Document 1 contains = ELECTRONICS DIGITAL CAMERA
> 'IGNORE WORD = BATTERY,ACCESSORIES,0PTICS,CABEL,APPERAL
>
> Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
> 'IGNORE WORD = ACCESSORIES,0PTICS,CABEL,APPERAL
>
> Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
> 'IGNORE WORD = BATTERY,ACCESSORIES,CABEL,APPERAL
>
> Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
> 'IGNORE WORD = BATTERY,0PTICS,CABEL,APPERAL
>
> Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIE
> 'IGNORE WORD = BATTERY,0PTICS,APPERAL
>
> Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
> ACCESSORIES
> 'IGNORE WORD = BATTERY,APPERAL
>
> Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
> 'IGNORE WORD = BATTERY,0PTICS,CABEL
>
>
> For Every search I feed the 'IGNORE WORD' to the query such as
>
>
> Search = DIGITAL CAMERA 0PTICS
> Query = +KEYSRC:Digital +KEYSRC:Camera +KEYSRC:Cabel -KEYSRC:(BATTERY
> ACCESSORIES CABEL APPERAL)
>
> The resultant hit would be the 3rd doc instead of 3rd and 5th..
>
>
> The Problem here is of 2 conditions
>
> 1) Search could be DIGITAL CAMERA 0PTICS or OPTICS CAMERAS DIIGTAL or
> CAMERA OPTICS should retrieve same hit results.
Here (and also above) 'optics' is written in to forms:
starting with a zero as 0PTICS, and
starting with a capital O as OPTICS.
Is that perhaps part of the problem?
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: RE REQUEST: SPECIFIC HIT
Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 06 June 2005 08:40, Karthik N S wrote:
> Hi
>
> Guys.
>
> Apologies.....
>
> with refrence to my last main dted Mon, 14 Mar 2005
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3COBE
> LINLGKPEMCIEIJJNKEEIACCAA.karthik@controlnet.co.in%3E
>
> I would like to again request some Help in the search Concepts.
>
> I have Indexed documents sucessfully and they would be
>
> Document 1 contains = ELECTRONICS DIGITAL CAMERA
> Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
> Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
> Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
> Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES
> Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
> ACCESSORIES
> Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
>
> On Search "Digital Camera Optics" , the hit has to return me 3rd Document
> ONLY
> instead of other Documents [ The word DIGITAL CAMERA is common word in all
> cases and could be in any order of sequence].
>
> To Solve this Problem I creating a new Field called 'IGNORE WORD' and this
> field would be as shown below
>
> Document 1 contains = ELECTRONICS DIGITAL CAMERA
> 'IGNORE WORD = BATTERY,ACCESSORIES,0PTICS,CABEL,APPERAL
>
> Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
> 'IGNORE WORD = ACCESSORIES,0PTICS,CABEL,APPERAL
>
> Document 3 contains = ELECTRONICS DIGITAL CAMERA 0PTICS
> 'IGNORE WORD = BATTERY,ACCESSORIES,CABEL,APPERAL
>
> Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
> 'IGNORE WORD = BATTERY,0PTICS,CABEL,APPERAL
>
> Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIE
> 'IGNORE WORD = BATTERY,0PTICS,APPERAL
>
> Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL
> ACCESSORIES
> 'IGNORE WORD = BATTERY,APPERAL
>
> Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
> 'IGNORE WORD = BATTERY,0PTICS,CABEL
>
>
> For Every search I feed the 'IGNORE WORD' to the query such as
>
>
> Search = DIGITAL CAMERA 0PTICS
> Query = +KEYSRC:Digital +KEYSRC:Camera +KEYSRC:Cabel -KEYSRC:(BATTERY
> ACCESSORIES CABEL APPERAL)
>
> The resultant hit would be the 3rd doc instead of 3rd and 5th..
>
>
> The Problem here is of 2 conditions
>
> 1) Search could be DIGITAL CAMERA 0PTICS or OPTICS CAMERAS DIIGTAL or
> CAMERA OPTICS should retrieve same hit results.
Here (and also above) 'optics' is written in to forms:
starting with a zero as 0PTICS, and
starting with a capital O as OPTICS.
Is that perhaps part of the problem?
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org