You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2005/03/09 10:22:21 UTC

SPAN QUERY [HOW TO]

Hi

Guys

Apologies..........



The new Feature of lucene 'span query' really is interesting

But need expert suggestions on achieveing the same.

I have 3 documents

Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA
Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES



search word = " DIGITAL CAMERA "

Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in the
hit ]

SpanQuery /PharseQuery  ????



How would one achieve this ??? Please




WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]


RE: SPAN QUERY [HOW TO]

Posted by Miles Barr <mi...@runtime-collective.com>.
What fields do you have and what are you putting in them?



On Thu, 2005-03-10 at 17:56 +0530, Karthik N S wrote:
> Hi Guys
> 
>  Apologies.......
> 
> 
>   I ditto as u said but the SpanNearQuery is
> 
>   returning me all the 3 documents containing  for rollover of  words
> 
>  'DIGITAL CAMERAS' instead of returning me the 1st doc, Or none by changing
> the slop factor
> 
> Any more ideas Please do .......... B(
> 
> with regards
> karthik
> 
> 
> -----Original Message-----
> From: Miles Barr [mailto:miles@runtime-collective.com]
> Sent: Thursday, March 10, 2005 2:53 PM
> To: java-user@lucene.apache.org
> Subject: RE: SPAN QUERY [HOW TO]
> 
> 
> On Thu, 2005-03-10 at 12:02 +0530, Karthik N S wrote:
> >  U got it bingo,Am trying to do something similar as u replied.
> >  But there is a glitch in the  process
> >
> >  If the search is done on the 'leaf_category'  as u said
> >
> >  with word such as  'CAMERA DIGITAL'  instead of  'DIGITAL CAMERA'  the
> > resultant
> >
> >  return hits will be  ZERO '0'. Usage of SpanQuery  for such conditions
> > applied should return still
> >
> >  the 1st document of 3.
> >
> >  A permutation combination of words entered should result in the specific
> > document being returned.
> 
> If depends what the type of leaf_category is. If you made it Keyword as
> I suggested then it won't be tokenized. i.e. there's one token 'DIGITAL
> CAMERA' instead of the two tokens you normally get, 'digital' and
> 'camera'.
> 
> If you change the field type to Text you should be able to use a
> SpanNearQuery to do your search.
> 
> --
> Miles Barr <mi...@runtime-collective.com>
> Runtime Collective Ltd.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Guys

 Apologies.......


  I ditto as u said but the SpanNearQuery is

  returning me all the 3 documents containing  for rollover of  words

 'DIGITAL CAMERAS' instead of returning me the 1st doc, Or none by changing
the slop factor

Any more ideas Please do .......... B(

with regards
karthik


-----Original Message-----
From: Miles Barr [mailto:miles@runtime-collective.com]
Sent: Thursday, March 10, 2005 2:53 PM
To: java-user@lucene.apache.org
Subject: RE: SPAN QUERY [HOW TO]


On Thu, 2005-03-10 at 12:02 +0530, Karthik N S wrote:
>  U got it bingo,Am trying to do something similar as u replied.
>  But there is a glitch in the  process
>
>  If the search is done on the 'leaf_category'  as u said
>
>  with word such as  'CAMERA DIGITAL'  instead of  'DIGITAL CAMERA'  the
> resultant
>
>  return hits will be  ZERO '0'. Usage of SpanQuery  for such conditions
> applied should return still
>
>  the 1st document of 3.
>
>  A permutation combination of words entered should result in the specific
> document being returned.

If depends what the type of leaf_category is. If you made it Keyword as
I suggested then it won't be tokenized. i.e. there's one token 'DIGITAL
CAMERA' instead of the two tokens you normally get, 'digital' and
'camera'.

If you change the field type to Text you should be able to use a
SpanNearQuery to do your search.

--
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Miles Barr <mi...@runtime-collective.com>.
On Thu, 2005-03-10 at 12:02 +0530, Karthik N S wrote:
>  U got it bingo,Am trying to do something similar as u replied.
>  But there is a glitch in the  process
> 
>  If the search is done on the 'leaf_category'  as u said
> 
>  with word such as  'CAMERA DIGITAL'  instead of  'DIGITAL CAMERA'  the
> resultant
> 
>  return hits will be  ZERO '0'. Usage of SpanQuery  for such conditions
> applied should return still
> 
>  the 1st document of 3.
> 
>  A permutation combination of words entered should result in the specific
> document being returned.

If depends what the type of leaf_category is. If you made it Keyword as
I suggested then it won't be tokenized. i.e. there's one token 'DIGITAL
CAMERA' instead of the two tokens you normally get, 'digital' and
'camera'.

If you change the field type to Text you should be able to use a
SpanNearQuery to do your search.

-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi

  Guys.

Apologies........

 U got it bingo,Am trying to do something similar as u replied.
 But there is a glitch in the  process

 If the search is done on the 'leaf_category'  as u said

 with word such as  'CAMERA DIGITAL'  instead of  'DIGITAL CAMERA'  the
resultant

 return hits will be  ZERO '0'. Usage of SpanQuery  for such conditions
applied should return still

 the 1st document of 3.

 A permutation combination of words entered should result in the specific
document being returned.



with  regards

Karthik


-----Original Message-----
From: Miles Barr [mailto:miles@runtime-collective.com]
Sent: Wednesday, March 09, 2005 7:10 PM
To: java-user@lucene.apache.org
Subject: RE: SPAN QUERY [HOW TO]


It's not clear what you're trying to achieve. PhraseQuery and
SpanNearQuery can help you find tokens that are close to each other. It
you're using the standard analyzer, tokens are words. They won't help
you group documents under a topic.

You should setup some other fields in your Lucene document to hold
category information. e.g. for document 1:

text = ELECTRONICS  DIGITAL CAMERA
parent_category = ELECTRONICS
leaf_category = DIGITAL CAMERA

for document 2:

text = ELECTRONICS  DIGITAL CAMERA OPTICS
parent_category = ELECTRONICS
parent_category = DIGITAL CAMERA
leaf_category = OPTICS

Then search on the leaf_category. Make sure you setup the category
fields to be type KEYWORD, i.e. not tokenized.



On Wed, 2005-03-09 at 18:07 +0530, Karthik N S wrote:
> Hi Guys
>
> Apologies....
>
> Some body Please Help me for this Form
>
>
> with regards
> Karthik
>
>
> -----Original Message-----
> From: Miles Barr [mailto:miles@runtime-collective.com]
> Sent: Wednesday, March 09, 2005 3:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: SPAN QUERY [HOW TO]
>
>
> On Wed, 2005-03-09 at 14:52 +0530, Karthik N S wrote:
> > The new Feature of lucene 'span query' really is interesting
> >
> > But need expert suggestions on achieveing the same.
> >
> > I have 3 documents
> >
> > Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA
> > Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
> > Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES
> >
> >
> >
> > search word = " DIGITAL CAMERA "
> >
> > Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in
> > the hit ]
> >
> > SpanQuery /PharseQuery  ????
> >
> >
> >
> > How would one achieve this ??? Please
>
> I've used span queries to boost the scores of results where words appear
> close together. I'm not sure exactly what you're trying to achieve. All
> three documents contain the search phrase, so both span and phrase
> queries would return all the documents.
>
> Are you trying to setup a taxonomy? i.e. only display documents in the
> category Electronics > Digital Camera, and not those in sub categories?
> If this is the case you should try to build the categorisation at the
> same time as the indexing process and either add explicit clauses in the
> search query or filter afterwards.
>
>
>
--
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Miles Barr <mi...@runtime-collective.com>.
It's not clear what you're trying to achieve. PhraseQuery and
SpanNearQuery can help you find tokens that are close to each other. It
you're using the standard analyzer, tokens are words. They won't help
you group documents under a topic. 

You should setup some other fields in your Lucene document to hold
category information. e.g. for document 1:

text = ELECTRONICS  DIGITAL CAMERA
parent_category = ELECTRONICS
leaf_category = DIGITAL CAMERA

for document 2:

text = ELECTRONICS  DIGITAL CAMERA OPTICS
parent_category = ELECTRONICS
parent_category = DIGITAL CAMERA
leaf_category = OPTICS

Then search on the leaf_category. Make sure you setup the category
fields to be type KEYWORD, i.e. not tokenized.



On Wed, 2005-03-09 at 18:07 +0530, Karthik N S wrote:
> Hi Guys
> 
> Apologies....
> 
> Some body Please Help me for this Form
> 
> 
> with regards
> Karthik
> 
> 
> -----Original Message-----
> From: Miles Barr [mailto:miles@runtime-collective.com]
> Sent: Wednesday, March 09, 2005 3:02 PM
> To: java-user@lucene.apache.org
> Subject: Re: SPAN QUERY [HOW TO]
> 
> 
> On Wed, 2005-03-09 at 14:52 +0530, Karthik N S wrote:
> > The new Feature of lucene 'span query' really is interesting
> > 
> > But need expert suggestions on achieveing the same.
> > 
> > I have 3 documents 
> > 
> > Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA 
> > Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
> > Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES
> > 
> >  
> > 
> > search word = " DIGITAL CAMERA "
> > 
> > Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in
> > the hit ]
> > 
> > SpanQuery /PharseQuery  ????
> > 
> >  
> > 
> > How would one achieve this ??? Please
> 
> I've used span queries to boost the scores of results where words appear
> close together. I'm not sure exactly what you're trying to achieve. All
> three documents contain the search phrase, so both span and phrase
> queries would return all the documents.
> 
> Are you trying to setup a taxonomy? i.e. only display documents in the
> category Electronics > Digital Camera, and not those in sub categories?
> If this is the case you should try to build the categorisation at the
> same time as the indexing process and either add explicit clauses in the
> search query or filter afterwards.
> 
> 
> 
-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Guys

Apologies....

Some body Please Help me for this Form


with regards
Karthik


-----Original Message-----
From: Miles Barr [mailto:miles@runtime-collective.com]
Sent: Wednesday, March 09, 2005 3:02 PM
To: java-user@lucene.apache.org
Subject: Re: SPAN QUERY [HOW TO]


On Wed, 2005-03-09 at 14:52 +0530, Karthik N S wrote:
> The new Feature of lucene 'span query' really is interesting
> 
> But need expert suggestions on achieveing the same.
> 
> I have 3 documents 
> 
> Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA 
> Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
> Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES
> 
>  
> 
> search word = " DIGITAL CAMERA "
> 
> Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in
> the hit ]
> 
> SpanQuery /PharseQuery  ????
> 
>  
> 
> How would one achieve this ??? Please

I've used span queries to boost the scores of results where words appear
close together. I'm not sure exactly what you're trying to achieve. All
three documents contain the search phrase, so both span and phrase
queries would return all the documents.

Are you trying to setup a taxonomy? i.e. only display documents in the
category Electronics > Digital Camera, and not those in sub categories?
If this is the case you should try to build the categorisation at the
same time as the indexing process and either add explicit clauses in the
search query or filter afterwards.



-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: SPAN QUERY [HOW TO]

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi
Guys

Apologies..........

Yes some similar concept in my head is thundering.

I will be using a field 'text'  for the same

Can u guys please tell me using spanQuery or pahrse Query  would do the job
,If so How to proceeed.


Thx in advance
Karthik

-----Original Message-----
From: Miles Barr [mailto:miles@runtime-collective.com]
Sent: Wednesday, March 09, 2005 3:02 PM
To: java-user@lucene.apache.org
Subject: Re: SPAN QUERY [HOW TO]


On Wed, 2005-03-09 at 14:52 +0530, Karthik N S wrote:
> The new Feature of lucene 'span query' really is interesting
>
> But need expert suggestions on achieveing the same.
>
> I have 3 documents
>
> Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA
> Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
> Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES
>
>
>
> search word = " DIGITAL CAMERA "
>
> Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in
> the hit ]
>
> SpanQuery /PharseQuery  ????
>
>
>
> How would one achieve this ??? Please

I've used span queries to boost the scores of results where words appear
close together. I'm not sure exactly what you're trying to achieve. All
three documents contain the search phrase, so both span and phrase
queries would return all the documents.

Are you trying to setup a taxonomy? i.e. only display documents in the
category Electronics > Digital Camera, and not those in sub categories?
If this is the case you should try to build the categorisation at the
same time as the indexing process and either add explicit clauses in the
search query or filter afterwards.



--
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: SPAN QUERY [HOW TO]

Posted by Miles Barr <mi...@runtime-collective.com>.
On Wed, 2005-03-09 at 14:52 +0530, Karthik N S wrote:
> The new Feature of lucene 'span query' really is interesting
> 
> But need expert suggestions on achieveing the same.
> 
> I have 3 documents 
> 
> Document 1 contains   =  ELECTRONICS  DIGITAL CAMERA 
> Document 2 contains   =  ELECTRONICS  DIGITAL CAMERA 0PTICS
> Document 3 contains  =   ELECTRONICS  DIGITAL CAMERA ACCESSIORIES
> 
>  
> 
> search word = " DIGITAL CAMERA "
> 
> Returned hits  = 1st doc   ONLY [ 2 and 3rd document should not be in
> the hit ]
> 
> SpanQuery /PharseQuery  ????
> 
>  
> 
> How would one achieve this ??? Please

I've used span queries to boost the scores of results where words appear
close together. I'm not sure exactly what you're trying to achieve. All
three documents contain the search phrase, so both span and phrase
queries would return all the documents.

Are you trying to setup a taxonomy? i.e. only display documents in the
category Electronics > Digital Camera, and not those in sub categories?
If this is the case you should try to build the categorisation at the
same time as the indexing process and either add explicit clauses in the
search query or filter afterwards.



-- 
Miles Barr <mi...@runtime-collective.com>
Runtime Collective Ltd.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org