Posted to java-user@lucene.apache.org by "Kainth, Sachin" <Sa...@atkinsglobal.com> on 2007/02/20 13:29:25 UTC

Search for a term in all fields

Hi all,

How do I search for a term in all fields of a document?

Cheers

Sachin



Re: Search for a term in all fields

Posted by karl wettin <ka...@gmail.com>.
On 20 Feb 2007, at 13:29, Kainth, Sachin wrote:

> How do I search for a term in all fields of a document?

You create a boolean query with a term query for each field available
in the documents you are searching. The field names can be retrieved
with IndexReader.getFieldNames:

<http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#getFieldNames(org.apache.lucene.index.IndexReader.FieldOption)>
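
As a rough illustration of the "one clause per field" idea, here is a
minimal sketch in plain Java that emits Lucene query-parser syntax
(field:term clauses joined with OR) rather than building BooleanQuery
objects directly. The class name, method name and field names are
hypothetical, and the term is assumed to be a single word with no
query-syntax special characters (nothing is escaped here).

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: builds a query string with one field:term clause per field. */
final class AllFieldsQuery {

    /** Joins field:term clauses with OR; returns "" when there are no fields. */
    static String build(List<String> fieldNames, String term) {
        StringBuilder sb = new StringBuilder();
        for (String field : fieldNames) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; in practice they would come from the index.
        List<String> fields = Arrays.asList("title", "author", "body");
        System.out.println(build(fields, "lucene"));
        // prints: title:lucene OR author:lucene OR body:lucene
    }
}
```

The resulting string could then be handed to a query parser; the cost of
this approach is that the query grows with the number of fields.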


-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search for a term in all fields

Posted by Erick Erickson <er...@gmail.com>.
Nothing jumps out at me....

Erick

On 2/21/07, Kainth, Sachin <Sa...@atkinsglobal.com> wrote:
>
> Well, what I meant was: instead of using a gap of 1000, could we not
> replace that gap of 1000 characters with a ~? If that is possible, is
> there a way of performing searches using the ~?
>
> Cheers
>
> Sachin

RE: Search for a term in all fields

Posted by "Kainth, Sachin" <Sa...@atkinsglobal.com>.
Sorry I didn't make myself clear at all.  Remember you said that it is
possible to do this:

> Sure. Convert your simple queries into span queries (which are also
> relatively simple). Then, when you index everything in the "all"
> field, subclass your analyzer to return a large PositionIncrementGap.
> Explaining how this works in words is awkward, so....
>
> doc.add("all", "one two three");
> doc.add("all", "four five six");
> doc.add("all", "seven eight nine");
> index the document.
>
> Assume you've implemented an analyzer that returns 1000 for
> getPositionIncrementGap.
>
> Now, the term positions in the single document will be:
>
>   one - 0, two - 1, three - 2,
>   four - 1003, five - 1004, six - 1005,
>   seven - 2006, eight - 2007, nine - 2008
>
> Now, if you use SpanNearQuery with a slop of 900 (i.e. "one nine"~900)
> you won't get a match because the "distance" between one and nine is
> more than 900. But "one three"~900 will match.
>
> It's possible to transform any query into a set of span queries. See
> the thread "Multiword Highlighting" that Mark Miller and I were
> exchanging ideas on recently. Be aware that the code we were talking
> about has to be modified when used on a "regular" index, so that
> it pays attention to the document that each sub-clause comes from. The
> code, as written, assumes you're using a MemoryIndex for one and only
> one document, so unless you need complex queries, I'd just think about
> rewriting simple queries with ANDs as a SpanNearQuery.
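
The position arithmetic in the quoted explanation can be checked with a
small toy model in plain Java. This is not Lucene code: the class and
method names are made up for illustration, every token is assumed to be
unique within the document, and "slop" is simplified to "number of
positions between two terms that appear in order".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of token positions in a multi-valued field; not Lucene code. */
final class PositionGapModel {

    /** Assigns a position to each token, adding `gap` between successive values. */
    static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1; // so that the very first token lands on position 0
        boolean first = true;
        for (String value : values) {
            if (!first) {
                position += gap; // jump ahead before each later value
            }
            first = false;
            for (String token : value.split("\\s+")) {
                position += 1;
                result.put(token, position); // assumes tokens are unique
            }
        }
        return result;
    }

    /** True if b comes after a with at most `slop` positions between them. */
    static boolean withinSlop(Map<String, Integer> pos, String a, String b, int slop) {
        int between = pos.get(b) - pos.get(a) - 1;
        return between >= 0 && between <= slop;
    }

    public static void main(String[] args) {
        List<String> values =
                Arrays.asList("one two three", "four five six", "seven eight nine");
        Map<String, Integer> pos = positions(values, 1000);
        System.out.println(pos);
        // {one=0, two=1, three=2, four=1003, five=1004, six=1005,
        //  seven=2006, eight=2007, nine=2008}
        System.out.println(withinSlop(pos, "one", "three", 900));  // true
        System.out.println(withinSlop(pos, "three", "four", 900)); // false
        System.out.println(withinSlop(pos, "one", "nine", 900));   // false
    }
}
```

In this model, any slop smaller than the gap can never bridge two
separately added values, which is the point of choosing a large gap.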

Well, what I meant was: instead of using a gap of 1000, could we not
replace that gap of 1000 characters with a ~? If that is possible, is
there a way of performing searches using the ~?

Cheers

Sachin
 



Re: Search for a term in all fields

Posted by Erick Erickson <er...@gmail.com>.
I don't see what you're getting at. There are only two forms of a query
term:

field:value
value

And the second is really the first with the default field you specified in
the parser implied. So just think of all terms you specify in a query as
field:term.

Having some "special character" in the index doesn't help you because you
still have to specify the field. And your two choices are still either a
BooleanQuery that mentions all fields or indexing the data into a single
field.

Best
Erick




RE: Search for a term in all fields

Posted by Chris Hostetter <ho...@fucit.org>.
: Well, here are my current thoughts on achieving this.  Instead of putting
: a 1000 space gap between elements of the "all" field could I not use a
: character that isn't used in the data such as ~ and then somehow (don't
: know how) use that to search all fields?

You could certainly introduce an artificial token into the stream, and then
use a SpanNotQuery to ensure that SpanNearQueries don't cross that
artificial token -- but it doesn't get around the fact that you need to
add all of your text to a single field ... that's not the same as "search
all fields" (you have to be very clear about your terminology: if you
really want to "search all fields" you need a boolean query across all
fields -- if you want to "search all text" then you have to put "all text"
into a single field).
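
To make the marker-token idea concrete, here is a toy model in plain
Java of "a near match that must not cross the artificial token". It only
imitates the effect of wrapping a SpanNearQuery in a SpanNotQuery; it
does not use Lucene, and the class name, method names and the choice of
"~" as the marker are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of boundary marker tokens in a single "all" field; not Lucene code. */
final class MarkerTokenModel {

    static final String MARKER = "~"; // assumed never to occur in the real data

    /** Concatenates the values into one token list, with MARKER between values. */
    static List<String> tokenStream(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            if (!tokens.isEmpty()) {
                tokens.add(MARKER);
            }
            tokens.addAll(Arrays.asList(value.split("\\s+")));
        }
        return tokens;
    }

    /** True if b follows a with at most `slop` tokens between them and no MARKER in between. */
    static boolean nearWithoutCrossing(List<String> tokens, String a, String b, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(MARKER)) {
                    break; // the span would cross a value boundary
                }
                if (tokens.get(j).equals(b)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenStream(
                Arrays.asList("one two three", "four five six", "seven eight nine"));
        System.out.println(nearWithoutCrossing(tokens, "one", "three", 900));  // true
        System.out.println(nearWithoutCrossing(tokens, "three", "four", 900)); // false
        System.out.println(nearWithoutCrossing(tokens, "four", "six", 0));     // false
    }
}
```

Unlike a position gap, the marker is an extra token in the stream and
the query side has to know about it.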

The real question you have to ask is why do you think a special character
would be better than a positionIncrementGap? ... adding the gap is free,
easy to do, doesn't affect your index size at all, and doesn't require any
special parsing or index knowledge at query time -- just do phrase queries
as normal with moderate slop.

Searching the archive for "sentence boundary" will find you some
justifications as to why you might want special marker terms in your data
to query on if you are trying to do searches confined to a single
page/paragraph/sentence/etc...  but if you don't have special needs like
that, a position gap is trivial and logical.



-Hoss




RE: Search for a term in all fields

Posted by "Kainth, Sachin" <Sa...@atkinsglobal.com>.
Well, here are my current thoughts on achieving this.  Instead of putting
a 1000 space gap between elements of the "all" field could I not use a
character that isn't used in the data such as ~ and then somehow (don't
know how) use that to search all fields?



Re: Search for a term in all fields

Posted by Chris Hostetter <ho...@fucit.org>.
The information Erick gave you when you asked this question yesterday is
all very accurate -- the one addition I would make is that you don't need
SpanNear queries to take advantage of positionIncrementGap --
PhraseQueries do that too.

Consolidating your fields into a single "all" field, or constructing a
BooleanQuery across all of your existing fields are really the two
main options -- each with their tradeoffs.

http://www.nabble.com/Search-in-all-fields-tf3254569.html




-Hoss

