Posted to java-user@lucene.apache.org by Moray McConnachie <mm...@oxford-analytica.com> on 2004/02/27 11:16:37 UTC

Indexing multiple instances of the same field for each document

I note from previous entries on the mailing list and my own experiments that
you can add many entries to the same field for each document. Example: a
given document belongs to more than one product, ergo I index the product
field with values "PROD_A" and "PROD_B".

If I don't tokenise the fields when adding them to the document, and I store
the values and print them out before adding them to the index (so I can see
what the index is recording), I do indeed get

Keyword<product:PROD_A> Keyword<product:PROD_B>

However, a query on product:PROD_A returns no results, neither does a query
on product:PROD_B.

If I tokenize the fields (i.e. the document content reads
Text<product:PROD_A> Text<product:PROD_B>), then it works correctly.

[n.b. I am using the .NET implementation of Lucene, but its behaviour is
said to be identical to the Java Lucene.]

1) Is this expected behaviour? 

If so, are multiple fields of the same name added to a document silently
converted into a single string/array representation of some kind?

2) Is it sensible behaviour?

I ask because it seems to me contrary to instinct, and also because my guess
would be that a Keyword index would be faster to add (and faster to query?)
than a Text index.
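The multi-valued behaviour described above can be sketched with a toy inverted index in plain Java (this is illustrative only, not the Lucene API; all names here are hypothetical): each untokenized value becomes its own term, so a query on either value should find the document.

```java
import java.util.*;

// Toy inverted index: a sketch of what multiple untokenized (Keyword-style)
// values on the same field name should behave like. Not Lucene code.
public class MultiValuedField {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Record one untokenized value for a field of a document.
    public void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
    }

    // Exact term query on field:value.
    public Set<Integer> search(String field, String value) {
        return postings.getOrDefault(field + ":" + value, Collections.emptySet());
    }

    public static void main(String[] args) {
        MultiValuedField index = new MultiValuedField();
        index.addKeyword(1, "product", "PROD_A");
        index.addKeyword(1, "product", "PROD_B"); // same field, second value
        System.out.println(index.search("product", "PROD_A")); // [1]
        System.out.println(index.search("product", "PROD_B")); // [1]
    }
}
```

Both values resolve to the same document, which is the behaviour the question expects from repeated Keyword fields.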

Yours,
Moray McConnachie
------------------------------------
Moray McConnachie, IT Manager
Oxford Analytica http://www.oxan.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Indexing multiple instances of the same field for each document

Posted by Boris Goldowsky <bo...@alum.mit.edu>.
On Fri, 2004-02-27 at 12:12, Roy Klein wrote:
> Doug,
> 
> The query results are different, I'm attaching my test code.
> 
> >> Also FYI, I found that phrase queries don't work against a field 
> >> that's been added multiple times. If I query the phrase "brown fox", 
> >> against the two example docs above, only the second matches.

> My results:
> Query: contents:quick contents:brown 
> Hits: 2 
> 1  
> 2  
> PQuery: 
> contents:"quick brown" 
> Phrase Hits: 1 
> 2 

By my intuition, what Lucene is doing here is the correct behavior.  Say
for a moment your field was author, and you index multiple authors by
having multiple occurrences of a Field named author.  So you might have:
author: John Smith
author: Mary Jones

now the query 
  author:smith author:jones 
should return this document, but the query 
  author:"john jones"
should not.  It would be unfortunate if this were "fixed" to return the
document with the phrasal query, since the two words in different fields
do not in fact occur as a phrase.

If you have "the quick brown fox..." as in your example, your indexing
code should combine them all into a single field before adding them to
the Document.
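The suggestion above (combine the chunks before adding them) can be sketched as follows; the helper name is illustrative, not a Lucene API:

```java
import java.util.*;

// Sketch: concatenate field chunks yourself and add the result as ONE field
// value, so term order and positions come out exactly as intended.
public class CombineChunks {
    public static String combineChunks(List<String> chunks) {
        return String.join(" ", chunks);
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("the quick", "brown fox",
                                            "jumped over", "the lazy dog");
        // One doc.add(Field.Text("contents", combined)) instead of four adds:
        String combined = combineChunks(chunks);
        System.out.println(combined);
    }
}
```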

Just my humble opinion, of course...

Boris

-- 
Boris Goldowsky
boris@alum.mit.edu
www.goldowsky.com/consulting




RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
Thanks Doug!

I was in the midst of testing my fix to it and noticed your checkin...

    Roy

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Monday, March 01, 2004 12:33 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


Erik Hatcher wrote:
> On Feb 27, 2004, at 6:17 PM, Doug Cutting wrote:
> 
>> I think it's document.add().  Fields are pushed onto the front, 
>> rather
>> than added to the end.
> 
> 
> Ah, ok.... DocumentFieldList/DocumentFieldEnumeration are the 
> culprits.
> 
> This is certainly a bug.

Yes, a bug that's been there since the genesis of Lucene, six years ago.
It is surprising that something like this could go so long unnoticed.

I just fixed this in CVS.

Doug



Re: Indexing multiple instances of the same field for each document

Posted by Doug Cutting <cu...@apache.org>.
Erik Hatcher wrote:
> On Feb 27, 2004, at 6:17 PM, Doug Cutting wrote:
> 
>> I think it's document.add().  Fields are pushed onto the front, rather 
>> than added to the end.
> 
> 
> Ah, ok.... DocumentFieldList/DocumentFieldEnumeration are the culprits.
> 
> This is certainly a bug.

Yes, a bug that's been there since the genesis of Lucene, six years ago.
It is surprising that something like this could go so long unnoticed.

I just fixed this in CVS.

Doug
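The bug described here (fields pushed onto the front of the document's field list) can be illustrated with a toy singly linked list. This is a simplified reconstruction with hypothetical names, not the actual Lucene source: prepending reverses enumeration order, while appending preserves it.

```java
import java.util.*;

// Toy reconstruction of the reported bug: adding each new field at the HEAD
// of a singly linked list makes enumeration see fields in REVERSE order of
// addition, so "quick" then "brown" reads back as "brown quick".
public class FieldListOrder {
    static final class Node {
        final String value; final Node next;
        Node(String value, Node next) { this.value = value; this.next = next; }
    }

    // Buggy behaviour: prepend each value, then walk from the head.
    public static List<String> addAtFront(List<String> values) {
        Node head = null;
        for (String v : values) head = new Node(v, head); // push onto front
        List<String> seen = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) seen.add(n.value);
        return seen;
    }

    // Fixed behaviour: append, preserving insertion order.
    public static List<String> addAtEnd(List<String> values) {
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        List<String> added = Arrays.asList("quick", "brown");
        System.out.println(addAtFront(added)); // [brown, quick]  <- reversed
        System.out.println(addAtEnd(added));   // [quick, brown]
    }
}
```

The reversal is why the phrase "brown quick" matched a document whose fields were added as "quick" then "brown".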



RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
Hi Markus,

What you're saying would work if I weren't concerned about query
performance.

If I add the synonyms at document index time, then I only process the
word "quick" once (when I insert the doc into the index).

If I process each query to convert "fast" and "speedy" to "quick" at
query time, then I might wind up processing those words millions of
times (once for each query).  Yes, I could come up with a cache so that
the processing is kept to a minimum; however, it still makes more sense
to do it once, at index time.

    Roy

-----Original Message-----
From: Markus Spath [mailto:mspath@arcor.de] 
Sent: Sunday, February 29, 2004 5:45 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


Roy Klein wrote:

> Erik,
> 
> Indexing a single field in chunks solves a design problem I'm working 
> on. It's not the only way to do it, but, it would certainly be the 
> most straightforward.  However, if using this method makes phrase 
> searching unusable, then I'll have to go another route.
> 

hmm, wouldn't it be easier to index only one term for a list of synonyms
instead of indexing each synonym for one term?

quick, fast, speedy -> quick (both when building the index and building
the query)

this also would solve your problems with the (somehow counterintuitive
but probably well reasoned) behaviour of lucene to add Fields with the
same name at the beginning instead of appending them.


Markus

> Here's a brief example of the type of thing I'm trying to do:
> 
> I have a file that contains the words:
> 
> The quick brown fox jumped over the lazy dog.
> 
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name=wordposition1>
>     <word>The</word>
>   </field>
>   <field name=wordposition2>
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name=wordposition3>
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
> 
> I parse that document (via the digester), and add all the words from 
> each of the fields to one lucene field: "contents".  The tricky part 
> is that I want to have each word position contain all the words at 
> that position in the lucene index.  I.e. word location 1 in the index 
> contains "The", word location 2: "quick, fast, and speedy", word 
> location 3: "brown, tan, and dark", etc.
> 
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
>       "fast brown"
> 





Re: Indexing multiple instances of the same field for each document

Posted by Markus Spath <ms...@arcor.de>.
Roy Klein wrote:

> Erik,
> 
> Indexing a single field in chunks solves a design problem I'm working
> on. It's not the only way to do it, but, it would certainly be the most
> straightforward.  However, if using this method makes phrase searching
> unusable, then I'll have to go another route.
> 

hmm, wouldn't it be easier to index only one term for a list of synonyms
instead of indexing each synonym for one term?

quick, fast, speedy -> quick (both when building the index and building the query)

this also would solve your problems with the (somehow counterintuitive but
probably well reasoned) behaviour of lucene to add Fields with the same name at
the beginning instead of appending them.
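The canonical-term approach suggested above can be sketched like this (the mapping table is purely illustrative): the same normalization is applied at index time and at query time, so only one term per synonym set is ever stored or searched.

```java
import java.util.*;

// Sketch: map every synonym to one canonical term. Applying this at BOTH
// index and query time means "fast", "speedy" and "quick" all hit the same
// indexed term. The table below is an illustrative example.
public class SynonymCanonicalizer {
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("fast", "quick");
        CANONICAL.put("speedy", "quick");
        CANONICAL.put("tan", "brown");
        CANONICAL.put("dark", "brown");
    }

    public static String canonicalize(String term) {
        return CANONICAL.getOrDefault(term, term); // unknown terms pass through
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("fast"));   // quick
        System.out.println(canonicalize("speedy")); // quick
        System.out.println(canonicalize("fox"));    // fox (unchanged)
    }
}
```

The trade-off Roy raises still applies: this shifts the mapping cost to query time as well, whereas index-time expansion pays it only once per document.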


Markus

> Here's a brief example of the type of thing I'm trying to do:
> 
> I have a file that contains the words:
> 
> The quick brown fox jumped over the lazy dog.
> 
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name=wordposition1>
>     <word>The</word>
>   </field>
>   <field name=wordposition2>
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name=wordposition3>
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
> 
> I parse that document (via the digester), and add all the words from
> each of the fields to one lucene field: "contents".  The tricky part is
> that I want to have each word position contain all the words at that
> position in the lucene index.  I.e. word location 1 in the index
> contains "The", word location 2: "quick, fast, and speedy", word
> location 3: "brown, tan, and dark", etc.
> 
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
>       "fast brown"
> 





RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
I don't have access to the process that created the XML, it was done in
the past.

As I stated in the beginning of this thread, this is just an example of
the type of thing I'm trying to accomplish.

I think the real issue here is that the fields are being inserted in
reverse order.  Here are the comments in the code (for Document.add()):

  /** Adds a field to a document.  Several fields may be added with
   * the same name.  In this case, if the fields are indexed, their text is
   * treated as though appended for the purposes of search. */

I guess it doesn't specify the order in which they're appended; however,
when I read that comment, I thought it meant "in the order added".  It's
a pretty simple change to the Document class to make this work as I'd
expect.  From Doug's initial response, I think he expected this behavior
as well.


Thanks again for all your help!

    Roy


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Sunday, February 29, 2004 9:10 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


What you are doing is really the job of an Analyzer.  You are doing 
pre-analysis, when instead you could do all of this within the context 
of a custom analyzer and avoid many of these issues altogether.

Do you use the XML only during indexing?  If so, you could bypass the 
whole conversion to XML and then back through Digester all within an 
analyzer.

Or am I missing something that prevents you from doing it this way?

	Erik


On Feb 28, 2004, at 10:05 PM, Roy Klein wrote:
> Erik,
> Here's a brief example of the type of thing I'm trying to do:
>
> I have a file that contains the words:
>
> The quick brown fox jumped over the lazy dog.
>
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name=wordposition1>
>     <word>The</word>
>   </field>
>   <field name=wordposition2>
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name=wordposition3>
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
>
> I parse that document (via the digester), and add all the words from 
> each of the fields to one lucene field: "contents".  The tricky part 
> is that I want to have each word position contain all the words at 
> that position in the lucene index.  I.e. word location 1 in the index 
> contains "The", word location 2: "quick, fast, and speedy", word 
> location 3: "brown, tan, and dark", etc.
>
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
>       "fast brown"
>
> I wrote a "TermAnalyzer" that adds all the words from a field into the
> index at the same position (via setPositionIncrement(0)).  That way I
> can simply add each set of words to the "contents" field, and it'll
> just keep adding them to the same field.  However, since it's
> reversing them, I can't match phrases.
>
>
>     Roy




Re: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
What you are doing is really the job of an Analyzer.  You are doing 
pre-analysis, when instead you could do all of this within the context 
of a custom analyzer and avoid many of these issues altogether.

Do you use the XML only during indexing?  If so, you could bypass the 
whole conversion to XML and then back through Digester all within an 
analyzer.

Or am I missing something that prevents you from doing it this way?

	Erik


On Feb 28, 2004, at 10:05 PM, Roy Klein wrote:
> Erik,
> Here's a brief example of the type of thing I'm trying to do:
>
> I have a file that contains the words:
>
> The quick brown fox jumped over the lazy dog.
>
> I run that file through a utility that produces the following xml
> document:
> <document>
>   <field name=wordposition1>
>     <word>The</word>
>   </field>
>   <field name=wordposition2>
>     <word>quick</word>
>     <word>fast</word>
>     <word>speedy</word>
>   </field>
>   <field name=wordposition3>
>     <word>brown</word>
>     <word>tan</word>
>     <word>dark</word>
>   </field>
>   .
>   .
>   .
>
> I parse that document (via the digester), and add all the words from
> each of the fields to one lucene field: "contents".  The tricky part is
> that I want to have each word position contain all the words at that
> position in the lucene index.  I.e. word location 1 in the index
> contains "The", word location 2: "quick, fast, and speedy", word
> location 3: "brown, tan, and dark", etc.
>
> That way, all the following phrase queries will match this document:
> 	"fast tan"
> 	"quick brown"
>       "fast brown"
>
> I wrote a "TermAnalyzer" that adds all the words from a field into the
> index at the same position. (via setPositionIncrement(0)).  That way I
> can simply add each set of words to the "contents" field, and it'll 
> just
> keep adding them to the same field.  However, since it's reversing 
> them,
> I can't match phrases.
>
>
>     Roy




RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
Erik,

Indexing a single field in chunks solves a design problem I'm working
on. It's not the only way to do it, but, it would certainly be the most
straightforward.  However, if using this method makes phrase searching
unusable, then I'll have to go another route.

Here's a brief example of the type of thing I'm trying to do:

I have a file that contains the words:

The quick brown fox jumped over the lazy dog.

I run that file through a utility that produces the following xml
document:
<document>
  <field name=wordposition1>
    <word>The</word>
  </field>
  <field name=wordposition2>
    <word>quick</word>
    <word>fast</word>
    <word>speedy</word>
  </field>
  <field name=wordposition3>
    <word>brown</word>
    <word>tan</word>
    <word>dark</word>
  </field>
  .
  .
  .

I parse that document (via the digester), and add all the words from
each of the fields to one lucene field: "contents".  The tricky part is
that I want to have each word position contain all the words at that
position in the lucene index.  I.e. word location 1 in the index
contains "The", word location 2: "quick, fast, and speedy", word
location 3: "brown, tan, and dark", etc.

That way, all the following phrase queries will match this document:
	"fast tan"
	"quick brown"
      "fast brown"

I wrote a "TermAnalyzer" that adds all the words from a field into the
index at the same position (via setPositionIncrement(0)).  That way I
can simply add each set of words to the "contents" field, and it'll just
keep adding them to the same field.  However, since it's reversing them,
I can't match phrases.
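What setPositionIncrement(0) achieves can be simulated in plain Java (a sketch of the position arithmetic, not Lucene internals): a token's position is the running sum of increments, so zero-increment synonyms stack at one position, and a phrase matches when its second term sits exactly one position later.

```java
import java.util.*;

// Sketch of stacked token positions: each token's absolute position is the
// previous position plus its increment, as Lucene defines it. Synonyms get
// increment 0, so they share a position with the word they expand.
public class StackedPositions {
    public static Map<String, Integer> positions(String[] terms, int[] increments) {
        Map<String, Integer> pos = new HashMap<>();
        int p = -1; // first token with increment 1 lands at position 0
        for (int i = 0; i < terms.length; i++) {
            p += increments[i];
            pos.put(terms[i], p);
        }
        return pos;
    }

    // A two-word phrase matches when the second term is one position later.
    public static boolean phraseMatches(Map<String, Integer> pos, String a, String b) {
        return pos.containsKey(a) && pos.containsKey(b)
            && pos.get(b) == pos.get(a) + 1;
    }

    public static void main(String[] args) {
        // "quick/fast/speedy" stacked at one position, "brown/tan/dark" at the next.
        String[] terms = {"the", "quick", "fast", "speedy", "brown", "tan", "dark"};
        int[]    incs  = { 1,     1,      0,      0,        1,       0,     0   };
        Map<String, Integer> pos = positions(terms, incs);
        System.out.println(phraseMatches(pos, "fast", "tan"));    // true
        System.out.println(phraseMatches(pos, "quick", "brown")); // true
        System.out.println(phraseMatches(pos, "fast", "speedy")); // false: same position
    }
}
```

This is why "fast tan", "quick brown" and "fast brown" can all match the same document, provided the fields are indexed in the intended order.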


    Roy

(I just looked at the Document class, seems like it shouldn't be that
difficult to make the DocumentFieldList add new fields onto the end
instead of the beginning of the list.  I'll try to change it, and submit
a fix once I get it working.)


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Friday, February 27, 2004 10:28 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


>I don't personally see why you would index text in chunks like this 
>rather than aggregating it all into a Reader or String, so certainly 
>this is an uncommon usage pattern.
>
>	Erik





Re: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 27, 2004, at 6:17 PM, Doug Cutting wrote:
> I think it's document.add().  Fields are pushed onto the front, rather 
> than added to the end.

Ah, ok.... DocumentFieldList/DocumentFieldEnumeration are the culprits.

This is certainly a bug.  With things going in reverse order as they 
are now, a PhraseQuery for "brown quick" matches this document:

     doc.add(Field.Keyword("contents", "quick"));
     doc.add(Field.Keyword("contents", "brown"));

There is merit to what Boris said about phrase queries not matching 
across, but if that effect is desired the position increments can be 
adjusted somehow (but how could someone do this?  a stateful analyzer?)

I don't personally see why you would index text in chunks like this 
rather than aggregating it all into a Reader or String, so certainly 
this is an uncommon usage pattern.

	Erik




Re: Field boosting Was: Indexing multiple instances of the same field for each document

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Cheers, I index information in chunks. The reason for this is that I have
an IR tool that returns information ordered by confidence rather than by
the fields I index. I just add fields as they come, but I would be
interested in knowing how other people deal with confidence. Following
your answer, I can't add my confidence as boosts to the terms as I index
them, do you have any suggestions? I'm guessing that I'll probably have to
add multiple copies of my fields to simulate boosting.

sv

On Fri, 27 Feb 2004, Erik Hatcher wrote:

> On Feb 27, 2004, at 6:26 PM, Stephane James Vaucher wrote:
> > Slightly off topic to this thread, but how would adding different
> > fields
> > with the same name deal with boosts? I've looked at the javadoc and
> > FAQ,
> > but I think it's not a common use of this feature, any insight?
>
> There is only one boost per field name.  However, interestingly, the
> effect is the multiplication of them all.  So, in your example below,
> the boost of the "fieldName" field is 2.
>
> 	Erik
>
> >
> > E.G.
> > Document doc = new Document();
> > Field f1 = Field.Keyword("fieldName", "foo");
> > f1.setBoost(1);
> > doc.add(f1);
> >
> > Field f2 = Field.Keyword("fieldName", "bar");
> > f2.setBoost(2);
> > doc.add(f2);
> >
> > Cheers,
> > sv
> >
> > On Fri, 27 Feb 2004, Doug Cutting wrote:
> >
> >> I think it's document.add().  Fields are pushed onto the front, rather
> >> than added to the end.
> >>
> >> Doug
> >>
> >> Roy Klein wrote:
> >>> I think it's got something to do with Document.invertDocument().
> >>>
> >>> When I reverse the words in the phrase, the other document matches
> >>> the
> >>> phrase query.
> >>>
> >>>     Roy
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> >>> Sent: Friday, February 27, 2004 4:34 PM
> >>> To: Lucene Users List
> >>> Subject: Re: Indexing multiple instances of the same field for each
> >>> document
> >>>
> >>>
> >>> On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
> >>>
> >>>> Hi Erik,
> >>>>
> >>>> While you might be right in this example (using Field.Keyword), I
> >>>> can
> >>>> see how this would still be a problem in other cases. For instance,
> >>>> if
> >>>
> >>>
> >>>> I were adding more than one word at a time in the example I
> >>>> attached.
> >>>
> >>>
> >>> I concur that it appears to be a bug.  It is unlikely folks use
> >>> Lucene
> >>> like this too much though - there probably are not too many scenarios
> >>> where combining things into a single String or Reader is a burden.
> >>>
> >>> I'm interested to know where in the code this oddity occurs so I can
> >>> understand it more.  I did a brief bit of troubleshooting but haven't
> >>> figured it out yet.  Something in DocumentWriter I presume.
> >>>
> >>> 	Erik
> >>>
> >>>
> >>>
> >>>
> >>>>    Roy
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> >>>> Sent: Friday, February 27, 2004 2:12 PM
> >>>> To: Lucene Users List
> >>>> Subject: Re: Indexing multiple instances of the same field for each
> >>>> document
> >>>>
> >>>>
> >>>> Roy,
> >>>>
> >>>> On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
> >>>>
> >>>>>        Document doc = new Document();
> >>>>>        doc.add(Field.Text("contents", "the"));
> >>>>
> >>>> Changing these to Field.Keyword gets it to work.  I'm delving a
> >>>> little
> >>>
> >>>
> >>>> bit to understand why, but it seems if you are adding words
> >>>> individually anyway you'd want them to be untokenized, right?
> >>>>
> >>>> 	Erik
> >>>>
> >>>>
> >>>>
> >>>>>        doc.add(Field.Text("contents", "quick"));
> >>>>>        doc.add(Field.Text("contents", "brown"));
> >>>>>        doc.add(Field.Text("contents", "fox"));
> >>>>>        doc.add(Field.Text("contents", "jumped"));
> >>>>>        doc.add(Field.Text("contents", "over"));
> >>>>>        doc.add(Field.Text("contents", "the"));
> >>>>>        doc.add(Field.Text("contents", "lazy"));
> >>>>>        doc.add(Field.Text("contents", "dogs"));
> >>>>>        doc.add(Field.Keyword("docnumber", "1"));
> >>>>>        writer.addDocument(doc);
> >>>>>        doc = new Document();
> >>>>>        doc.add(Field.Text("contents", "the quick brown fox jumped
> >>>>> over the lazy dogs"));
> >>>>>        doc.add(Field.Keyword("docnumber", "2"));
> >>>>>        writer.addDocument(doc);
> >>>>>        writer.close();
> >>>>>    }
> >>>>>
> >>>>>    public static void query(File indexDir) throws IOException
> >>>>>    {
> >>>>>        Query query = null;
> >>>>>        PhraseQuery pquery = new PhraseQuery();
> >>>>>        Hits hits = null;
> >>>>>
> >>>>>        try {
> >>>>>            query = QueryParser.parse("quick brown", "contents", new
> >>>>> StandardAnalyzer());
> >>>>>        } catch (Exception qe) {System.out.println(qe.toString());}
> >>>>>        if (query == null) return;
> >>>>>        System.out.println("Query: " + query.toString());
> >>>>>        IndexReader reader = IndexReader.open(indexDir);
> >>>>>        IndexSearcher searcher = new IndexSearcher(reader);
> >>>>>
> >>>>>        hits = searcher.search(query);
> >>>>>        System.out.println("Hits: " + hits.length());
> >>>>>
> >>>>>        for (int i = 0; i < hits.length(); i++)
> >>>>>        {
> >>>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
> >>>>>        }
> >>>>>
> >>>>>
> >>>>>        pquery.add(new Term("contents", "quick"));
> >>>>>        pquery.add(new Term("contents", "brown"));
> >>>>>        System.out.println("PQuery: " + pquery.toString());
> >>>>>        hits = searcher.search(pquery);
> >>>>>        System.out.println("Phrase Hits: " + hits.length());
> >>>>>        for (int i = 0; i < hits.length(); i++)
> >>>>>        {
> >>>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
> >>>>>        }
> >>>>>
> >>>>>        searcher.close();
> >>>>>        reader.close();
> >>>>>
> >>>>>    }
> >>>>>    public static void main(String[] args) throws Exception {
> >>>>>        if (args.length != 1) {
> >>>>>            throw new Exception("Usage: " + test.class.getName() + "
> >>>>> <index dir>");
> >>>>>        }
> >>>>>        File indexDir = new File(args[0]);
> >>>>>        test(indexDir);
> >>>>>        query(indexDir);
> >>>>>    }
> >>>>> }
> >>>>>
> >>>>> My results:
> >>>>> Query: contents:quick contents:brown
> >>>>> Hits: 2
> >>>>> 1
> >>>>> 2
> >>>>> PQuery:
> >>>>> contents:"quick brown"
> >>>>> Phrase Hits: 1
> >>>>> 2
> >>>>>
> >>>>>




Re: Field boosting Was: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 27, 2004, at 6:26 PM, Stephane James Vaucher wrote:
> Slightly off topic to this thread, but how would adding different  
> fields
> with the same name deal with boosts? I've looked at the javadoc and  
> FAQ,
> but I think it's not a common use of this feature, any insight?

There is only one boost per field name.  However, interestingly, the
effect is the multiplication of them all.  So, in your example below,
the boost of the "fieldName" field is 2.
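The multiplication described here can be stated as a one-liner. This is a sketch of the observed behaviour only, not Lucene's scoring code:

```java
// Sketch: when the same field name is added several times, the effective
// boost of that field name is the PRODUCT of the per-instance boosts,
// matching the thread's example (1 * 2 == 2).
public class FieldBoost {
    public static float effectiveBoost(float[] boosts) {
        float product = 1.0f;
        for (float b : boosts) product *= b;
        return product;
    }

    public static void main(String[] args) {
        // f1.setBoost(1); f2.setBoost(2); -> combined boost of "fieldName" is 2
        System.out.println(effectiveBoost(new float[]{1.0f, 2.0f})); // 2.0
    }
}
```

A consequence worth noting: a boost of 1 on one instance leaves the product unchanged, while any instance boosted below 1 reduces the whole field's boost.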

	Erik

>
> E.G.
> Document doc = new Document();
> Field f1 = Field.Keyword("fieldName", "foo");
> f1.setBoost(1);
> doc.add(f1);
>
> Field f2 = Field.Keyword("fieldName", "bar");
> f2.setBoost(2);
> doc.add(f2);
>
> Cheers,
> sv
>
> On Fri, 27 Feb 2004, Doug Cutting wrote:
>
>> I think it's document.add().  Fields are pushed onto the front, rather
>> than added to the end.
>>
>> Doug
>>
>> Roy Klein wrote:
>>> I think it's got something to do with Document.invertDocument().
>>>
>>> When I reverse the words in the phrase, the other document matches  
>>> the
>>> phrase query.
>>>
>>>     Roy
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>> Sent: Friday, February 27, 2004 4:34 PM
>>> To: Lucene Users List
>>> Subject: Re: Indexing multiple instances of the same field for each
>>> document
>>>
>>>
>>> On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
>>>
>>>> Hi Erik,
>>>>
>>>> While you might be right in this example (using Field.Keyword), I  
>>>> can
>>>> see how this would still be a problem in other cases. For instance,  
>>>> if
>>>
>>>
>>>> I were adding more than one word at a time in the example I  
>>>> attached.
>>>
>>>
>>> I concur that it appears to be a bug.  It is unlikely folks use  
>>> Lucene
>>> like this too much though - there probably are not too many scenarios
>>> where combining things into a single String or Reader is a burden.
>>>
>>> I'm interested to know where in the code this oddity occurs so I can
>>> understand it more.  I did a brief bit of troubleshooting but haven't
>>> figured it out yet.  Something in DocumentWriter I presume.
>>>
>>> 	Erik
>>>
>>>
>>>
>>>
>>>>    Roy
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>>> Sent: Friday, February 27, 2004 2:12 PM
>>>> To: Lucene Users List
>>>> Subject: Re: Indexing multiple instances of the same field for each
>>>> document
>>>>
>>>>
>>>> Roy,
>>>>
>>>> On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>>>>
>>>>>        Document doc = new Document();
>>>>>        doc.add(Field.Text("contents", "the"));
>>>>
>>>> Changing these to Field.Keyword gets it to work.  I'm delving a  
>>>> little
>>>
>>>
>>>> bit to understand why, but it seems if you are adding words
>>>> individually anyway you'd want them to be untokenized, right?
>>>>
>>>> 	Erik
>>>>
>>>>
>>>>
>>>>>        doc.add(Field.Text("contents", "quick"));
>>>>>        doc.add(Field.Text("contents", "brown"));
>>>>>        doc.add(Field.Text("contents", "fox"));
>>>>>        doc.add(Field.Text("contents", "jumped"));
>>>>>        doc.add(Field.Text("contents", "over"));
>>>>>        doc.add(Field.Text("contents", "the"));
>>>>>        doc.add(Field.Text("contents", "lazy"));
>>>>>        doc.add(Field.Text("contents", "dogs"));
>>>>>        doc.add(Field.Keyword("docnumber", "1"));
>>>>>        writer.addDocument(doc);
>>>>>        doc = new Document();
>>>>>        doc.add(Field.Text("contents", "the quick brown fox jumped
>>>>> over the lazy dogs"));
>>>>>        doc.add(Field.Keyword("docnumber", "2"));
>>>>>        writer.addDocument(doc);
>>>>>        writer.close();
>>>>>    }
>>>>>
>>>>>    public static void query(File indexDir) throws IOException
>>>>>    {
>>>>>        Query query = null;
>>>>>        PhraseQuery pquery = new PhraseQuery();
>>>>>        Hits hits = null;
>>>>>
>>>>>        try {
>>>>>            query = QueryParser.parse("quick brown", "contents", new
>>>>> StandardAnalyzer());
>>>>>        } catch (Exception qe) {System.out.println(qe.toString());}
>>>>>        if (query == null) return;
>>>>>        System.out.println("Query: " + query.toString());
>>>>>        IndexReader reader = IndexReader.open(indexDir);
>>>>>        IndexSearcher searcher = new IndexSearcher(reader);
>>>>>
>>>>>        hits = searcher.search(query);
>>>>>        System.out.println("Hits: " + hits.length());
>>>>>
>>>>>        for (int i = 0; i < hits.length(); i++)
>>>>>        {
>>>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
>>>>>        }
>>>>>
>>>>>
>>>>>        pquery.add(new Term("contents", "quick"));
>>>>>        pquery.add(new Term("contents", "brown"));
>>>>>        System.out.println("PQuery: " + pquery.toString());
>>>>>        hits = searcher.search(pquery);
>>>>>        System.out.println("Phrase Hits: " + hits.length());
>>>>>        for (int i = 0; i < hits.length(); i++)
>>>>>        {
>>>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
>>>>>        }
>>>>>
>>>>>        searcher.close();
>>>>>        reader.close();
>>>>>
>>>>>    }
>>>>>    public static void main(String[] args) throws Exception {
>>>>>        if (args.length != 1) {
>>>>>            throw new Exception("Usage: " + test.class.getName() + "
>>>>> <index dir>");
>>>>>        }
>>>>>        File indexDir = new File(args[0]);
>>>>>        test(indexDir);
>>>>>        query(indexDir);
>>>>>    }
>>>>> }
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> My results:
>>>>> Query: contents:quick contents:brown
>>>>> Hits: 2
>>>>> 1
>>>>> 2
>>>>> PQuery:
>>>>> contents:"quick brown"
>>>>> Phrase Hits: 1
>>>>> 2
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------- 
>>>>> --
>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>> For additional commands, e-mail:  
>>>>> lucene-user-help@jakarta.apache.org


Field boosting Was: Indexing multiple instances of the same field for each document

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Slightly off topic for this thread, but how does adding multiple fields 
with the same name interact with boosts? I've looked at the javadoc and FAQ, 
but this doesn't seem to be a common use of the feature. Any insight?

E.G.
Document doc = new Document();
Field f1 = Field.Keyword("fieldName", "foo");
f1.setBoost(1);
doc.add(f1);

Field f2 = Field.Keyword("fieldName", "bar");
f2.setBoost(2);
doc.add(f2);

Cheers,
sv
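My understanding (an assumption on my part, not confirmed by the javadoc or anywhere in this thread): the boosts of all instances of a same-named field are folded into that field's single norm by multiplication, so the f1/f2 example above should behave like one "fieldName" field with boost 1 * 2 = 2. Modeled in plain Java, no Lucene required:

```java
public class BoostCombineDemo {
    // Assumed model: per-instance boosts on a multi-valued field are
    // multiplied into one combined boost for that field's norm.
    static float combinedBoost(float... instanceBoosts) {
        float combined = 1.0f;
        for (float b : instanceBoosts) {
            combined *= b; // multiplicative, so the add order does not matter
        }
        return combined;
    }

    public static void main(String[] args) {
        // f1.setBoost(1) and f2.setBoost(2) from the example above:
        System.out.println(combinedBoost(1.0f, 2.0f)); // 2.0
    }
}
```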

On Fri, 27 Feb 2004, Doug Cutting wrote:

> I think it's document.add().  Fields are pushed onto the front, rather 
> than added to the end.
> 
> Doug
> 
> Roy Klein wrote:
> > I think it's got something to do with Document.invertDocument().
> > 
> > When I reverse the words in the phrase, the other document matches the
> > phrase query.
> > 
> >     Roy
> > 
> >    
> > 
> > -----Original Message-----
> > From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
> > Sent: Friday, February 27, 2004 4:34 PM
> > To: Lucene Users List
> > Subject: Re: Indexing multiple instances of the same field for each
> > document
> > 
> > 
> > On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
> > 
> >>Hi Erik,
> >>
> >>While you might be right in this example (using Field.Keyword), I can 
> >>see how this would still be a problem in other cases. For instance, if
> > 
> > 
> >>I were adding more than one word at a time in the example I attached.
> > 
> > 
> > I concur that it appears to be a bug.  It is unlikely folks use Lucene 
> > like this too much though - there probably are not too many scenarios 
> > where combining things into a single String or Reader is a burden.
> > 
> > I'm interested to know where in the code this oddity occurs so I can 
> > understand it more.  I did a brief bit of troubleshooting but haven't 
> > figured it out yet.  Something in DocumentWriter I presume.
> > 
> > 	Erik
> > 
> > 
> > 
> > 
> >>    Roy
> >>
> >>
> >>-----Original Message-----
> >>From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> >>Sent: Friday, February 27, 2004 2:12 PM
> >>To: Lucene Users List
> >>Subject: Re: Indexing multiple instances of the same field for each 
> >>document
> >>
> >>
> >>Roy,
> >>
> >>On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
> >>
> >>>        Document doc = new Document();
> >>>        doc.add(Field.Text("contents", "the"));
> >>
> >>Changing these to Field.Keyword gets it to work.  I'm delving a little
> > 
> > 
> >>bit to understand why, but it seems if you are adding words 
> >>individually anyway you'd want them to be untokenized, right?
> >>
> >>	Erik
> >>
> >>
> >>
> >>>        doc.add(Field.Text("contents", "quick"));
> >>>        doc.add(Field.Text("contents", "brown"));
> >>>        doc.add(Field.Text("contents", "fox"));
> >>>        doc.add(Field.Text("contents", "jumped"));
> >>>        doc.add(Field.Text("contents", "over"));
> >>>        doc.add(Field.Text("contents", "the"));
> >>>        doc.add(Field.Text("contents", "lazy"));
> >>>        doc.add(Field.Text("contents", "dogs"));
> >>>        doc.add(Field.Keyword("docnumber", "1"));
> >>>        writer.addDocument(doc);
> >>>        doc = new Document();
> >>>        doc.add(Field.Text("contents", "the quick brown fox jumped 
> >>>over the lazy dogs"));
> >>>        doc.add(Field.Keyword("docnumber", "2"));
> >>>        writer.addDocument(doc);
> >>>        writer.close();
> >>>    }
> >>>
> >>>    public static void query(File indexDir) throws IOException
> >>>    {
> >>>        Query query = null;
> >>>        PhraseQuery pquery = new PhraseQuery();
> >>>        Hits hits = null;
> >>>
> >>>        try {
> >>>            query = QueryParser.parse("quick brown", "contents", new 
> >>>StandardAnalyzer());
> >>>        } catch (Exception qe) {System.out.println(qe.toString());}
> >>>        if (query == null) return;
> >>>        System.out.println("Query: " + query.toString());
> >>>        IndexReader reader = IndexReader.open(indexDir);
> >>>        IndexSearcher searcher = new IndexSearcher(reader);
> >>>
> >>>        hits = searcher.search(query);
> >>>        System.out.println("Hits: " + hits.length());
> >>>
> >>>        for (int i = 0; i < hits.length(); i++)
> >>>        {
> >>>            System.out.println( hits.doc(i).get("docnumber") + " ");
> >>>        }
> >>>
> >>>
> >>>        pquery.add(new Term("contents", "quick"));
> >>>        pquery.add(new Term("contents", "brown"));
> >>>        System.out.println("PQuery: " + pquery.toString());
> >>>        hits = searcher.search(pquery);
> >>>        System.out.println("Phrase Hits: " + hits.length());
> >>>        for (int i = 0; i < hits.length(); i++)
> >>>        {
> >>>            System.out.println( hits.doc(i).get("docnumber") + " ");
> >>>        }
> >>>
> >>>        searcher.close();
> >>>        reader.close();
> >>>
> >>>    }
> >>>    public static void main(String[] args) throws Exception {
> >>>        if (args.length != 1) {
> >>>            throw new Exception("Usage: " + test.class.getName() + " 
> >>><index dir>");
> >>>        }
> >>>        File indexDir = new File(args[0]);
> >>>        test(indexDir);
> >>>        query(indexDir);
> >>>    }
> >>>}
> >>>
> >>>---------------------------------------------------------------------
> >>>My results:
> >>>Query: contents:quick contents:brown
> >>>Hits: 2
> >>>1
> >>>2
> >>>PQuery:
> >>>contents:"quick brown"
> >>>Phrase Hits: 1
> >>>2
> >>>
> >>>


Re: Indexing multiple instances of the same field for each document

Posted by Doug Cutting <cu...@apache.org>.
I think it's document.add().  Fields are pushed onto the front, rather 
than added to the end.

Doug
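If that diagnosis is right, the effect can be modeled without Lucene at all. The sketch below (plain Java; the push-to-front behavior is taken as an assumption from the diagnosis above, not verified against DocumentWriter) shows why the phrase only matches when the words were added in reverse:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FieldOrderDemo {
    // Model: Document.add() pushes each new field instance onto the FRONT
    // of the document's field list, so single-word field values end up
    // indexed in the reverse of the order they were added.
    static List<String> indexedOrder(String... addedInOrder) {
        Deque<String> fields = new ArrayDeque<>();
        for (String word : addedInOrder) {
            fields.addFirst(word); // push to front
        }
        return new ArrayList<>(fields);
    }

    public static void main(String[] args) {
        // Added as: the, quick, brown, fox
        System.out.println(indexedOrder("the", "quick", "brown", "fox"));
        // Prints [fox, brown, quick, the]. A phrase query for "quick brown"
        // needs "brown" one position AFTER "quick"; in the reversed order it
        // sits one position BEFORE, so the phrase never matches; that is
        // exactly why reversing the words made the other document match.
    }
}
```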

Roy Klein wrote:
> I think it's got something to do with Document.invertDocument().
> 
> When I reverse the words in the phrase, the other document matches the
> phrase query.
> 
>     Roy
> 
>    
> 
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
> Sent: Friday, February 27, 2004 4:34 PM
> To: Lucene Users List
> Subject: Re: Indexing multiple instances of the same field for each
> document
> 
> 
> On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
> 
>>Hi Erik,
>>
>>While you might be right in this example (using Field.Keyword), I can 
>>see how this would still be a problem in other cases. For instance, if
> 
> 
>>I were adding more than one word at a time in the example I attached.
> 
> 
> I concur that it appears to be a bug.  It is unlikely folks use Lucene 
> like this too much though - there probably are not too many scenarios 
> where combining things into a single String or Reader is a burden.
> 
> I'm interested to know where in the code this oddity occurs so I can 
> understand it more.  I did a brief bit of troubleshooting but haven't 
> figured it out yet.  Something in DocumentWriter I presume.
> 
> 	Erik
> 
> 
> 
> 
>>    Roy
>>
>>
>>-----Original Message-----
>>From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>Sent: Friday, February 27, 2004 2:12 PM
>>To: Lucene Users List
>>Subject: Re: Indexing multiple instances of the same field for each 
>>document
>>
>>
>>Roy,
>>
>>On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>>
>>>        Document doc = new Document();
>>>        doc.add(Field.Text("contents", "the"));
>>
>>Changing these to Field.Keyword gets it to work.  I'm delving a little
> 
> 
>>bit to understand why, but it seems if you are adding words 
>>individually anyway you'd want them to be untokenized, right?
>>
>>	Erik
>>
>>
>>
>>>        doc.add(Field.Text("contents", "quick"));
>>>        doc.add(Field.Text("contents", "brown"));
>>>        doc.add(Field.Text("contents", "fox"));
>>>        doc.add(Field.Text("contents", "jumped"));
>>>        doc.add(Field.Text("contents", "over"));
>>>        doc.add(Field.Text("contents", "the"));
>>>        doc.add(Field.Text("contents", "lazy"));
>>>        doc.add(Field.Text("contents", "dogs"));
>>>        doc.add(Field.Keyword("docnumber", "1"));
>>>        writer.addDocument(doc);
>>>        doc = new Document();
>>>        doc.add(Field.Text("contents", "the quick brown fox jumped 
>>>over the lazy dogs"));
>>>        doc.add(Field.Keyword("docnumber", "2"));
>>>        writer.addDocument(doc);
>>>        writer.close();
>>>    }
>>>
>>>    public static void query(File indexDir) throws IOException
>>>    {
>>>        Query query = null;
>>>        PhraseQuery pquery = new PhraseQuery();
>>>        Hits hits = null;
>>>
>>>        try {
>>>            query = QueryParser.parse("quick brown", "contents", new 
>>>StandardAnalyzer());
>>>        } catch (Exception qe) {System.out.println(qe.toString());}
>>>        if (query == null) return;
>>>        System.out.println("Query: " + query.toString());
>>>        IndexReader reader = IndexReader.open(indexDir);
>>>        IndexSearcher searcher = new IndexSearcher(reader);
>>>
>>>        hits = searcher.search(query);
>>>        System.out.println("Hits: " + hits.length());
>>>
>>>        for (int i = 0; i < hits.length(); i++)
>>>        {
>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
>>>        }
>>>
>>>
>>>        pquery.add(new Term("contents", "quick"));
>>>        pquery.add(new Term("contents", "brown"));
>>>        System.out.println("PQuery: " + pquery.toString());
>>>        hits = searcher.search(pquery);
>>>        System.out.println("Phrase Hits: " + hits.length());
>>>        for (int i = 0; i < hits.length(); i++)
>>>        {
>>>            System.out.println( hits.doc(i).get("docnumber") + " ");
>>>        }
>>>
>>>        searcher.close();
>>>        reader.close();
>>>
>>>    }
>>>    public static void main(String[] args) throws Exception {
>>>        if (args.length != 1) {
>>>            throw new Exception("Usage: " + test.class.getName() + " 
>>><index dir>");
>>>        }
>>>        File indexDir = new File(args[0]);
>>>        test(indexDir);
>>>        query(indexDir);
>>>    }
>>>}
>>>
>>>---------------------------------------------------------------------
>>>-
>>>-
>>>-
>>>-------
>>>My results:
>>>Query: contents:quick contents:brown
>>>Hits: 2
>>>1
>>>2
>>>PQuery:
>>>contents:"quick brown"
>>>Phrase Hits: 1
>>>2
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
I think it's got something to do with Document.invertDocument().

When I reverse the words in the phrase, the other document matches the
phrase query.

    Roy

   

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Friday, February 27, 2004 4:34 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
> Hi Erik,
>
> While you might be right in this example (using Field.Keyword), I can 
> see how this would still be a problem in other cases. For instance, if

> I were adding more than one word at a time in the example I attached.

I concur that it appears to be a bug.  It is unlikely folks use Lucene 
like this too much though - there probably are not too many scenarios 
where combining things into a single String or Reader is a burden.

I'm interested to know where in the code this oddity occurs so I can 
understand it more.  I did a brief bit of troubleshooting but haven't 
figured it out yet.  Something in DocumentWriter I presume.

	Erik



>
>     Roy
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Friday, February 27, 2004 2:12 PM
> To: Lucene Users List
> Subject: Re: Indexing multiple instances of the same field for each 
> document
>
>
> Roy,
>
> On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>>         Document doc = new Document();
>>         doc.add(Field.Text("contents", "the"));
>
> Changing these to Field.Keyword gets it to work.  I'm delving a little

> bit to understand why, but it seems if you are adding words 
> individually anyway you'd want them to be untokenized, right?
>
> 	Erik
>
>
>>         doc.add(Field.Text("contents", "quick"));
>>         doc.add(Field.Text("contents", "brown"));
>>         doc.add(Field.Text("contents", "fox"));
>>         doc.add(Field.Text("contents", "jumped"));
>>         doc.add(Field.Text("contents", "over"));
>>         doc.add(Field.Text("contents", "the"));
>>         doc.add(Field.Text("contents", "lazy"));
>>         doc.add(Field.Text("contents", "dogs"));
>>         doc.add(Field.Keyword("docnumber", "1"));
>>         writer.addDocument(doc);
>>         doc = new Document();
>>         doc.add(Field.Text("contents", "the quick brown fox jumped 
>> over the lazy dogs"));
>>         doc.add(Field.Keyword("docnumber", "2"));
>>         writer.addDocument(doc);
>>         writer.close();
>>     }
>>
>>     public static void query(File indexDir) throws IOException
>>     {
>>         Query query = null;
>>         PhraseQuery pquery = new PhraseQuery();
>>         Hits hits = null;
>>
>>         try {
>>             query = QueryParser.parse("quick brown", "contents", new 
>> StandardAnalyzer());
>>         } catch (Exception qe) {System.out.println(qe.toString());}
>>         if (query == null) return;
>>         System.out.println("Query: " + query.toString());
>>         IndexReader reader = IndexReader.open(indexDir);
>>         IndexSearcher searcher = new IndexSearcher(reader);
>>
>>         hits = searcher.search(query);
>>         System.out.println("Hits: " + hits.length());
>>
>>         for (int i = 0; i < hits.length(); i++)
>>         {
>>             System.out.println( hits.doc(i).get("docnumber") + " ");
>>         }
>>
>>
>>         pquery.add(new Term("contents", "quick"));
>>         pquery.add(new Term("contents", "brown"));
>>         System.out.println("PQuery: " + pquery.toString());
>>         hits = searcher.search(pquery);
>>         System.out.println("Phrase Hits: " + hits.length());
>>         for (int i = 0; i < hits.length(); i++)
>>         {
>>             System.out.println( hits.doc(i).get("docnumber") + " ");
>>         }
>>
>>         searcher.close();
>>         reader.close();
>>
>>     }
>>     public static void main(String[] args) throws Exception {
>>         if (args.length != 1) {
>>             throw new Exception("Usage: " + test.class.getName() + " 
>> <index dir>");
>>         }
>>         File indexDir = new File(args[0]);
>>         test(indexDir);
>>         query(indexDir);
>>     }
>> }
>>
>> ---------------------------------------------------------------------
>> My results:
>> Query: contents:quick contents:brown
>> Hits: 2
>> 1
>> 2
>> PQuery:
>> contents:"quick brown"
>> Phrase Hits: 1
>> 2
>>
>>


Re: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 27, 2004, at 4:10 PM, Roy Klein wrote:
> Hi Erik,
>
> While you might be right in this example (using Field.Keyword), I can
> see how this would still be a problem in other cases. For instance, if 
> I
> were adding more than one word at a time in the example I attached.

I concur that it appears to be a bug.  It is unlikely folks use Lucene 
like this too much though - there probably are not too many scenarios 
where combining things into a single String or Reader is a burden.

I'm interested to know where in the code this oddity occurs so I can 
understand it more.  I did a brief bit of troubleshooting but haven't 
figured it out yet.  Something in DocumentWriter I presume.

	Erik



>
>     Roy
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Friday, February 27, 2004 2:12 PM
> To: Lucene Users List
> Subject: Re: Indexing multiple instances of the same field for each
> document
>
>
> Roy,
>
> On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>>         Document doc = new Document();
>>         doc.add(Field.Text("contents", "the"));
>
> Changing these to Field.Keyword gets it to work.  I'm delving a little
> bit to understand why, but it seems if you are adding words
> individually anyway you'd want them to be untokenized, right?
>
> 	Erik
>
>
>>         doc.add(Field.Text("contents", "quick"));
>>         doc.add(Field.Text("contents", "brown"));
>>         doc.add(Field.Text("contents", "fox"));
>>         doc.add(Field.Text("contents", "jumped"));
>>         doc.add(Field.Text("contents", "over"));
>>         doc.add(Field.Text("contents", "the"));
>>         doc.add(Field.Text("contents", "lazy"));
>>         doc.add(Field.Text("contents", "dogs"));
>>         doc.add(Field.Keyword("docnumber", "1"));
>>         writer.addDocument(doc);
>>         doc = new Document();
>>         doc.add(Field.Text("contents", "the quick brown fox jumped
>> over the lazy dogs"));
>>         doc.add(Field.Keyword("docnumber", "2"));
>>         writer.addDocument(doc);
>>         writer.close();
>>     }
>>
>>     public static void query(File indexDir) throws IOException
>>     {
>>         Query query = null;
>>         PhraseQuery pquery = new PhraseQuery();
>>         Hits hits = null;
>>
>>         try {
>>             query = QueryParser.parse("quick brown", "contents", new
>> StandardAnalyzer());
>>         } catch (Exception qe) {System.out.println(qe.toString());}
>>         if (query == null) return;
>>         System.out.println("Query: " + query.toString());
>>         IndexReader reader = IndexReader.open(indexDir);
>>         IndexSearcher searcher = new IndexSearcher(reader);
>>
>>         hits = searcher.search(query);
>>         System.out.println("Hits: " + hits.length());
>>
>>         for (int i = 0; i < hits.length(); i++)
>>         {
>>             System.out.println( hits.doc(i).get("docnumber") + " ");
>>         }
>>
>>
>>         pquery.add(new Term("contents", "quick"));
>>         pquery.add(new Term("contents", "brown"));
>>         System.out.println("PQuery: " + pquery.toString());
>>         hits = searcher.search(pquery);
>>         System.out.println("Phrase Hits: " + hits.length());
>>         for (int i = 0; i < hits.length(); i++)
>>         {
>>             System.out.println( hits.doc(i).get("docnumber") + " ");
>>         }
>>
>>         searcher.close();
>>         reader.close();
>>
>>     }
>>     public static void main(String[] args) throws Exception {
>>         if (args.length != 1) {
>>             throw new Exception("Usage: " + test.class.getName() + "
>> <index dir>");
>>         }
>>         File indexDir = new File(args[0]);
>>         test(indexDir);
>>         query(indexDir);
>>     }
>> }
>>
>> ---------------------------------------------------------------------
>> My results:
>> Query: contents:quick contents:brown
>> Hits: 2
>> 1
>> 2
>> PQuery:
>> contents:"quick brown"
>> Phrase Hits: 1
>> 2
>>
>>


RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
Hi Erik,

While you might be right in this example (using Field.Keyword), I can
see how this would still be a problem in other cases. For instance, if I
were adding more than one word at a time in the example I attached.

    Roy


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Friday, February 27, 2004 2:12 PM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


Roy,

On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>         Document doc = new Document();
>         doc.add(Field.Text("contents", "the"));

Changing these to Field.Keyword gets it to work.  I'm delving a little  
bit to understand why, but it seems if you are adding words  
individually anyway you'd want them to be untokenized, right?

	Erik


>         doc.add(Field.Text("contents", "quick"));
>         doc.add(Field.Text("contents", "brown"));
>         doc.add(Field.Text("contents", "fox"));
>         doc.add(Field.Text("contents", "jumped"));
>         doc.add(Field.Text("contents", "over"));
>         doc.add(Field.Text("contents", "the"));
>         doc.add(Field.Text("contents", "lazy"));
>         doc.add(Field.Text("contents", "dogs"));
>         doc.add(Field.Keyword("docnumber", "1"));
>         writer.addDocument(doc);
>         doc = new Document();
>         doc.add(Field.Text("contents", "the quick brown fox jumped 
> over the lazy dogs"));
>         doc.add(Field.Keyword("docnumber", "2"));
>         writer.addDocument(doc);
>         writer.close();
>     }
>
>     public static void query(File indexDir) throws IOException
>     {
>         Query query = null;
>         PhraseQuery pquery = new PhraseQuery();
>         Hits hits = null;
>
>         try {
>             query = QueryParser.parse("quick brown", "contents", new 
> StandardAnalyzer());
>         } catch (Exception qe) {System.out.println(qe.toString());}
>         if (query == null) return;
>         System.out.println("Query: " + query.toString());
>         IndexReader reader = IndexReader.open(indexDir);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>         hits = searcher.search(query);
>         System.out.println("Hits: " + hits.length());
>
>         for (int i = 0; i < hits.length(); i++)
>         {
>             System.out.println( hits.doc(i).get("docnumber") + " ");
>         }
>
>
>         pquery.add(new Term("contents", "quick"));
>         pquery.add(new Term("contents", "brown"));
>         System.out.println("PQuery: " + pquery.toString());
>         hits = searcher.search(pquery);
>         System.out.println("Phrase Hits: " + hits.length());
>         for (int i = 0; i < hits.length(); i++)
>         {
>             System.out.println( hits.doc(i).get("docnumber") + " ");
>         }
>
>         searcher.close();
>         reader.close();
>
>     }
>     public static void main(String[] args) throws Exception {
>         if (args.length != 1) {
>             throw new Exception("Usage: " + test.class.getName() + " 
> <index dir>");
>         }
>         File indexDir = new File(args[0]);
>         test(indexDir);
>         query(indexDir);
>     }
> }
>
> ---------------------------------------------------------------------
> My results:
> Query: contents:quick contents:brown
> Hits: 2
> 1
> 2
> PQuery:
> contents:"quick brown"
> Phrase Hits: 1
> 2
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org




Re: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Roy,

On Feb 27, 2004, at 12:12 PM, Roy Klein wrote:
>         Document doc = new Document();
>         doc.add(Field.Text("contents", "the"));

Changing these to Field.Keyword gets it to work.  I'm delving a little  
bit to understand why, but it seems if you are adding words  
individually anyway you'd want them to be untokenized, right?

	Erik


>         doc.add(Field.Text("contents", "quick"));
>         doc.add(Field.Text("contents", "brown"));
>         doc.add(Field.Text("contents", "fox"));
>         doc.add(Field.Text("contents", "jumped"));
>         doc.add(Field.Text("contents", "over"));
>         doc.add(Field.Text("contents", "the"));
>         doc.add(Field.Text("contents", "lazy"));
>         doc.add(Field.Text("contents", "dogs"));
>         doc.add(Field.Keyword("docnumber", "1"));
>         writer.addDocument(doc);
>         doc = new Document();
>         doc.add(Field.Text("contents", "the quick brown fox jumped over
> the lazy dogs"));
>         doc.add(Field.Keyword("docnumber", "2"));
>         writer.addDocument(doc);
>         writer.close();
>     }
>
>     public static void query(File indexDir) throws IOException
>     {
>         Query query = null;
>         PhraseQuery pquery = new PhraseQuery();
>         Hits hits = null;
>
>         try {
>             query = QueryParser.parse("quick brown", "contents", new
> StandardAnalyzer());
>         } catch (Exception qe) {System.out.println(qe.toString());}
>         if (query == null) return;
>         System.out.println("Query: " + query.toString());
>         IndexReader reader = IndexReader.open(indexDir);
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>         hits = searcher.search(query);
>         System.out.println("Hits: " + hits.length());
>
>         for (int i = 0; i < hits.length(); i++)
>         {
>             System.out.println( hits.doc(i).get("docnumber") + " ");
>         }
>
>
>         pquery.add(new Term("contents", "quick"));
>         pquery.add(new Term("contents", "brown"));
>         System.out.println("PQuery: " + pquery.toString());
>         hits = searcher.search(pquery);
>         System.out.println("Phrase Hits: " + hits.length());
>         for (int i = 0; i < hits.length(); i++)
>         {
>             System.out.println( hits.doc(i).get("docnumber") + " ");
>         }
>
>         searcher.close();
>         reader.close();
>
>     }
>     public static void main(String[] args) throws Exception {
>         if (args.length != 1) {
>             throw new Exception("Usage: " + test.class.getName() + "
> <index dir>");
>         }
>         File indexDir = new File(args[0]);
>         test(indexDir);
>         query(indexDir);
>     }
> }
>
> -----------------------------------------------------------------------------
> My results:
> Query: contents:quick contents:brown
> Hits: 2
> 1
> 2
> PQuery:
> contents:"quick brown"
> Phrase Hits: 1
> 2
>
>




RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
Doug,

The query results are different, I'm attaching my test code.

>> Also FYI, I found that phrase queries don't work against a field 
>> that's been added multiple times. If I query the phrase "brown fox", 
>> against the two example docs above, only the second matches.

>They should work the same.  I'm not sure what Field.indexed does. 
>That's not a normal Lucene method.

>Doug

Here's my test code  (my results follow it):
-----------------------------------------------------------------------------

package lucenetest;


import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.index.Term;



import java.io.File;
import java.io.IOException;
import java.io.FileReader;

public class test {
    public static void test(File indexDir) throws IOException {

        IndexWriter writer = new IndexWriter(indexDir, new
SimpleAnalyzer(), true);
        //indexDirectory(writer, dataDir);

        Document doc = new Document();
        doc.add(Field.Text("contents", "the"));
        doc.add(Field.Text("contents", "quick"));
        doc.add(Field.Text("contents", "brown"));
        doc.add(Field.Text("contents", "fox"));
        doc.add(Field.Text("contents", "jumped"));
        doc.add(Field.Text("contents", "over"));
        doc.add(Field.Text("contents", "the"));
        doc.add(Field.Text("contents", "lazy"));
        doc.add(Field.Text("contents", "dogs"));
        doc.add(Field.Keyword("docnumber", "1"));
        writer.addDocument(doc);
        doc = new Document();
        doc.add(Field.Text("contents", "the quick brown fox jumped over the lazy dogs"));
        doc.add(Field.Keyword("docnumber", "2"));
        writer.addDocument(doc);
        writer.close();
    }

    public static void query(File indexDir) throws IOException
    {
        Query query = null;
        PhraseQuery pquery = new PhraseQuery();
        Hits hits = null;

        try {
            query = QueryParser.parse("quick brown", "contents", new
StandardAnalyzer());
        } catch (Exception qe) {System.out.println(qe.toString());}
        if (query == null) return;
        System.out.println("Query: " + query.toString());
        IndexReader reader = IndexReader.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(reader);

        hits = searcher.search(query);
        System.out.println("Hits: " + hits.length());

        for (int i = 0; i < hits.length(); i++)
        {
            System.out.println( hits.doc(i).get("docnumber") + " ");
        }


        pquery.add(new Term("contents", "quick"));
        pquery.add(new Term("contents", "brown"));
        System.out.println("PQuery: " + pquery.toString());
        hits = searcher.search(pquery);
        System.out.println("Phrase Hits: " + hits.length());
        for (int i = 0; i < hits.length(); i++)
        {
            System.out.println( hits.doc(i).get("docnumber") + " ");
        }

        searcher.close();
        reader.close();

    }
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            throw new Exception("Usage: " + test.class.getName() + " <index dir>");
        }
        File indexDir = new File(args[0]);
        test(indexDir);
        query(indexDir);
    }
}

-----------------------------------------------------------------------------
My results:
Query: contents:quick contents:brown 
Hits: 2 
1  
2  
PQuery: 
contents:"quick brown" 
Phrase Hits: 1 
2 




Re: Indexing multiple instances of the same field for each document

Posted by Doug Cutting <cu...@apache.org>.
Roy Klein wrote:
> E.g.
>    doc1.add(Field.indexed("field","the"));
>    doc1.add(Field.indexed("field","quick"));
>    doc1.add(Field.indexed("field","brown"));
>    doc1.add(Field.indexed("field","fox"));
>    doc1.add(Field.indexed("field","jumped"));
>    writer.addDocument(doc1);
> Vs.
>    doc2.add(Field.indexed("field","the quick brown fox jumped"));
>    writer.addDocument(doc2);
> 
> Is there a difference in query performance when I query on fields that
> have been added multiple times vs fields which were added with the
> entire field contents at once?

No.

> Also FYI, I found that phrase queries don't work against a field that's
> been added multiple times. If I query the phrase "brown fox", against
> the two example docs above, only the second matches.

They should work the same.  I'm not sure what Field.indexed does. 
That's not a normal Lucene method.

Doug
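[Editorial note: one way to check whether the two indexing styles really produce identical postings is to dump term positions with IndexReader.termPositions. This is a sketch against the Lucene 1.x-era API; the field and term names follow Roy's test, and the index path argument is illustrative:]

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

// Dump the positions recorded for one term in every document that
// contains it.  If the multi-add document and the single-add document
// were indexed identically, the term should appear at the same
// position in both.
public class PositionDump {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]);
        TermPositions tp = reader.termPositions(new Term("contents", "brown"));
        while (tp.next()) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < tp.freq(); i++) {
                sb.append(tp.nextPosition()).append(' ');
            }
            System.out.println("doc " + tp.doc() + ": positions " + sb);
        }
        tp.close();
        reader.close();
    }
}
```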



RE: Indexing multiple instances of the same field for each document

Posted by Roy Klein <kl...@sitescape.com>.
One can add multiple terms to a single field name in two ways:

    1) Adding a single field with all the content 
    2) Adding that same field several times, each time with a different
term.

E.g.
   doc1.add(Field.indexed("field","the"));
   doc1.add(Field.indexed("field","quick"));
   doc1.add(Field.indexed("field","brown"));
   doc1.add(Field.indexed("field","fox"));
   doc1.add(Field.indexed("field","jumped"));
   writer.addDocument(doc1);
Vs.
   doc2.add(Field.indexed("field","the quick brown fox jumped"));
   writer.addDocument(doc2);


Is there a difference in query performance when I query on fields that
have been added multiple times vs fields which were added with the
entire field contents at once?

Also FYI, I found that phrase queries don't work against a field that's
been added multiple times. If I query the phrase "brown fox", against
the two example docs above, only the second matches.


   Roy

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Friday, February 27, 2004 5:29 AM
To: Lucene Users List
Subject: Re: Indexing multiple instances of the same field for each
document


On Feb 27, 2004, at 5:16 AM, Moray McConnachie wrote:
> I note from previous entries on the mailing list and my own
> experiments that
> you can add many entries to the same field for each document. Example:
> a
> given document belongs to more than one product, ergo I index the 
> product
> field with values "PROD_A" and "PROD_B".
>
> If I don't tokenise the fields when adding them to the document, then
> when
> storing the values and printing them out before adding them to the 
> index, so
> I can see what the index is recording, I do indeed get
>
> Keyword<product:PROD_A> Keyword <product:PROD_B>
>
> However, a query on product:PROD_A returns no results, neither does a
> query
> on product:PROD_B.

Are you using QueryParser?  Try using a TermQuery("product", "PROD_A") 
when indexing as a Keyword and see what you get.  If that finds it, 
then you are suffering from analysis paralysis.  QueryParser, Keyword 
fields, and analyzers are a very "interesting" combination.
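[Editorial note: a sketch of the comparison Erik suggests, against the Lucene 1.x-era API. The index path argument is illustrative; the field and value follow Moray's example. A TermQuery matches the keyword value exactly as indexed, while QueryParser first runs "PROD_A" through the analyzer, which may alter it so it no longer matches the stored term:]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Compare a hand-built TermQuery against what QueryParser produces
// for the same keyword field.
public class KeywordQueryCheck {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher(args[0]);

        // Exact match on the untokenized keyword term.
        Query termQuery = new TermQuery(new Term("product", "PROD_A"));
        Hits termHits = searcher.search(termQuery);
        System.out.println("TermQuery hits: " + termHits.length());

        // QueryParser analyzes the term before searching; printing the
        // parsed query shows what it actually searched for.
        Query parsed = QueryParser.parse("product:PROD_A", "contents",
                new StandardAnalyzer());
        System.out.println("Parsed as: " + parsed.toString("contents"));
        Hits parsedHits = searcher.search(parsed);
        System.out.println("QueryParser hits: " + parsedHits.length());

        searcher.close();
    }
}
```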

	Erik




Re: Indexing multiple instances of the same field for each document

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 27, 2004, at 5:16 AM, Moray McConnachie wrote:
> I note from previous entries on the mailing list and my own 
> experiments that
> you can add many entries to the same field for each document. Example: 
> a
> given document belongs to more than one product, ergo I index the 
> product
> field with values "PROD_A" and "PROD_B".
>
> If I don't tokenise the fields when adding them to the document, then 
> when
> storing the values and printing them out before adding them to the 
> index, so
> I can see what the index is recording, I do indeed get
>
> Keyword<product:PROD_A> Keyword <product:PROD_B>
>
> However, a query on product:PROD_A returns no results, neither does a 
> query
> on product:PROD_B.

Are you using QueryParser?  Try using a TermQuery("product", "PROD_A") 
when indexing as a Keyword and see what you get.  If that finds it, 
then you are suffering from analysis paralysis.  QueryParser, Keyword 
fields, and analyzers are a very "interesting" combination.

	Erik

