Posted to java-user@lucene.apache.org by Riccardo Daviddi <rd...@gmail.com> on 2005/10/29 16:20:15 UTC

appendable field problem

Hello,

Here is the problem.

I have to index text contained in several section tags (XML format), like
this:

<section>
.... text ...
</section>

<section>
.... text ...
</section>

<section>
.... text ...
</section>

I want to put all the text contained in the various sections into a single
field named "section".
I read that fields are appendable (page 33 of Lucene in Action), so it should
be easy: just add each extracted text to the same index field "section".
The result is not what I expected from Lucene in Action, though: Lucene
does not internally append all the texts together and index them in a
single field named "section". Or rather, it does a mixture: when I inspect
the index with Luke, I see several separate fields called "section", each
with its corresponding text.
If I try to retrieve the section field after a simple query, it returns only
the text of the first "section" field, instead of all the texts, as it would
if they had been indexed in the same field "section".

On the other hand, if I remove the field "section", all of these section
fields are removed correctly, as if there were only a single section field.

What am I doing wrong?

--
Riccardo Daviddi
University of Siena - Information Engineering
rdaviddi@gmail.com

Re: appendable field problem

Posted by Riccardo Daviddi <rd...@gmail.com>.

Ok, thank you for the clarification.

Cheers,

Riccardo

--
Riccardo Daviddi
University of Siena - Information Engineering
rdaviddi@gmail.com

Re: appendable field problem

Posted by Chris Hostetter <ho...@fucit.org>.

: I read that fields are appendable (page 33 of Lucene in Action), so it should
: be easy: just add each extracted text to the same index field "section".
: The result is not what I expected from Lucene in Action, though: Lucene
: does not internally append all the texts together and index them in a
: single field named "section". Or rather, it does a mixture: when I inspect
: the index with Luke, I see several separate fields called "section", each
: with its corresponding text.

The wording may not be entirely clear ... the field values are not
"appended" in a Java String/StringBuffer sense.  For the purposes of
*searching*, the terms extracted from the field values are "appended" in
that the term position of the last term from the first value immediately
precedes the term position of the first term from the second value.

(at least, I think that's how it works)
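
A minimal sketch of that positional "appending", written as a toy model
rather than real Lucene code. The class PositionSketch and its method
positions() are hypothetical names invented for illustration, and the
sketch assumes plain whitespace tokenization with no position gap between
values. It only shows one position counter running across all values of a
field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Lucene code and uses no Lucene classes.
class PositionSketch {

    // Keeps one running position counter across all values of a field, so
    // the first term of each value follows the last term of the previous one.
    static Map<String, List<Integer>> positions(List<String> values) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        int position = 0;
        for (String value : values) {
            // Assumption: lowercase, whitespace-separated terms.
            for (String term : value.trim().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // skip the empty token produced by a blank value
                }
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(position);
                position++;
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        // Two values added under the same (hypothetical) field name.
        List<String> section = Arrays.asList("first section text", "second section text");
        System.out.println(positions(section));
    }
}
```

In this model the term "second" gets position 3, directly after "text" at
position 2, even though the two strings were never concatenated.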

As for the *stored* values of Fields, they are stored individually.
You can even have one Field named "foo" that is not stored and another
Field named "foo" that is stored -- both will be searchable, but only
the second will be returned.

: If I try to retrieve the section field after a simple query, it returns only
: the text of the first "section" field, instead of all the texts, as it would
: if they had been indexed in the same field "section".

I'm guessing you are using Document.get(String), or
Document.getField(String) ... try using Document.getValues(String) or
Document.getFields(String).
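
To illustrate the difference, here is a rough sketch using a toy stand-in
for a document. MiniDocument is a hypothetical class, not the Lucene
Document class; its get() and getValues() methods only imitate the
behavior described above (first value versus all values for a repeated
field name):

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a document with repeated field names.
// This is NOT the Lucene Document class.
class MiniDocument {
    // {name, value} pairs, kept in the order they were added.
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Imitates a "first match" lookup: returns only the first value
    // stored under the name, or null if there is none.
    String get(String name) {
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                return field[1];
            }
        }
        return null;
    }

    // Imitates an "all matches" lookup: returns every value stored
    // under the name, in insertion order.
    String[] getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MiniDocument doc = new MiniDocument();
        doc.add("section", "text of the first section");
        doc.add("section", "text of the second section");
        doc.add("section", "text of the third section");

        System.out.println(doc.get("section"));              // first value only
        System.out.println(doc.getValues("section").length); // all three values
    }
}
```

With the real API, the equivalent step would be to loop over the array
returned by Document.getValues("section") instead of reading the single
string from Document.get("section").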




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org