Posted to java-user@lucene.apache.org by Reza Ghaffaripour <re...@gmail.com> on 2005/12/07 09:49:45 UTC

repeating fields

Hi all,
I'm new to Lucene. I have an XML file with repeating tags, something like:
<a>
<p>x</p>
<p>xx</p>
<p>xxx</p>
<p>xxxx</p>
</a>

I add the "p" field as follows:
myDocument.add(Field.Text("p", "x"));
myDocument.add(Field.Text("p", "xx"));

But when I search for "x" it returns only the first hit.
What should I do? I want to search for "x" and get all 4 hits.

--
Reza Ghaffaripour
www.rezaghp.com

Re: repeating fields

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 7, 2005, at 8:48 AM, Reza Ghaffaripour wrote:
> I think having different documents will not be a good idea.
> For me each XML file is an ebook, and "p" means paragraph.
> I have hundreds of paragraphs in every ebook, and I think I should
> keep each ebook in a single document. Am I right?

How you design your index requires consideration of all you're trying  
to do with it.  It's an art form, in fact.  So while we can offer  
some ideas, ultimately you have to find what fits.   The granularity  
of what you index as a Document is the granularity of what you get  
back from searches as Hits.

There are blended approaches - an index does not have to be  
homogeneous in Document design.  You could have documents that  
represent the entire e-book, and documents that represent each  
paragraph.  You can use a field on each document "type" to  
distinguish them and filter in a search.
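
To make the blended design concrete, here is a minimal sketch in plain Java. It assumes a toy in-memory list of field maps rather than Lucene's API; the class name, the field names "title" and "text", and the type values "book" and "paragraph" are made up for illustration. Only the idea of a "type" field used as a filter comes from the message above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a "document" is a map of field name to value.
// This is NOT Lucene's API; all names here are hypothetical.
class BlendedIndexDemo {

    static Map<String, String> doc(String type, String title, String text) {
        Map<String, String> d = new HashMap<>();
        d.put("type", type);
        d.put("title", title);
        d.put("text", text);
        return d;
    }

    // Adds one "book" document for the whole e-book plus one "paragraph"
    // document per paragraph, all in the same index.
    static List<Map<String, String>> index(String title, List<String> paragraphs) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("book", title, String.join(" ", paragraphs)));
        for (String p : paragraphs) {
            docs.add(doc("paragraph", title, p));
        }
        return docs;
    }

    // Returns documents of the given type whose text contains the term as a
    // whitespace-separated token.
    static List<Map<String, String>> search(List<Map<String, String>> docs,
                                            String type, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (!type.equals(d.get("type"))) {
                continue; // the "type" field acts as the filter
            }
            for (String token : d.get("text").split("\\s+")) {
                if (token.equals(term)) {
                    hits.add(d);
                    break; // count each document at most once
                }
            }
        }
        return hits;
    }
}
```

Searching with type "book" answers "which e-books mention this term", while searching with type "paragraph" answers "which paragraphs mention it", both against the same index.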

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: repeating fields

Posted by Malcolm <ma...@btinternet.com>.
That's what I have: loads of different <p> tags, <abs> (abstract) tags, etc.
in each XML document, so a Lucene document for each is okay.
Malcolm




Re: repeating fields

Posted by Reza Ghaffaripour <re...@gmail.com>.
I think having different documents will not be a good idea.
For me each XML file is an ebook, and "p" means paragraph.
I have hundreds of paragraphs in every ebook, and I think I should keep
each ebook in a single document. Am I right?

On 12/7/05, Malcolm <ma...@btinternet.com> wrote:
>
>
> Firstly, you should obtain Luke and check that everything is laid out
> correctly in your index.
> Secondly, maybe try a wildcard/prefix query or a TermQuery. For example
> (TermQuery):
>
> TermQuery heTerm = new TermQuery(new Term("p", "x"));
> TermQuery sheTerm = new TermQuery(new Term("p", "xx"));
> TermQuery theyTerm = new TermQuery(new Term("p", "xxx"));
>
> I'm sure the folks on here will be able to come up with a more efficient
> method. Try obtaining Lucene in Action or look at the examples at
> http://lucenebook.com/
> cheers,
> Malcolm Clark


--
Reza Ghaffaripour
www.rezaghp.com

Re: repeating fields

Posted by Malcolm <ma...@btinternet.com>.
Firstly, you should obtain Luke and check that everything is laid out correctly
in your index.
Secondly, maybe try a wildcard/prefix query or a TermQuery. For example
(TermQuery):

// Each TermQuery matches one exact term in the "p" field.
TermQuery heTerm = new TermQuery(new Term("p", "x"));
TermQuery sheTerm = new TermQuery(new Term("p", "xx"));
TermQuery theyTerm = new TermQuery(new Term("p", "xxx"));

I'm sure the folks on here will be able to come up with a more efficient
method. Try obtaining Lucene in Action or look at the examples at
http://lucenebook.com/
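
On the difference between a term query and a prefix query for the sample values "x", "xx", "xxx" and "xxxx": the sketch below is a plain-Java toy, not Lucene code, and the class and method names are hypothetical. It only illustrates that an exact-term match on "x" selects one term, whereas a prefix match on "x" selects all four. Whichever query type is used, hits are still counted per Document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy comparison of exact-term and prefix matching over a list of indexed
// terms. NOT Lucene's API; names are hypothetical.
class TermVsPrefixDemo {

    // Exact match: the indexed term must equal the query term.
    static List<String> termMatches(List<String> terms, String query) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.equals(query)) {
                out.add(t);
            }
        }
        return out;
    }

    // Prefix match: the indexed term must start with the given prefix.
    static List<String> prefixMatches(List<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.startsWith(prefix)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("x", "xx", "xxx", "xxxx");
        System.out.println(termMatches(terms, "x"));   // prints [x]
        System.out.println(prefixMatches(terms, "x")); // prints [x, xx, xxx, xxxx]
    }
}
```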
cheers,
Malcolm Clark 




Re: repeating fields

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 7, 2005, at 3:49 AM, Reza Ghaffaripour wrote:

> Hi all,
> I'm new to Lucene. I have an XML file with repeating tags, something like:
> <a>
> <p>x</p>
> <p>xx</p>
> <p>xxx</p>
> <p>xxxx</p>
> </a>
>
> I add the "p" field as follows:
> myDocument.add(Field.Text("p", "x"));
> myDocument.add(Field.Text("p", "xx"));
>
> But when I search for "x" it returns only the first hit.
> What should I do? I want to search for "x" and get all 4 hits.

Hits return Documents.  You indexed only a single document, not 4.   
If you would like each <p> element to be a separate hit then index  
each as a separate Document.
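
A minimal sketch of this point, assuming a toy in-memory model in plain Java rather than Lucene's API (the class name, method names, and sample paragraphs are invented for illustration). Here a "document" is just a list of values for one repeated field, and a search returns one hit per matching document, however many of its values match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for an index; NOT Lucene's API. Each document is the list of
// values added to a single repeated field.
class GranularityDemo {

    // Returns the ids of documents in which any field value contains the
    // term as a whitespace-separated token. Each document appears at most once.
    static List<Integer> search(List<List<String>> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (matches(docs.get(id), term)) {
                hits.add(id);
            }
        }
        return hits;
    }

    static boolean matches(List<String> values, String term) {
        for (String value : values) {
            for (String token : value.split("\\s+")) {
                if (token.equals(term)) {
                    return true;
                }
            }
        }
        return false;
    }

    // All paragraphs go into one document as repeated field values.
    static List<List<String>> indexAsOneDocument(List<String> paragraphs) {
        List<List<String>> docs = new ArrayList<>();
        docs.add(new ArrayList<>(paragraphs));
        return docs;
    }

    // Each paragraph becomes its own document.
    static List<List<String>> indexPerParagraph(List<String> paragraphs) {
        List<List<String>> docs = new ArrayList<>();
        for (String p : paragraphs) {
            docs.add(Arrays.asList(p));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
            "the whale surfaced", "a whale dived", "the sea was calm", "whale song at dusk");
        System.out.println(search(indexAsOneDocument(paragraphs), "whale").size()); // prints 1
        System.out.println(search(indexPerParagraph(paragraphs), "whale").size());  // prints 3
    }
}
```

The same four paragraphs give one hit when stored as one document and three hits when stored as four, because three of the paragraphs contain the search term.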

	Erik

