You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Gustavo Corral <gu...@gmail.com> on 2008/03/13 09:48:04 UTC

Weird results with appendable fields

Hi list,

I'm new in Lucene and I'm trying to index a set of XML documents
(document-centric) with the same structure. All this documents have a
header, a front, and a body (where there's a lot of text).

The problem is that in the header I have two fields author and title, but
one document can have more than one author, so I tried to index as
appendable field in this way:

ArrayList <String> authors = front.getAuthors();

for(Iterator <String> it = authors.iterator(); it.hasNext();){
    String out (String) it.next();
    if((aut != null) && !aut.equals("")){
         doc.add(new Field("author",aut,Field.StoreYES,Field.Index.TOKENIZED
));
    }
}

and I was searching in my index with Lukeand I obtained rare results. For
example: There's a document with 3 authors which appears as appendable
fields in the index this way: Freddy Pantoja Timaran, Ph.D. Gabriel Pantoja
Barrios Jorge Ivan Londoño.

The thing is that when I search in Luke for Freddy, Pantoja, Gabriel,
Barrios, Iván (all in a different query) i got this document as a Hit,
that's correct, but when I search for Timaran, Londoño I get no Hits, which
is not correct.

I'm using by now WhiteSpaceAnalyzer. Any idea???

Thanks
Gustavo

Re: Weird results with appendable fields

Posted by Gustavo Corral <gu...@gmail.com>.
Thanks Erick,

I have not the code with me this weekend, but as soon as I can I'll try with
others analizers, check the results in Luke, and I'll post them to the list.

Gustavo

On Sat, Mar 15, 2008 at 3:21 PM, Erick Erickson <er...@gmail.com>
wrote:

> Are you using  WhitespaceAnalyzer BOTH at index and search time? If not
> that is one problem, especially since, for instance, there is a comma
> after
> Timaran,.
>
> The two best things I've found are
> 1> make sure you have a copy of Luke and use it to examine your
> index and see if what you *think* your index contains is actually what
> it contains. You can also use Luke to see how queries parse using various
> analyzers.
>
> 2> use query.toString() to see what your query looks like when you submit
> it.
>
> If neither of these help, could you post the briefest possible example
> that
> illustrates the problem?
>
> Best
> Erick
>
> On Thu, Mar 13, 2008 at 4:48 AM, Gustavo Corral <gu...@gmail.com>
> wrote:
>
> > Hi list,
> >
> > I'm new in Lucene and I'm trying to index a set of XML documents
> > (document-centric) with the same structure. All this documents have a
> > header, a front, and a body (where there's a lot of text).
> >
> > The problem is that in the header I have two fields author and title,
> but
> > one document can have more than one author, so I tried to index as
> > appendable field in this way:
> >
> > ArrayList <String> authors = front.getAuthors();
> >
> > for(Iterator <String> it = authors.iterator(); it.hasNext();){ If
> >    String out (String) it.next();
> >    if((aut != null) && !aut.equals("")){
> >         doc.add(new Field("author",aut,Field.StoreYES,
> > Field.Index.TOKENIZED
> > ));
> >    }
> > }
> >
> > and I was searching in my index with Lukeand I obtained rare results.
> For
> > example: There's a document with 3 authors which appears as appendable
> > fields in the index this way: Freddy Pantoja Timaran, Ph.D. Gabriel
> > Pantoja
> > Barrios Jorge Ivan Londoño.
> >
> > The thing is that when I search in Luke for Freddy, Pantoja, Gabriel,
> > Barrios, Iván (all in a different query) i got this document as a Hit,
> > that's correct, but when I search for Timaran, Londoño I get no Hits,
> > which
> > is not correct.
> >
> > I'm using by now WhiteSpaceAnalyzer. Any idea???
> >
> > Thanks
> > Gustavo
> >
>

Re: Weird results with appendable fields

Posted by Erick Erickson <er...@gmail.com>.
Are you using  WhitespaceAnalyzer BOTH at index and search time? If not
that is one problem, especially since, for instance, there is a comma after
Timaran,.

The two best things I've found are
1> make sure you have a copy of Luke and use it to examine your
index and see if what you *think* your index contains is actually what
it contains. You can also use Luke to see how queries parse using various
analyzers.

2> use query.toString() to see what your query looks like when you submit
it.

If neither of these help, could you post the briefest possible example that
illustrates the problem?

Best
Erick

On Thu, Mar 13, 2008 at 4:48 AM, Gustavo Corral <gu...@gmail.com>
wrote:

> Hi list,
>
> I'm new in Lucene and I'm trying to index a set of XML documents
> (document-centric) with the same structure. All this documents have a
> header, a front, and a body (where there's a lot of text).
>
> The problem is that in the header I have two fields author and title, but
> one document can have more than one author, so I tried to index as
> appendable field in this way:
>
> ArrayList <String> authors = front.getAuthors();
>
> for(Iterator <String> it = authors.iterator(); it.hasNext();){ If
>    String out (String) it.next();
>    if((aut != null) && !aut.equals("")){
>         doc.add(new Field("author",aut,Field.StoreYES,
> Field.Index.TOKENIZED
> ));
>    }
> }
>
> and I was searching in my index with Lukeand I obtained rare results. For
> example: There's a document with 3 authors which appears as appendable
> fields in the index this way: Freddy Pantoja Timaran, Ph.D. Gabriel
> Pantoja
> Barrios Jorge Ivan Londoño.
>
> The thing is that when I search in Luke for Freddy, Pantoja, Gabriel,
> Barrios, Iván (all in a different query) i got this document as a Hit,
> that's correct, but when I search for Timaran, Londoño I get no Hits,
> which
> is not correct.
>
> I'm using by now WhiteSpaceAnalyzer. Any idea???
>
> Thanks
> Gustavo
>