Posted to java-user@lucene.apache.org by "Leonid M." <le...@gmail.com> on 2009/12/28 19:08:45 UTC

Multi-value (complex) field indexing

*Problem description*

   - I have a complex multi-value field: each value consists of several
   rows.
   - Each row consists of several cells/items.


I want to be able to match those issues that have a *row* with cellA="AAA"
and cellB="BBB". Searching across the whole table (meaning any row with
cellA="AAA" and any row with cellB="BBB") is something I understand and could
hopefully implement easily by having a separate FieldIndexer for each column.

*Example:*

*Field value:*
Row 1: [AAA] [BBB] [XXX]
Row 2: [CCC] [BBB] [XXX]

I would like to run a [CCC][BBB] query to look for rows containing these
values, and in this case I will get an empty result set, since no such field is
present.

*Question*
Is there any way to index, and search within, a particular row only?

I understand that I could make each row a separate document, but that
doesn't suit me (it's a 3rd-party system and I have no access to new document
creation; all I can do is extend the system's indexing mechanism).

So, could I somehow index multiple rows within one document and construct
a Lucene search against some particular *row*?

--
Best regards,
Leonids Malovs

Re: Multi-value (complex) field indexing

Posted by "Leonid M." <le...@gmail.com>.

* Yes, I understand the first part about two rows and querying.

* The problem is that I'm not the one creating those Analyzers and storing
documents in the index. All I can say is "add this field to the document";
it's as simple as that.

Luckily the system is built using Pico and OSGi, so I will try to replace
the system Indexer.
Currently it seems to me that having a document per row is the simpler
solution, am I right? (There is no need for tuning analyzers and creating
Span/Phrase queries, which, as far as I understand, could be inefficient.)
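
To illustrate the document-per-row idea, here is a minimal sketch in plain
Java. It is a toy model, not Lucene or JIRA code: RowDoc, addIssue and search
are hypothetical names, and the assumption is that every row is stored as its
own record carrying the key of the issue it belongs to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "one document per row". Not Lucene or JIRA code:
// RowDoc, addIssue and search are hypothetical names used for illustration.
class RowPerDocumentDemo {

    /** One table row, stored as its own record together with the owning issue key. */
    static final class RowDoc {
        final String issueKey;
        final String cellA;
        final String cellB;

        RowDoc(String issueKey, String cellA, String cellB) {
            this.issueKey = issueKey;
            this.cellA = cellA;
            this.cellB = cellB;
        }
    }

    /** Adds one record per row; each row is {cellA, cellB, ...}. */
    static void addIssue(List<RowDoc> index, String issueKey, List<String[]> rows) {
        for (String[] row : rows) {
            index.add(new RowDoc(issueKey, row[0], row[1]));
        }
    }

    /** Returns the keys of issues that have at least one row with both cell values. */
    static Set<String> search(List<RowDoc> index, String cellA, String cellB) {
        Set<String> issueKeys = new TreeSet<String>();
        for (RowDoc doc : index) {
            if (doc.cellA.equals(cellA) && doc.cellB.equals(cellB)) {
                issueKeys.add(doc.issueKey);
            }
        }
        return issueKeys;
    }

    public static void main(String[] args) {
        List<RowDoc> index = new ArrayList<RowDoc>();
        addIssue(index, "ISSUE-1", Arrays.asList(
                new String[] {"AAA", "BBB", "XXX"},
                new String[] {"CCC", "BBB", "XXX"}));
        addIssue(index, "ISSUE-2", Arrays.asList(
                new String[] {"AAA", "DDD", "XXX"},
                new String[] {"CCC", "BBB", "XXX"}));

        System.out.println(search(index, "CCC", "BBB")); // both issues have such a row
        System.out.println(search(index, "AAA", "BBB")); // only ISSUE-1 has both values in one row
    }
}
```

The cost of this layout is that hits are rows rather than issues, so results
have to be collapsed back to the parent issue key, which the set in search
does here.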

Thanks a lot for your feedback and input. You have certainly made the big
picture clearer for me.

btw, Happy New Year (it's coming, ho-ho-ho)
--
Best regards,
Leonids Maslovs

On Wed, Dec 30, 2009 at 6:33 PM, Erick Erickson <er...@gmail.com> wrote:

> Say you index row 1 with "aaa" "bbb" "ccc", then row two with
>

Re: Multi-value (complex) field indexing

Posted by Erick Erickson <er...@gmail.com>.

You'll have one problem if you can't return a different increment gap:
you'll match across rows.

Say you index row 1 with "aaa" "bbb" "ccc", then row 2 with
"ddd" "eee" "fff". If you just add multiple rows to a single document,
that document would match the phrase "ccc ddd".

I don't understand why you are able to index things but "have no
access to analyzer configuration", unless you're talking about an
index that's already been built. If you are indexing documents, it
seems to me that you *have* to be able to derive from one of the
standard analyzers and override getPositionIncrementGap. That's all
there is to providing a different value here.
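
To make the cross-row match concrete, here is a minimal sketch in plain Java.
It is not Lucene code: PositionGapDemo, Posting, index and matchesPhrase are
made-up names that only imitate how token positions are assigned to a
multi-valued field, on the assumption that each additional value starts `gap`
extra positions after the previous value ended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of token positions in a multi-valued field.
// This is NOT Lucene code; it only imitates how positions are assigned.
class PositionGapDemo {

    /** One indexed token together with its position in the field. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Assigns consecutive positions inside a row and skips `gap` positions between rows. */
    static List<Posting> index(List<List<String>> rows, int gap) {
        List<Posting> postings = new ArrayList<Posting>();
        int position = -1;
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) {
                position += gap; // the extra distance inserted between two values
            }
            for (String term : rows.get(r)) {
                position++;
                postings.add(new Posting(term, position));
            }
        }
        return postings;
    }

    /** True if `first` is immediately followed by `second` (an exact two-term phrase). */
    static boolean matchesPhrase(List<Posting> postings, String first, String second) {
        for (Posting a : postings) {
            for (Posting b : postings) {
                if (a.term.equals(first) && b.term.equals(second)
                        && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("aaa", "bbb", "ccc"),
                Arrays.asList("ddd", "eee", "fff"));

        System.out.println(matchesPhrase(index(rows, 0), "ccc", "ddd"));   // true: match across rows
        System.out.println(matchesPhrase(index(rows, 100), "ccc", "ddd")); // false: the gap separates rows
        System.out.println(matchesPhrase(index(rows, 100), "aaa", "bbb")); // true: same row
    }
}
```

With a gap of 0 the last token of row 1 and the first token of row 2 sit at
adjacent positions, which is why "ccc ddd" matches; any gap larger than the
phrase length breaks that adjacency.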

FWIW
Erick

On Tue, Dec 29, 2009 at 3:55 AM, Leonid M. <le...@gmail.com> wrote:

> You got me, thanks a lot.
> This is exactly what I was trying to ask (meaning: find these values within
> row number 2).
>
> I'm afraid I won't be able to proceed because I have no access to the analyzer
> configuration (the system, JIRA 4.0, uses a hardcoded, pre-configured set
> of default analyzers).
>
> Thanks a lot, you provided a clear and brief answer to my question.
> --
> Best regards,
> Leonids Maslovs

Re: Multi-value (complex) field indexing

Posted by "Leonid M." <le...@gmail.com>.

You got me, thanks a lot.
This is exactly what I was trying to ask (meaning: find these values within
row number 2).

I'm afraid I won't be able to proceed because I have no access to the analyzer
configuration (the system, JIRA 4.0, uses a hardcoded, pre-configured set
of default analyzers).

Thanks a lot, you provided a clear and brief answer to my question.
--
Best regards,
Leonids Maslovs



On Mon, Dec 28, 2009 at 10:51 PM, Erick Erickson <er...@gmail.com> wrote:

> I'm not following entirely here, but multi-valued fields are supported.
> Something like (rough sketch against the Lucene 2.9-era Field API;
> row1Text and row2Text are placeholder strings):
> Document doc = new Document();
> doc.add(new Field("rows", row1Text, Field.Store.YES, Field.Index.ANALYZED));
> doc.add(new Field("rows", row2Text, Field.Store.YES, Field.Index.ANALYZED));
> indexWriter.addDocument(doc);
>
> If your analyzer implements getPositionIncrementGap, you can keep your
> rows separate (search the mail archives for getPositionIncrementGap for
> more explanation).
>
> Then, if you're searching with a proximity (or phrase slop) less than your
> increment gap, you'll only get matches within a row. You wouldn't get a
> search "against a particular row" if, by that phrase, you meant "only look
> in row 2". If you mean "only match the document if the phrase is in *some*
> row in the doc", it should work.
>
> A SpanNearQuery should work, as should regular phrase queries.
>
> If this is off base, could you provide some more examples?
>
> HTH
> Erick

Re: Multi-value (complex) field indexing

Posted by Erick Erickson <er...@gmail.com>.

I'm not following entirely here, but multi-valued fields are supported.
Something like (rough sketch against the Lucene 2.9-era Field API;
row1Text and row2Text are placeholder strings):
Document doc = new Document();
doc.add(new Field("rows", row1Text, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("rows", row2Text, Field.Store.YES, Field.Index.ANALYZED));
indexWriter.addDocument(doc);

If your analyzer implements getPositionIncrementGap, you can keep your
rows separate (search the mail archives for getPositionIncrementGap for
more explanation).

Then, if you're searching with a proximity (or phrase slop) less than your
increment gap, you'll only get matches within a row. You wouldn't get a
search "against a particular row" if, by that phrase, you meant "only look
in row 2". If you mean "only match the document if the phrase is in *some*
row in the doc", it should work.

A SpanNearQuery should work, as should regular phrase queries.
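
As a rough illustration of the "proximity less than the increment gap" rule,
here is a small plain-Java model. It does not call Lucene: RowProximityDemo,
positions and near are hypothetical names, and "distance" here is simply the
difference between two token positions, which only approximates how slop is
counted in real span or phrase queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of proximity matching over a multi-valued field.
// NOT Lucene code; names and the distance measure are illustrative assumptions.
class RowProximityDemo {

    /** Maps every term to its positions; rows are separated by `gap` extra positions. */
    static Map<String, List<Integer>> positions(List<List<String>> rows, int gap) {
        Map<String, List<Integer>> byTerm = new HashMap<String, List<Integer>>();
        int position = -1;
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) {
                position += gap;
            }
            for (String term : rows.get(r)) {
                position++;
                List<Integer> list = byTerm.get(term);
                if (list == null) {
                    list = new ArrayList<Integer>();
                    byTerm.put(term, list);
                }
                list.add(position);
            }
        }
        return byTerm;
    }

    /** True if some occurrence of `a` lies within `maxDistance` positions of some `b`. */
    static boolean near(Map<String, List<Integer>> byTerm, String a, String b, int maxDistance) {
        List<Integer> as = byTerm.get(a);
        List<Integer> bs = byTerm.get(b);
        if (as == null || bs == null) {
            return false;
        }
        for (int pa : as) {
            for (int pb : bs) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("AAA", "BBB", "XXX"),
                Arrays.asList("CCC", "BBB", "XXX"));

        Map<String, List<Integer>> gapped = positions(rows, 100);
        System.out.println(near(gapped, "CCC", "BBB", 10)); // true: both terms are in row 2
        System.out.println(near(gapped, "AAA", "CCC", 10)); // false: terms are in different rows

        Map<String, List<Integer>> ungapped = positions(rows, 0);
        System.out.println(near(ungapped, "AAA", "CCC", 10)); // true: rows bleed into each other
    }
}
```

Note that this check says nothing about which column a term came from; it
only keeps matches inside one row, and it cannot answer "only look in row 2".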

If this is off base, could you provide some more examples?

HTH
Erick

On Mon, Dec 28, 2009 at 1:08 PM, Leonid M. <le...@gmail.com> wrote:

> *Problem description*
>
>   - I have a complex multi-value field: each value consists of several
>   rows.
>   - Each row consists of several cells/items.
>
>
> I want to be able to match those issues that have a *row* with cellA="AAA"
> and cellB="BBB". Searching across the whole table (meaning any row with
> cellA="AAA" and any row with cellB="BBB") is something I understand and could
> hopefully implement easily by having a separate FieldIndexer for each column.
>
> *Example:*
>
> *Field value:*
> Row 1: [AAA] [BBB] [XXX]
> Row 2: [CCC] [BBB] [XXX]
>
> I would like to run a [CCC][BBB] query to look for rows containing these
> values, and in this case I will get an empty result set, since no such field is
> present.
>
> *Question*
> Is there any way to index, and search within, a particular row only?
>
> I understand that I could make each row a separate document, but that
> doesn't suit me (it's a 3rd-party system and I have no access to new document
> creation; all I can do is extend the system's indexing mechanism).
>
> So, could I somehow index multiple rows within one document and construct
> a Lucene search against some particular *row*?
>
> --
> Best regards,
> Leonids Malovs
>