Posted to java-user@lucene.apache.org by Alice Wong <ai...@gmail.com> on 2013/10/03 01:12:27 UTC
Associated values for a field and its value
Hello,
We would like to index some documents. Each field of a document may have
multiple values. And for each (field,value) pair there are some associated
values. These associated values are just for retrieving, not searching.
For example, a document D could have a field named A. This field has two
values a1 and a2.
It is easy to index D, adding terms a1 and a2 to field A, so either query
"A=a1" or "A=a2" will return D.
Assume we have other values associated with (A,a1) and (A,a2) for D. We
would like to retrieve these associated values depending on whether "A=a1"
or "A=a2" is queried.
For example, if query "A=a1" returns D, we would like to return values 1
and 2. And if query "A=a2" returns D, we want to return values 3 and 10.
Is it possible to do this with Lucene? Initially we wanted to hack postings
to return associated values, but this seems quite complex.
Thanks!
Re: Associated values for a field and its value
Posted by Alice Wong <ai...@gmail.com>.
Okay, it makes complete sense. Thanks.
On Fri, Oct 4, 2013 at 5:15 AM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:
> On 10/3/13 6:04 PM, Alice Wong wrote:
>
>> Mike,
>>
>> That's an interesting idea. The only drawback is we have to re-parse the
>> doc and find where it matches and what the associated values are. It could
>> be a performance issue if the doc becomes bigger and more complex.
>
> It's true there is some overhead for document-oriented processing. Lux
> ameliorates this by storing a predigested binary xml form that can be
> traversed efficiently without the need for xml parsing. However,
>
>> I am wondering if there is a way to index a value a1 for a field A and
>> store a different value "1,2" associated with a1 in Lucene. Or there might
>> be a hack for this?
>
> If you want to use only low-level Lucene constructs, I think payloads
> and/or complicated field values are the way to go. You could, for example,
> index for document D, a field called "extra" with values like "a1:1,2",
> "a2:2,3". I think that's what Aditya suggested. You still have to parse
> these though, so why not use a prebuilt flexible parsing infrastructure?
Re: Associated values for a field and its value
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
On 10/3/13 6:04 PM, Alice Wong wrote:
> Mike,
>
> That's an interesting idea. The only drawback is we have to re-parse
> the doc and find where it matches and what the associated values are.
> It could be a performance issue if the doc becomes bigger and more
> complex.
It's true there is some overhead for document-oriented processing. Lux
ameliorates this by storing a predigested binary xml form that can be
traversed efficiently without the need for xml parsing. However,
>
> I am wondering if there is a way to index a value a1 for a field A and
> store a different value "1,2" associated with a1 in Lucene. Or there
> might be a hack for this?
If you want to use only low-level Lucene constructs, I think payloads
and/or complicated field values are the way to go. You could, for
example, index for document D, a field called "extra" with values like
"a1:1,2", "a2:2,3". I think that's what Aditya suggested. You still
have to parse these though, so why not use a prebuilt flexible parsing
infrastructure?
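A minimal sketch of the "complicated field values" idea, assuming each stored value of the "extra" field has the form term:v1,v2 and that the associated values are integers. ExtraFieldCodec and its methods are hypothetical helpers, not part of Lucene; the example values follow the original question (a1 -> 1,2 and a2 -> 3,10). In a real index the String[] would presumably come from the stored field, for example via Document.getValues("extra").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): encodes and decodes values like "a1:1,2".
final class ExtraFieldCodec {
    private ExtraFieldCodec() {}

    /** Encodes one term and its associated values, e.g. ("a1", [1, 2]) becomes "a1:1,2". */
    static String encode(String term, List<Integer> values) {
        StringBuilder sb = new StringBuilder(term).append(':');
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    /** Decodes all stored values of the field into a map from term to associated values. */
    static Map<String, List<Integer>> decode(String[] storedValues) {
        Map<String, List<Integer>> byTerm = new LinkedHashMap<>();
        for (String stored : storedValues) {
            // Split on the last ':' so that a term containing ':' still decodes.
            int sep = stored.lastIndexOf(':');
            if (sep < 0) {
                continue; // skip entries that do not follow the assumed format
            }
            List<Integer> values = new ArrayList<>();
            String rest = stored.substring(sep + 1);
            if (!rest.isEmpty()) {
                for (String part : rest.split(",")) {
                    values.add(Integer.parseInt(part.trim()));
                }
            }
            byTerm.put(stored.substring(0, sep), values);
        }
        return byTerm;
    }

    /** Returns the values associated with the queried term, or an empty list if there are none. */
    static List<Integer> lookup(String[] storedValues, String queriedTerm) {
        return decode(storedValues).getOrDefault(queriedTerm, List.of());
    }
}
```

With the stored values "a1:1,2" and "a2:3,10", lookup(stored, "a1") yields the list 1, 2. The application still has to know which term matched, so this only works when the query term is available at retrieval time.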
Re: Associated values for a field and its value
Posted by Alice Wong <ai...@gmail.com>.
Mike,
That's an interesting idea. The only drawback is we have to re-parse the
doc and find where it matches and what the associated values are. It could
be a performance issue if the doc becomes bigger and more complex.
I am wondering if there is a way to index a value a1 for a field A and
store a different value "1,2" associated with a1 in Lucene. Or there might
be a hack for this?
Thanks.
On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:
> Why not store a (nonindexed) text field with some internal structure
> (XML, JSON, CSV) that you can analyze after retrieving? For example:
>
> <D>
>   <A>
>     <value>a1</value>
>     <associated-values>
>       ... whatever you want ...
>     </associated-values>
>   </A>
> </D>
>
> If you use Lux (luxdb.org), which is XML query support on top of Lucene,
> you can do this all automatically, and retrieve the results with a simple
> query like:
>
> /D/A[value = 'a1']/associated-values
>
> plus if you want to pull out the values and manipulate them, you have
> XQuery to do it with.
>
> -Mike
Re: Associated values for a field and its value
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Why not store a (nonindexed) text field with some internal structure
(XML, JSON, CSV) that you can analyze after retrieving? For example:
<D>
  <A>
    <value>a1</value>
    <associated-values>
      ... whatever you want ...
    </associated-values>
  </A>
</D>
If you use Lux (luxdb.org), which is XML query support on top of Lucene,
you can do this all automatically, and retrieve the results with a
simple query like:
  /D/A[value = 'a1']/associated-values
plus if you want to pull out the values and manipulate them, you have
XQuery to do it with.
-Mike
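For readers who want to try the stored-XML idea without Lux, here is a rough sketch that uses only the JDK's built-in XPath support. It assumes the XML string has already been read back from a stored field, that each A element holds one value and one associated-values child, and that the associated values are kept as plain text such as "1,2". StoredXmlLookup is a hypothetical name, and this does not show how Lux itself works.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the associated values for one field value out of stored XML.
final class StoredXmlLookup {
    private StoredXmlLookup() {}

    /** Returns the text of associated-values for the A element whose value matches, or "" if none. */
    static String associatedValues(String storedXml, String fieldValue) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Pass the queried value as the XPath variable $v so it needs no quoting or escaping.
        xpath.setXPathVariableResolver(
                name -> "v".equals(name.getLocalPart()) ? fieldValue : null);
        try {
            return xpath.evaluate(
                    "/D/A[value = $v]/associated-values",
                    new InputSource(new StringReader(storedXml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate the stored XML", e);
        }
    }
}
```

This re-parses the XML on every call, which is the overhead discussed elsewhere in the thread; it is probably acceptable for small documents and a handful of hits per query.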
Re: Associated values for a field and its value
Posted by Aditya <fi...@gmail.com>.
Hi
You need to expand the field as below. Store each field value and its
associated values as a separate document:
Document   Field-A   Stored-Field
D1         a1        1,2
D2         a2        3,10
An alternative approach is to store these values outside Lucene, perhaps in
a database or key-value store, and fetch them on demand.
Regards
Aditya
www.findbestopensource.com -- we have collection of more than 1 million
open source projects
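The expansion described in this reply can be sketched in plain Java as follows. This is only an in-memory illustration, not Lucene code: ExpandedIndex, Row, and the parentId column are assumptions added here so that a hit on an expanded row can be traced back to the original document D.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "one index document per (field, value) pair".
final class ExpandedIndex {
    private ExpandedIndex() {}

    /** One expanded row: the original document id, the indexed term, and the stored payload. */
    static final class Row {
        final String parentId;
        final String fieldA;
        final String stored;

        Row(String parentId, String fieldA, String stored) {
            this.parentId = parentId;
            this.fieldA = fieldA;
            this.stored = stored;
        }
    }

    /** Expands one logical document into one row per field value. */
    static List<Row> expand(String parentId, Map<String, String> associatedByValue) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : associatedByValue.entrySet()) {
            rows.add(new Row(parentId, entry.getKey(), entry.getValue()));
        }
        return rows;
    }

    /** Stands in for a term query on Field-A: returns the stored payload of every matching row. */
    static List<String> query(List<Row> rows, String term) {
        List<String> hits = new ArrayList<>();
        for (Row row : rows) {
            if (row.fieldA.equals(term)) {
                hits.add(row.stored);
            }
        }
        return hits;
    }
}
```

Because each row carries exactly one term, the stored payload of a hit is already the right one for the query and nothing needs to be parsed to find the match. The cost is more index documents and an extra step if results must be grouped by the original document.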