Posted to solr-user@lucene.apache.org by Tiernan OToole <ls...@gmail.com> on 2011/10/18 17:41:38 UTC

Solr MultiValue Fields and adding values

Good morning.

I asked this question on StackOverflow, but thought this group may be
able to help... the question is available on SO here: http://bit.ly/r6MAWU

here goes:

I am building a search engine, and have a not-so-unique ID for a lot of
different names... So, for example, there could be an ID of B0051QVF7A
which would have multiple names like "Kindle", "Amazon Kindle", "Amazon
Kindle 3G", "Kindle Ebook Reader", "New Kindle", etc.

The problem, and my question, is that I am trying to load this data
from a DB of roughly 11 million rows, each of which is read one at a
time. So I don't have all the names of each ID at once. I am adding new
documents to the list each time.

What I am trying to find out is how to add names to an existing
document. If I am reading the documentation correctly, an update seems to
overwrite the whole document rather than add extra info to the field...
I just want to add an extra name to the document's multivalued field...

I know this could cause some weird and wonderful "issues" if a name is
removed (in the example above, "New Kindle" could be removed when a
newer Kindle gets released), but I am thinking of recreating the index
every now and again (once a month or so) to clear out issues like that.
It's currently taking about 45 minutes to create the index.

So, how do you add a value to a multivalued field in Solr for an existing
document?

Thanks in advance.

--Tiernan

Re: Solr MultiValue Fields and adding values

Posted by Tiernan OToole <ls...@gmail.com>.
That's what I thought too... We'll see what the speed difference actually
is... running some tests now...
 
Thanks for the info!
 
--Tiernan
 
On 19/10/2011 16:07, Dyer, James wrote:
>
> Not that I am doing this with any of my indexes, but I'm pretty sure the
> "get doc from Solr, modify, commit back to Solr" approach really is that
> simple. Just be sure you are storing the exact raw data that came from
> your database (typically you would). The problem with this approach is
> that it could potentially be very slow if you're updating lots of documents.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311


Re: Solr MultiValue Fields and adding values

Posted by Tiernan OToole <ls...@gmail.com>.
Thanks for the comment. In all fairness, that sounds like too much of a
change... I have actually made a tweak to my DB to allow multiple names,
storing them in a separate table from the main one. My query then only
needs to fetch the IDs, and then query the second table to get the names.
But I will keep the comments in mind and see how things go over the next while.
 
As a side note, if I were to go down the "get doc from Solr, modify,
commit back to Solr" route, is it really that simple? Run a query on Solr,
get the document, add the extra data, and insert it back into Solr?
 
Thanks.
 
--Tiernan
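The "get doc from Solr, modify, commit back to Solr" loop asked about here comes down to a merge step between the fetch and the re-add. Below is a minimal Python sketch of just that step, operating on plain dicts so it runs without a server. It assumes a hypothetical schema whose uniqueKey is `id` and whose `name` field is stored and multivalued. It also assumes every other field is stored, since only stored fields come back from a query. Fetching and posting are left to whichever client is in use.

```python
def merge_name(stored_doc, new_name, field="name"):
    """Return a copy of a stored document with new_name appended to its
    multivalued field, skipping names that are already present.

    stored_doc is assumed to be one entry of the "docs" list in a Solr
    JSON query response; the field names "id" and "name" are hypothetical.
    """
    doc = dict(stored_doc)  # shallow copy; the input is left untouched
    # A multivalued field comes back as a list; tolerate a bare scalar too.
    values = doc.get(field, [])
    if not isinstance(values, list):
        values = [values]
    if new_name not in values:
        values = values + [new_name]  # builds a new list, no mutation
    doc[field] = values
    # "score" appears only when the query asked for it (fl=*,score); it is
    # not a schema field, so it must not be sent back.
    doc.pop("score", None)
    return doc


# Usage: the document as fetched, plus one newly seen name for the same ID.
stored = {"id": "B0051QVF7A", "name": ["Kindle", "Amazon Kindle"]}
updated = merge_name(stored, "Kindle 3G")
# updated is then re-added as a whole; Solr replaces the old document
# that has the same id.
```

Check this against the real schema: if any stored field is a `copyField` destination, drop it from the document before re-adding. Otherwise, the copied values are added a second time.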
 
On 19/10/2011 15:26, Dyer, James wrote:
> While Solr/Lucene can't support true document updates, there are 2 ways
> you might be able to work around this in your situation.
>
> 1. If you store all of the fields, you can write something that will
> read back everything already indexed to the document, append whatever
> data you want, then write it back. This will increase index size and
> possibly make indexing too slow. On the other hand, it might be more
> efficient than requiring the database to return everything in order.
>
> 2. You could store your data as multiple documents per id (pick
> something else as your unique id). Then use the grouping functionality
> to roll up on your unique id whenever you query. This will mean changes
> to your application, probably a bigger index, and likely somewhat slower
> querying. But the performance losses might be slight and this seems to
> me like it might be a good solution in your case. Perhaps it would make
> it so you wouldn't have to entirely re-index each month or so. See
> http://wiki.apache.org/solr/FieldCollapsing for more information.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311


RE: Solr MultiValue Fields and adding values

Posted by "Dyer, James" <Ja...@ingrambook.com>.
While Solr/Lucene can't support true document updates, there are 2 ways you might be able to work around this in your situation.

1. If you store all of the fields, you can write something that will read back everything already indexed to the document, append whatever data you want, then write it back.  This will increase index size and possibly make indexing too slow.  On the other hand, it might be more efficient than requiring the database to return everything in order.

2. You could store your data as multiple documents per id (pick something else as your unique id).  Then use the grouping functionality to roll up on your unique id whenever you query.  This will mean changes to your application, probably a bigger index, and likely somewhat slower querying.  But the performance losses might be slight and this seems to me like it might be a good solution in your case.  Perhaps it would make it so you wouldn't have to entirely re-index each month or so.  See http://wiki.apache.org/solr/FieldCollapsing for more information.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
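Option 2 above can be sketched in a few lines of Python, assuming hypothetical field names. Every (product ID, name) pair becomes its own small document with a synthetic unique key. Queries then roll the hits back up per product using the result-grouping parameters `group`, `group.field` and `group.limit` documented on the FieldCollapsing page. The key format is only an illustration. The code just builds the documents and a query string, without contacting a server.

```python
from urllib.parse import urlencode


def name_doc(product_id, name):
    """Build one document per (product_id, name) pair.

    "doc_id" is assumed to be the schema's uniqueKey and "product_id" an
    indexed string field; both field names are hypothetical.
    """
    return {
        # Synthetic key: re-sending the same pair overwrites its own doc,
        # and a new name never touches documents already indexed.
        "doc_id": f"{product_id}|{name}",
        "product_id": product_id,
        "name": name,
    }


def grouped_query(user_query, limit=10):
    """Query string that rolls matching name documents up by product_id."""
    return urlencode({
        "q": user_query,
        "group": "true",
        "group.field": "product_id",
        "group.limit": limit,  # max documents (names) returned per group
    })


# Usage: each name seen in the DB becomes its own small document.
docs = [name_doc("B0051QVF7A", n)
        for n in ("Kindle", "Amazon Kindle", "New Kindle")]
query = grouped_query("name:kindle")
```

With this layout, a stale name such as "New Kindle" can in principle be removed by deleting its `doc_id`, rather than waiting for the monthly re-index.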


-----Original Message-----
From: Tiernan OToole [mailto:lsmartman@gmail.com] 
Sent: Wednesday, October 19, 2011 5:11 AM
To: solr-user@lucene.apache.org
Cc: Otis Gospodnetic
Subject: Re: Solr MultiValue Fields and adding values


I was hoping that wasn't going to be the case... I ended up querying for
all unique IDs in the DB, then querying each unique ID to get all of its
names, and inserting them that way... It seems a lot slower than, in
theory, it really should be...
 
Thanks.
 
--Tiernan
 


Re: Solr MultiValue Fields and adding values

Posted by Tiernan OToole <ls...@gmail.com>.
I was hoping that wasn't going to be the case... I ended up querying for
all unique IDs in the DB, then querying each unique ID to get all of its
names, and inserting them that way... It seems a lot slower than, in
theory, it really should be...
 
Thanks.
 
--Tiernan
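There is an alternative to one query per unique ID. The database can return every row sorted by ID, which is the "return everything in order" approach James mentions in his reply. All names for an ID then arrive together, and each document is built exactly once in a single pass. Below is a minimal Python sketch, with an in-memory SQLite table standing in for the real DB. The table `names(product_id, name)` is a hypothetical layout, roughly like the separate names table Tiernan describes moving to.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def documents_by_id(conn):
    """Yield one complete document per product ID from a single query.

    Assumes a hypothetical table names(product_id, name). ORDER BY makes
    all rows for an ID adjacent, which is what groupby() relies on.
    """
    rows = conn.execute(
        "SELECT product_id, name FROM names ORDER BY product_id, name")
    for product_id, group in groupby(rows, key=itemgetter(0)):
        yield {"id": product_id, "name": [name for _, name in group]}


# Usage with a tiny stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (product_id TEXT, name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    ("B0051QVF7A", "Kindle"),
    ("X1", "Other product"),
    ("B0051QVF7A", "Amazon Kindle"),
])
docs = list(documents_by_id(conn))
```

The generator itself only holds one ID's names at a time. Whether the driver streams rows or buffers the whole result set depends on the DB library. Since every ID is emitted exactly once, the documents can be sent to Solr in batches without ever reading one back.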
 
On 18/10/2011 23:20, Otis Gospodnetic wrote:
> Hi,
>
> You'll need to construct the whole document and index it as such. You
> can't append values to document fields.
>
> Otis
> ----
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>


Re: Solr MultiValue Fields and adding values

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

You'll need to construct the whole document and index it as such.  You can't append values to document fields.

Otis
----

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
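Here is Otis's point in concrete form. Once all names for an ID are known, they go into a single `<doc>` as repeated values of the same multivalued field, shown here in Solr's XML update format. Below is a minimal Python sketch using only the standard library. The field names `id` and `name` are assumptions about the schema, and posting the message to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def add_message(docs):
    """Build a Solr XML <add> message from dicts of field -> value(s).

    A list value becomes repeated <field> elements with the same name,
    which is how a multivalued field is sent. Field names are hypothetical.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=field).text = v
    # ElementTree escapes &, < and > in the text for us.
    return ET.tostring(add, encoding="unicode")


# Usage: one document carrying every known name for the ID.
message = add_message([{"id": "B0051QVF7A",
                        "name": ["Kindle", "Amazon Kindle", "Kindle 3G"]}])
```

Re-sending a `<doc>` whose `id` already exists replaces the stored document as a whole. That is why every name has to be present in the same message.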


>________________________________
>From: Tiernan OToole <ls...@gmail.com>
>To: solr-user@lucene.apache.org
>Sent: Tuesday, October 18, 2011 11:41 AM
>Subject: Solr MultiValue Fields and adding values
>
>Good morning.
>
>I asked this question on StackOverflow, but thought this group may be
>able to help... the question is available on SO here: http://bit.ly/r6MAWU
>
>here goes:
>
>I am building a search engine, and have a not-so-unique ID for a lot of
>different names... So, for example, there could be an ID of B0051QVF7A
>which would have multiple names like "Kindle", "Amazon Kindle", "Amazon
>Kindle 3G", "Kindle Ebook Reader", "New Kindle", etc.
>
>The problem, and my question, is that I am trying to load this data
>from a DB of roughly 11 million rows, each of which is read one at a
>time. So I don't have all the names of each ID at once. I am adding new
>documents to the list each time.
>
>What I am trying to find out is how to add names to an existing
>document. If I am reading the documentation correctly, an update seems to
>overwrite the whole document rather than add extra info to the field...
>I just want to add an extra name to the document's multivalued field...
>
>I know this could cause some weird and wonderful "issues" if a name is
>removed (in the example above, "New Kindle" could be removed when a
>newer Kindle gets released), but I am thinking of recreating the index
>every now and again (once a month or so) to clear out issues like that.
>It's currently taking about 45 minutes to create the index.
>
>So, how do you add a value to a multivalued field in Solr for an existing
>document?
>
>Thanks in advance.
>
>--Tiernan
>
>
>