Posted to solr-user@lucene.apache.org by Marc Brette <ma...@gmail.com> on 2013/08/02 17:53:43 UTC

Incremental update, of a sort

Hi,
I would like to completely populate a field for all the documents in the
index, without re-indexing the documents.

I know Solr supports 'Atomic Update', but this is not a real incremental
update of a document: it costs as much as re-indexing the document (and
requires the document to be stored).
As Solr does not support a real incremental update, I wondered whether it
would be easier to completely re-populate a field (i.e. easier than
inserting/modifying in the middle of a field index).
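
For comparison, a minimal sketch of what an atomic update request body
looks like. It assumes the unique key field is named `id` and uses a
hypothetical field name `semantic_tags`; neither name appears in this
thread. The `set` modifier is Solr's atomic update syntax, and the JSON
would be posted to the `/update` handler.

```python
import json


def atomic_set_update(doc_id, field, value):
    """Build one atomic-update document that replaces a single field."""
    # "set" is the atomic update modifier; "id" is an assumed unique key name.
    return {"id": doc_id, field: {"set": value}}


def atomic_update_body(updates, field):
    """Serialize a batch of {doc_id: value} pairs as a JSON request body."""
    docs = [atomic_set_update(doc_id, field, value)
            for doc_id, value in sorted(updates.items())]
    return json.dumps(docs)
```

Even though only one field is sent per document, Solr rebuilds each
document from its stored fields, so a pass over the whole index costs
about as much as re-indexing it.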

My use-case is the following:
- I have an index with a bunch of documents.
- A background process computes some additional metadata for the documents.
It produces metadata in batch for all the documents.
- These metadata are added in bulk to the existing index.

Any ideas? Let me know if this is more a question for the dev list.

Thanks,
Marc

Re: Incremental update, of a sort

Posted by Marc Brette <ma...@gmail.com>.
The field will contain semantic information about the document. It needs
to be searchable, and it will also carry information used as part of the
score, probably a payload read by a custom scorer.
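
The thread does not say how the payload would be encoded. One common
option, sketched here as an assumption, is to index text of the form
`term|weight` through a whitespace tokenizer followed by Solr's
DelimitedPayloadTokenFilterFactory with `|` as the delimiter and a float
encoder. The helper name and the sample terms are hypothetical.

```python
def encode_payload_field(weights, delimiter="|"):
    """Render {term: weight} as space-separated 'term|weight' tokens."""
    for term in weights:
        # A term containing the delimiter or whitespace would be split wrongly.
        if delimiter in term or any(ch.isspace() for ch in term):
            raise ValueError(f"term {term!r} contains delimiter or whitespace")
    return " ".join(
        f"{term}{delimiter}{weight:.2f}"
        for term, weight in sorted(weights.items())
    )
```

For example, `encode_payload_field({"banking": 0.8, "loans": 0.35})`
yields the field value `banking|0.80 loans|0.35`. The terms stay
searchable, and a custom scorer can read the weight from each posting.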

Re: Incremental update, of a sort

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Marc,

What is the type of the field, and what kind of search do you need on it:
filtering, ranking, boosting, etc.?

Thanks


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Incremental update, of a sort

Posted by Marc Brette <ma...@gmail.com>.
This is something I am considering.

Ideally, I'd like to use the same index though.
I do need to query with other constraints, but that could be resolved to
some extent by merging results post-query.
The real headache with different indexes is management: deleting documents,
backup/restore. We also have an internal index-splitting mechanism that
would need to be taken into account.
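
As a rough illustration of merging results post-query, the sketch below
keeps only documents returned by both the main index and the sidecar
index and adds their scores. The function name, the additive scoring, and
the `sidecar_weight` parameter are all assumptions, not something
specified in this thread.

```python
def merge_results(main_hits, sidecar_hits, sidecar_weight=1.0):
    """Intersect two {doc_id: score} maps and rank by combined score."""
    merged = [
        (doc_id, score + sidecar_weight * sidecar_hits[doc_id])
        for doc_id, score in main_hits.items()
        if doc_id in sidecar_hits
    ]
    # Highest combined score first; ties broken by document id.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged
```

This only gives correct results if both queries return every matching
document (or enough of them), which is one reason client-side merging
resolves the problem only to some extent.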

Re: Incremental update, of a sort

Posted by Michael Della Bitta <mi...@appinions.com>.
Marc,

Do you need to be able to query this field at the same time as other
fields, or is the searching case isolated?

If you can isolate searches that hit this field to just this field,
you could do it with a sidecar index and joins.
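
A minimal sketch of the query side of that idea, using Solr's join query
parser. It assumes the sidecar lives in a core named `sidecar` on the
same Solr instance, that both cores share a key field named `id`, and
that the sidecar field is called `concept`; all three names are
hypothetical.

```python
def sidecar_join_params(sidecar_core, sidecar_query, key_field="id",
                        main_filter=None):
    """Build request parameters for a join from the sidecar core."""
    # The inner query runs against the sidecar core; matching key values
    # are then mapped onto documents in the core receiving the request.
    q = (f"{{!join from={key_field} to={key_field} "
         f"fromIndex={sidecar_core}}}{sidecar_query}")
    params = {"q": q}
    if main_filter:
        # Optional constraint evaluated against the main index.
        params["fq"] = main_filter
    return params
```

For example, `sidecar_join_params("sidecar", "concept:banking")` produces
`q={!join from=id to=id fromIndex=sidecar}concept:banking`. As far as I
know, a plain join of this kind selects documents but does not carry the
sidecar scores across in Solr releases of this period, so it suits
filtering better than ranking.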

Michael Della Bitta

Applications Developer, appinions inc.

Re: Incremental update, of a sort

Posted by Marc Brette <ma...@gmail.com>.
Unfortunately, it needs to be searchable.

Very good pointer anyway, I'll keep that in mind.

Re: Incremental update, of a sort

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi Marc,

Have you considered using ExternalFileField for this?
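
For context, ExternalFileField reads per-document float values from a
file named `external_<fieldname>` in the index data directory, one
`key=value` line per document, so the values can be replaced in bulk
without re-indexing. A minimal sketch of a batch job writing such a file
follows; the function name and the field name `semantic_score` are
hypothetical.

```python
import os
import tempfile


def write_external_field_file(data_dir, field_name, values):
    """Write {doc_key: number} as an external_<field_name> file."""
    path = os.path.join(data_dir, f"external_{field_name}")
    # Write to a temporary file first so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir,
                                    prefix=f".external_{field_name}.")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Sorted keys are recommended for faster loading.
        for doc_key in sorted(values):
            handle.write(f"{doc_key}={float(values[doc_key])}\n")
    os.replace(tmp_path, path)
    return path
```

The values are available to function queries (for boosting or sorting)
but are not searchable.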