Posted to solr-user@lucene.apache.org by ryan mckinley <ry...@gmail.com> on 2006/12/13 02:44:25 UTC

automatic index time field?

Is there a way to automatically set a field when a document is indexed?
Specifically, I'd like to have a date field updated to the current time when
a document is indexed.

I am trying to find the best way to re-index content on a live server.  I
can't wipe the index and start over as there will be active queries through
the whole process.

I have a bunch of stuff stored in SQL; my plan is to:
 * note the current time
 * Cycle through everything, <add>100 documents at a time</add>
 * when it is done, delete everything that was not updated since we started
this process.  Something like:
  <delete><query>index_time:[* TO 2006-12-13T05:40:08.703Z]</query></delete>
 * <commit/>
 * <optimize/>
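
The plan above can be sketched as a small script that only builds the update messages, in order, without sending them anywhere. This is a rough illustration under stated assumptions, not a tested Solr client: the function names are hypothetical, the "index_time" field is sent along with each document (option 1 below), and every document in a run carries the single time noted at the start. Because of that last choice the sketch uses an exclusive range, {* TO ...}, so documents stamped with exactly the noted time survive the delete; it assumes the query parser accepts an open-ended exclusive range.

```python
from datetime import datetime
from xml.etree import ElementTree as ET


def solr_date(moment):
    """Format a UTC datetime as e.g. 2006-12-13T05:40:08.703Z."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def batches(rows, size=100):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def add_message(rows, index_time):
    """Build one <add> message; each doc gets the run's index_time."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in {**row, "index_time": index_time}.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_message(cutoff):
    """Build a delete-by-query for docs stamped strictly before cutoff."""
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    query.text = "index_time:{* TO " + cutoff + "}"
    return ET.tostring(delete, encoding="unicode")


def reindex_messages(rows, started_at, size=100):
    """Return the messages for one full re-index run, in send order."""
    stamp = solr_date(started_at)
    messages = [add_message(batch, stamp) for batch in batches(rows, size)]
    messages.append(delete_message(stamp))
    messages += ["<commit/>", "<optimize/>"]
    return messages
```

Old documents stay searchable until the delete and commit at the end, which is the point of not wiping the index first.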

My options are:
1) Send the index time along with the document.
2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically

1) is the easiest but requires that everyone sending data sends a valid
"index_time" field.
2) more complicated, but then we know everything has a valid "index_time"
field.

Thanks for any pointers!
ryan

Re: automatic index time field?

Posted by ryan mckinley <ry...@gmail.com>.
Thanks for the advice.  I implemented option #2, following the directions on:
 http://wiki.apache.org/solr/HowToContribute

and made:
  http://issues.apache.org/jira/browse/SOLR-82

The only change I might make is to have the schema record whether it has any
fields with default values, so that DocumentBuilder.getDoc() does not cycle
through all fields when there aren't any.

Thanks
ryan



On 12/13/06, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> : Is there a way to automatically set a field when a document is indexed?
> : Specifically, I'd like to have a date field updated to the current time when
> : a document is indexed.
>
> Your message reminded me that i never announced the new "Date Math"
> parsing code, which does let you say something like...
>
>   <field name="timestamp">NOW</field>
>
> ...in your <add><doc> calls, but there is currently no way to have
> "default" values for fields in your schema ... it's on the wishlist, but
> no one is currently pursuing it as far as i know.
>
> : I have a bunch of stuff stored in SQL, my plan is to:
> :  * note the current time
>
> ...the gist of your plan is sound, but to eliminate possible headaches
> from clock sync issues, instead of getting the "current time" from
> somewhere, i would query your index for all docs (of the type
> you are interested in) sorted by date desc, and then note the date of the
> newest doc and later delete all docs with dates up to and including that
> one.
>
> : My options are:
> : 1) Send the index time along with the document.
> : 2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically
> :
> : 1) is the easiest but requires that everyone sending data sends a valid
> : "index_time" field.
> : 2) more complicated, but then we know everything has a valid "index_time"
> : field.
>
> As i said, you could just put "NOW" in all of your docs, but if you are
> interested in pursuing option#2, the most general purpose and reusable
> approach might be to add an optional default="value" attribute to the
> <field> declarations in the schema.xml (relevant classes are SchemaField
> and IndexSchema) and then modify the DocumentBuilder.getDoc method to
> check for any default values of fields the Document doesn't already have
> values for and add them .. then your timestamp field becomes...
>
> <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" />
>
> ..but you can also have other default fields...
>
> <field name="forSale" type="boolean" indexed="true" stored="true"
> default="false" />
> <field name="type" type="string" indexed="true" stored="true"
> default="unknown" />
>
> ...etc.
>
>
> -Hoss
>
>

Re: automatic index time field?

Posted by Chris Hostetter <ho...@fucit.org>.
: Is there a way to automatically set a field when a document is indexed?
: Specifically, I'd like to have a date field updated to the current time when
: a document is indexed.

Your message reminded me that i never announced the new "Date Math"
parsing code, which does let you say something like...

  <field name="timestamp">NOW</field>

...in your <add><doc> calls, but there is currently no way to have
"default" values for fields in your schema ... it's on the wishlist, but
no one is currently pursuing it as far as i know.

: I have a bunch of stuff stored in SQL, my plan is to:
:  * note the current time

...the gist of your plan is sound, but to eliminate possible headaches
from clock sync issues, instead of getting the "current time" from
somewhere, i would query your index for all docs (of the type
you are interested in) sorted by date desc, and then note the date of the
newest doc and later delete all docs with dates up to and including that
one.
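
As a rough sketch of that suggestion: the cutoff comes from dates already in the index rather than from a local clock, and the delete range is inclusive so the newest pre-existing doc goes too. The function name and the "timestamp" field name are hypothetical, and the sketch assumes the dates were read from the index before re-indexing began and all share one fixed-width UTC format, so that comparing them as strings matches chronological order.

```python
def cutoff_delete(existing_dates, field="timestamp"):
    """Build a delete covering everything up to the newest existing date.

    existing_dates: date strings read from the index before re-indexing.
    Returns None when the index held no docs, so nothing needs deleting.
    """
    if not existing_dates:
        return None
    # Fixed-width UTC strings compare in chronological order.
    newest = max(existing_dates)
    # Square brackets make the range inclusive of the newest old doc.
    return f"<delete><query>{field}:[* TO {newest}]</query></delete>"
```

Docs added during the re-index get later dates than the cutoff, so they fall outside the range.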

: My options are:
: 1) Send the index time along with the document.
: 2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically
:
: 1) is the easiest but requires that everyone sending data sends a valid
: "index_time" field.
: 2) more complicated, but then we know everything has a valid "index_time"
: field.

As i said, you could just put "NOW" in all of your docs, but if you are
interested in pursuing option#2, the most general purpose and reusable
approach might be to add an optional default="value" attribute to the
<field> declarations in the schema.xml (relevant classes are SchemaField
and IndexSchema) and then modify the DocumentBuilder.getDoc method to
check for any default values of fields the Document doesn't already have
values for and add them .. then your timestamp field becomes...

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />

..but you can also have other default fields...

<field name="forSale" type="boolean" indexed="true" stored="true" default="false" />
<field name="type" type="string" indexed="true" stored="true" default="unknown" />

...etc.
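
The default-filling step described above might look roughly like this. It is a plain model of the idea, not Solr's DocumentBuilder code: the names SCHEMA_DEFAULTS, resolve and apply_defaults are hypothetical, and "NOW" is handled here only as the literal current time.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the default="..." attributes in schema.xml.
SCHEMA_DEFAULTS = {
    "timestamp": "NOW",
    "forSale": "false",
    "type": "unknown",
}


def resolve(value, now):
    """Turn the special value NOW into a UTC date string."""
    if value == "NOW":
        return now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return value


def apply_defaults(doc, defaults, now=None):
    """Return a copy of doc with defaults added for missing fields only."""
    if now is None:
        now = datetime.now(timezone.utc)
    filled = dict(doc)
    for name, default in defaults.items():
        # Values the sender supplied always win over schema defaults.
        if name not in filled:
            filled[name] = resolve(default, now)
    return filled
```

A doc sent with only "id" and "type" would come out with "timestamp" and "forSale" filled in and its own "type" left alone.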


-Hoss