Posted to solr-user@lucene.apache.org by Vannia Rajan <kv...@gmail.com> on 2009/07/31 08:37:55 UTC

Recreating SOLR index after a schema change - without having to re-post the data

Hi,

  We are using a Solr server for a large data set. We need to make some
changes to the Solr schema.xml (changing the datatype of a few fields from
integer to sint). It turns out that the two datatypes (integer and sint) are
incompatible, so we need to re-index Solr.
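
As a rough illustration of why the two types cannot share an index (an editorial sketch, not something posted in this thread): the index holds each number as an encoded term, and the encoding depends on the field type. The Python below uses a made-up order-preserving encoding, `sortable_term`, which is an assumption for illustration only and is not Solr's actual sint format. It shows that plain decimal strings and a sortable encoding produce different terms and different sort orders, so terms written under one type cannot simply be read back under the other.

```python
# Illustration only: NOT Solr's real term encoding.

def plain_term(n: int) -> str:
    """Index the number as its ordinary decimal string."""
    return str(n)


def sortable_term(n: int, width: int = 10) -> str:
    """Hypothetical order-preserving encoding: shift the value into the
    non-negative range, then zero-pad so string order matches numeric order."""
    offset = 2 ** 31  # assumes 32-bit signed integers
    return str(n + offset).zfill(width)


values = [9, 10, -3, 200]

# Sorting by the plain string compares character by character.
by_plain = sorted(values, key=plain_term)

# Sorting by the padded, shifted string follows numeric order.
by_sortable = sorted(values, key=sortable_term)

print(by_plain)
print(by_sortable)
```

The same number maps to two unrelated terms under the two encodings, which is why a type change of this kind needs the documents to be indexed again.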

My question is:
   Is there any way I can just re-create the index files for the
existing data/documents in Solr (without having to re-post the documents)?

   I searched through many forums, and everything seems to say that I have to
re-post ALL documents to Solr for re-indexing. Please suggest a better
alternative for making my schema change. (I have a very large Solr index,
around 10GB in size, and it would be tough to query the whole data set, store
it somewhere as XML, and then repost it.)

-- 
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Jul 31, 2009 at 6:29 PM, Erik Hatcher <er...@ehatchersolutions.com>wrote:

> On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote:
>
>> On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>>> You'll have to reindex your documents from scratch.  Such is the nature of
>>> changing the schema of an index.  It's always a great idea (in fact, I'd say
>>> mandatory) to have a full reindex process handy.
>>
>> Thank you for your response. Yes, i need to make the setup handy to query &
>> repost to solr - till this new feature is included in SOLR.
>
> It's only tractable to do this if the original field values are stored,
> which is quite prohibitive in many cases.  So I don't think this is a
> feature that you'll see in Solr any time soon.

Yes, it would be more expensive. However, for those wishing for such a
feature, there are two relevant issues:

https://issues.apache.org/jira/browse/SOLR-828
https://issues.apache.org/jira/browse/SOLR-139

-- 
Regards,
Shalin Shekhar Mangar.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Chantal Ackermann <ch...@btelligent.de>.
Hi Edwin,

What prevents you from storing the data yourself on some disk (possibly
formatted in the Solr XML input format)?

Cheers,
Chantal

Edwin Stauthamer wrote:
> That is a shame. I have a lot of experience with Autonomy IDOL, and the
> ability to quickly reindex the content without going back to the
> original source is great. Just export, update the config, and import
> (= reindex) to see whether, for instance, performance is better, or simply
> to move the data to another server.
> 
> Of course, this only works when no fields have been added, etc.
> 
> --
> Met vriendelijke groet / Kind regards,
> 
> Edwin Stauthamer
> Adviser Search & Collaboration
> Emid Consult
> T: +31 (0) 70 8870700
> M: +31 (0) 6 4555 4994
> E: estauthamer@emidconsult.com
> I: http://www.emidconsult.com

-- 
Chantal Ackermann
Consultant

mobil    +49 (176) 10 00 09 45
email    chantal.ackermann@btelligent.de


Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Bill Au <bi...@gmail.com>.
The CSVLoader is very fast, but it doesn't support document or field boosting
at index time.  If you don't need that, you can also write your Solr input
data to file(s) to be loaded by the CSVLoader, and just reload them whenever
you change the schema.  You will need to regenerate the data if you add or
remove fields, but you can simply reload from the existing input file(s) if
you are only changing the properties of a field.
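
A minimal sketch of that workflow in Python, offered as an illustration rather than a tested recipe. It assumes the CSV handler is mapped at `/update/csv` (as in the stock example configuration) and that it accepts a `commit` parameter; the helper names `docs_to_csv` and `csv_update_url`, the field names, and the base URL are made up for the example.

```python
import csv
import io
from urllib.parse import urlencode


def docs_to_csv(docs, fieldnames):
    """Serialize documents (dicts) to CSV text with a header row.

    Keep this output on disk; it can be reloaded unchanged after a
    field-property change in the schema.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(doc)
    return buf.getvalue()


def csv_update_url(base_url, commit=True):
    """Build the URL the CSV file would be posted to (handler path assumed)."""
    params = {"commit": "true" if commit else "false"}
    return f"{base_url}/update/csv?{urlencode(params)}"


csv_text = docs_to_csv(
    [{"id": "1", "price": 10}, {"id": "2", "price": 25}],
    ["id", "price"],
)
print(csv_text)
print(csv_update_url("http://localhost:8983/solr"))
```

The saved file would then be posted to that URL with any HTTP client. Adding or removing a column means regenerating the file, matching the caveat above.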

Bill

On Fri, Jul 31, 2009 at 9:41 AM, Edwin Stauthamer <estauthamer@emidconsult.com> wrote:

> Simple but effective ;-)
>
> On Fri, Jul 31, 2009 at 3:23 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
> > There certainly could be some intermediate storage of documents prior to
> > indexing, but as far as the Lucene index goes it is inherently a one-way
> > process.  Solr could facilitate this pretty easily... with an update
> > processor that wrote the documents coming in to some other storage (one
> > option: simple Solr XML files on the filesystem).  So hope is not lost.
> >
> >        Erik

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Edwin Stauthamer <es...@emidconsult.com>.
Simple but effective ;-)

On Fri, Jul 31, 2009 at 3:23 PM, Erik Hatcher <er...@ehatchersolutions.com>wrote:

> There certainly could be some intermediate storage of documents prior to
> indexing, but as far as the Lucene index goes it is inherently a one-way
> process.  Solr could facilitate this pretty easily... with an update
> processor that wrote the documents coming in to some other storage (one
> option: simple Solr XML files on the filesystem).  So hope is not lost.
>
>        Erik

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
There certainly could be some intermediate storage of documents prior  
to indexing, but as far as the Lucene index goes it is inherently a  
one-way process.  Solr could facilitate this pretty easily... with an  
update processor that wrote the documents coming in to some other  
storage (one option: simple Solr XML files on the filesystem).  So  
hope is not lost.
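
The intermediate-storage idea can be sketched on the client side. To be clear, this is not a Solr update processor (that would be a server-side Java component); it is a stand-in in Python that writes each batch of documents as a Solr XML `<add>` message to the filesystem before it is posted. The function names, the archive directory layout, and the file-naming scheme are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def to_solr_xml(docs):
    """Render documents (dicts) as a Solr XML <add> message.

    A list or tuple value becomes one <field> element per item,
    which is how multi-valued fields are expressed.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def archive_batch(docs, archive_dir, batch_id):
    """Write one batch to <archive_dir>/batch-NNNNNN.xml and return the path."""
    path = Path(archive_dir) / f"batch-{batch_id:06d}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_solr_xml(docs), encoding="utf-8")
    return path
```

Zero-padded batch numbers keep the files in posting order when sorted by name, which matters if later batches overwrite earlier documents with the same unique key.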

	Erik



On Jul 31, 2009, at 9:07 AM, Edwin Stauthamer wrote:

> That is a shame. I have a lot of experience with Autonomy IDOL, and the
> ability to quickly reindex the content without going back to the
> original source is great. Just export, update the config, and import
> (= reindex) to see whether, for instance, performance is better, or simply
> to move the data to another server.
>
> Of course, this only works when no fields have been added, etc.


Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Edwin Stauthamer <es...@emidconsult.com>.
That is a shame. I have a lot of experience with Autonomy IDOL, and the
ability to quickly reindex the content without going back to the
original source is great. Just export, update the config, and import
(= reindex) to see whether, for instance, performance is better, or simply
to move the data to another server.

Of course, this only works when no fields have been added, etc.

On Fri, Jul 31, 2009 at 2:59 PM, Erik Hatcher <er...@ehatchersolutions.com>wrote:

> On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote:
>
>> On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>>> You'll have to reindex your documents from scratch.  Such is the nature of
>>> changing the schema of an index.  It's always a great idea (in fact, I'd say
>>> mandatory) to have a full reindex process handy.
>>
>> Thank you for your response. Yes, i need to make the setup handy to query &
>> repost to solr - till this new feature is included in SOLR.
>
> It's only tractable to do this if the original field values are stored,
> which is quite prohibitive in many cases.  So I don't think this is a
> feature that you'll see in Solr any time soon.
>
>        Erik



Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote:

> On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
>> You'll have to reindex your documents from scratch.  Such is the nature of
>> changing the schema of an index.  It's always a great idea (in fact, I'd say
>> mandatory) to have a full reindex process handy.
>
> Thank you for your response. Yes, i need to make the setup handy to query &
> repost to solr - till this new feature is included in SOLR.

It's only tractable to do this if the original field values are  
stored, which is quite prohibitive in many cases.  So I don't think  
this is a feature that you'll see in Solr any time soon.

	Erik


Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Vannia Rajan <kv...@gmail.com>.
On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher <er...@ehatchersolutions.com>wrote:

> You'll have to reindex your documents from scratch.  Such is the nature of
> changing the schema of an index.  It's always a great idea (in fact, I'd say
> mandatory) to have a full reindex process handy.
>
>
Thank you for your response. Yes, I need to set up a process to query and
repost to Solr, at least until this new feature is included in Solr.

-- 
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
You'll have to reindex your documents from scratch.  Such is the  
nature of changing the schema of an index.  It's always a great idea  
(in fact, I'd say mandatory) to have a full reindex process handy.
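
One possible shape for such a process, sketched in Python under stated assumptions: the source documents are already saved as Solr XML `<add>` files in a directory, the XML update handler lives at `/update`, and it is acceptable to wipe the index before replaying. The function names, the directory layout, and the injectable `send` callable are all hypothetical choices made for the example.

```python
import urllib.request
from pathlib import Path


def build_update_request(solr_url, xml_body):
    """Wrap one Solr XML message in an HTTP POST to the update handler."""
    return urllib.request.Request(
        f"{solr_url}/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def _post(request):
    """Default sender: perform the request and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


def reindex(solr_url, archive_dir, send=_post):
    """Clear the index, replay every archived *.xml file in name order,
    then commit. Returns the number of files replayed."""
    # WARNING: this deletes every document currently in the index.
    send(build_update_request(solr_url, "<delete><query>*:*</query></delete>"))
    count = 0
    for path in sorted(Path(archive_dir).glob("*.xml")):
        send(build_update_request(solr_url, path.read_text(encoding="utf-8")))
        count += 1
    send(build_update_request(solr_url, "<commit/>"))
    return count
```

A call such as `reindex("http://localhost:8983/solr", "/data/solr-archive")` would run after the new schema.xml is deployed and Solr restarted; both the URL and the path are placeholders. Passing a different `send` makes it easy to do a dry run that only records the requests.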

	Erik


On Jul 31, 2009, at 2:37 AM, Vannia Rajan wrote:

> Hi,
>
>  We are using a Solr server for a large data set. We need to make some
> changes to the Solr schema.xml (changing the datatype of a few fields from
> integer to sint). It turns out that the two datatypes (integer and sint) are
> incompatible, so we need to re-index Solr.
>
> My question is:
>   Is there any way I can just re-create the index files for the
> existing data/documents in Solr (without having to re-post the documents)?
>
>   I searched through many forums, and everything seems to say that I have to
> re-post ALL documents to Solr for re-indexing. Please suggest a better
> alternative for making my schema change. (I have a very large Solr index,
> around 10GB in size, and it would be tough to query the whole data set, store
> it somewhere as XML, and then repost it.)
>
> -- 
> Thanks,
> Vanniarajan


Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Vannia Rajan <kv...@gmail.com>.
On Fri, Jul 31, 2009 at 3:17 PM, Tim Sell <tr...@gmail.com> wrote:

> Are you using solr as a data store?
>

No, the data comes from somewhere else; Solr is used just for indexing and
returning query results.

>
> It is not possible via solr to change existing documents in a solr
> index. It would be a nice feature though.
>

Yes, it would be a nice feature. Is there a particular URL where I can
submit new feature requests for Solr? (The feature is: separate the "index"
from the "data" in Solr, and if the "index" is not available, re-create it
from the "data" automatically.)

-- 
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Posted by Tim Sell <tr...@gmail.com>.
That really is the only way. It would be far easier if you were
importing from another source.
Are you using solr as a data store?

It is not possible via solr to change existing documents in a solr
index. It would be a nice feature though.

~Tim.

2009/7/31 Vannia Rajan <kv...@gmail.com>:
> Hi,
>
>  We are using a Solr server for a large data set. We need to make some
> changes to the Solr schema.xml (changing the datatype of a few fields from
> integer to sint). It turns out that the two datatypes (integer and sint) are
> incompatible, so we need to re-index Solr.
>
> My question is:
>   Is there any way I can just re-create the index files for the
> existing data/documents in Solr (without having to re-post the documents)?
>
>   I searched through many forums, and everything seems to say that I have to
> re-post ALL documents to Solr for re-indexing. Please suggest a better
> alternative for making my schema change. (I have a very large Solr index,
> around 10GB in size, and it would be tough to query the whole data set, store
> it somewhere as XML, and then repost it.)
>
> --
> Thanks,
> Vanniarajan
>