Posted to solr-user@lucene.apache.org by yriveiro <ya...@gmail.com> on 2014/11/06 15:19:50 UTC

Delete data from stored documents

Hi,

Is it possible to remove stored data from an index by deleting the unwanted fields
from schema.xml and then running an optimize on the index?

Thanks,

/yago



-----
Best regards
--
View this message in context: http://lucene.472066.n3.nabble.com/Delete-data-from-stored-documents-tp4167990.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Delete data from stored documents

Posted by Jack Krupansky <ja...@basetechnology.com>.
Agreed, but I think it would be great if Lucene and Solr provided an API to 
delete a single field for the entire index. We could file a Jira, but can 
Lucene accommodate it? Maybe we'll just have to wait for Elasticsearch to 
implement this feature!

-- Jack Krupansky

-----Original Message----- 
From: Anurag Sharma
Sent: Saturday, November 8, 2014 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete data from stored documents

Since the data already exists and the need is to remove unwanted fields from
it, a custom update processor looks less useful here. Erick's recommendation
to re-index into a new collection, if at all possible, looks simple and safe.


Re: Delete data from stored documents

Posted by Anurag Sharma <an...@gmail.com>.
Since the data already exists and the need is to remove unwanted fields from
it, a custom update processor looks less useful here. Erick's recommendation
to re-index into a new collection, if at all possible, looks simple and safe.
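Re-indexing into a new collection is essentially a paging loop: read the old collection in sorted pages, drop the unwanted fields, and write each batch to the new collection. The Python sketch below is a minimal illustration under stated assumptions. `fetch_page` and `write_batch` are hypothetical callables: the first stands in for a query paged with Solr's `cursorMark` (which starts at `"*"` and signals the end by returning the same mark you sent), the second for an update request to the new collection. An in-memory fake replaces the real Solr calls. The sketch also assumes every field you want to keep is stored, because fields that are indexed but not stored cannot be read back and would have to come from the original source data.

```python
from typing import Callable, Iterable, List, Tuple

Doc = dict
FetchPage = Callable[[str, int], Tuple[List[Doc], str]]


def reindex(fetch_page: FetchPage, write_batch: Callable[[List[Doc]], None],
            unwanted: Iterable[str], rows: int = 1000) -> int:
    """Copy every document via cursor-style paging, minus the unwanted fields."""
    drop = set(unwanted)
    cursor = "*"                        # cursorMark convention: start at "*"
    copied = 0
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        if docs:
            write_batch([{k: v for k, v in d.items() if k not in drop} for d in docs])
            copied += len(docs)
        if next_cursor == cursor:       # unchanged mark means no more results
            return copied
        cursor = next_cursor


def make_fake_source(docs: List[Doc]) -> FetchPage:
    """Hypothetical in-memory stand-in for the old collection."""
    # Cursor paging needs a fixed sort order that includes the unique key.
    ordered = sorted(docs, key=lambda d: d["id"])

    def fetch_page(cursor: str, rows: int) -> Tuple[List[Doc], str]:
        start = 0 if cursor == "*" else int(cursor)
        page = ordered[start:start + rows]
        return page, (str(start + len(page)) if page else cursor)

    return fetch_page


old = [{"id": f"doc{i}", "title": f"t{i}", "blob": "x" * 100} for i in range(5)]
new_collection: List[Doc] = []
n = reindex(make_fake_source(old), new_collection.extend, unwanted=["blob"], rows=2)
print(n, new_collection[0])   # 5 {'id': 'doc0', 'title': 't0'}
```

Paging in fixed-size batches keeps memory bounded, which matters for collections with hundreds of millions of documents; the cost Yago worries about is the full read and write, which this approach does not avoid.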



On Sat, Nov 8, 2014 at 12:44 AM, Erick Erickson <er...@gmail.com>
wrote:

> bq: My question is whether I can delete the field definition from
> schema.xml, run an optimize, and have the fields “magically” disappear
>
> No. schema.xml is really just about regularizing how Lucene indexes
> things. Lucene (where this would have to take place) doesn't have any
> understanding of schema.xml, so changing it and then optimizing (and
> optimizing is also a Lucene function) won't have any effect.
>
> If you
> 1> change the schema
> and
> 2> update (re-send) the documents,
> the data will be purged as background merges happen.
>
> But really, I'd recommend re-indexing into a new collection if at all
> possible.
>
>
> Best,
> Erick

Re: Delete data from stored documents

Posted by Erick Erickson <er...@gmail.com>.
bq: My question is whether I can delete the field definition from
schema.xml, run an optimize, and have the fields “magically” disappear

No. schema.xml is really just about regularizing how Lucene indexes
things. Lucene (where this would have to take place) doesn't have any
understanding of schema.xml, so changing it and then optimizing (and
optimizing is also a Lucene function) won't have any effect.

If you
1> change the schema
and
2> update (re-send) the documents,
the data will be purged as background merges happen.

But really, I'd recommend re-indexing into a new collection if at all possible.
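To make the mechanism concrete, here is a toy Python model. It is a minimal sketch under simplifying assumptions, not how Lucene is actually implemented: the index is a list of segments, each holding stored documents plus a set of deleted ids; "optimize" is modeled as a force-merge that copies live documents as they are; and there is no schema anywhere in it, because Lucene does not read schema.xml. `Segment`, `ToyIndex` and their methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Stored docs keyed by id, plus the ids marked deleted (hypothetical)."""
    docs: dict
    deleted: set = field(default_factory=set)

    def live_docs(self) -> dict:
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


class ToyIndex:
    """Hypothetical stand-in for a Lucene index; it never looks at a schema."""

    def __init__(self) -> None:
        self.segments = []

    def add(self, docs: list) -> None:
        # An update never edits a segment's documents in place: the old copy
        # is only marked deleted and the new version goes into a new segment.
        for seg in self.segments:
            seg.deleted.update(d["id"] for d in docs if d["id"] in seg.docs)
        self.segments.append(Segment({d["id"]: dict(d) for d in docs}))

    def force_merge(self) -> None:
        # "Optimize": copy every live document, field for field, into one
        # segment. Deleted copies are dropped; stored fields are not touched.
        merged = {}
        for seg in self.segments:
            merged.update(seg.live_docs())
        self.segments = [Segment(merged)]

    def stored_fields(self) -> set:
        # Everything still occupying disk, including deleted-but-unmerged copies.
        return {f for seg in self.segments for d in seg.docs.values() for f in d}


index = ToyIndex()
index.add([{"id": "1", "title": "a", "blob": "x" * 100},
           {"id": "2", "title": "b", "blob": "y" * 100}])

# Dropping "blob" from schema.xml changes nothing here, and optimizing keeps it.
index.force_merge()
print(sorted(index.stored_fields()))   # ['blob', 'id', 'title']

# Re-send every document without the field: old copies are only marked deleted.
index.add([{k: v for k, v in d.items() if k != "blob"}
           for d in index.segments[0].live_docs().values()])
print(sorted(index.stored_fields()))   # ['blob', 'id', 'title']

# Once a merge rewrites the segments, the old copies (and "blob") are gone.
index.force_merge()
print(sorted(index.stored_fields()))   # ['id', 'title']
```

The design point the sketch isolates is that only re-adding a document rewrites its stored fields; a merge merely decides which whole document copies survive.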


Best,
Erick

On Fri, Nov 7, 2014 at 4:26 AM, Yago Riveiro <ya...@gmail.com> wrote:
> Jack,
>
> I have some indexed data that I don’t need anymore. My question is whether I can delete the field definition from schema.xml, run an optimize, and have the fields “magically” disappear (freeing disk space).
>
> Re-indexing the data to delete fields is too expensive for collections with hundreds of millions of documents.
>
> The optimize operation seems like a good place to shrink the documents...
>
> —
> /Yago Riveiro

Re: Delete data from stored documents

Posted by Yago Riveiro <ya...@gmail.com>.
Jack,

I have some indexed data that I don’t need anymore. My question is whether I can delete the field definition from schema.xml, run an optimize, and have the fields “magically” disappear (freeing disk space).

Re-indexing the data to delete fields is too expensive for collections with hundreds of millions of documents.

The optimize operation seems like a good place to shrink the documents...

—
/Yago Riveiro

On Fri, Nov 7, 2014 at 12:19 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> Could you clarify exactly what you are trying to do, like with an example? I 
> mean, how exactly are you determining what fields are "unwanted"? Are you 
> simply asking whether fields can be deleted from the index (and schema)?
> -- Jack Krupansky

Re: Delete data from stored documents

Posted by Jack Krupansky <ja...@basetechnology.com>.
Could you clarify exactly what you are trying to do, like with an example? I 
mean, how exactly are you determining what fields are "unwanted"? Are you 
simply asking whether fields can be deleted from the index (and schema)?

-- Jack Krupansky

-----Original Message----- 
From: yriveiro
Sent: Thursday, November 6, 2014 9:19 AM
To: solr-user@lucene.apache.org
Subject: Delete data from stored documents

Hi,

Is it possible to remove stored data from an index by deleting the unwanted fields
from schema.xml and then running an optimize on the index?

Thanks,

/yago



Re: Delete data from stored documents

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
nope.

On Thu, Nov 6, 2014 at 5:19 PM, yriveiro <ya...@gmail.com> wrote:

> Hi,
>
> Is it possible to remove stored data from an index by deleting the unwanted fields
> from schema.xml and then running an optimize on the index?
>
> Thanks,
>
> /yago



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Delete data from stored documents

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On 7 November 2014 06:57, andrey prokopenko <an...@gmail.com> wrote:
> The full list of update processors for version 4.10 can be found here:
> http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html

Actually, that's just the top level of the inheritance hierarchy and
you need to realize that lots of interesting URPs are hiding lower
down. Hence: http://www.solr-start.com/info/update-request-processors/

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

Re: Delete data from stored documents

Posted by andrey prokopenko <an...@gmail.com>.
Take a look over here: https://wiki.apache.org/solr/UpdateRequestProcessor
The full list of update processors for version 4.10 can be found here:
http://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorFactory.html
You can pick the one most suitable for you as a template and make a custom
version tailored to your needs.

On Fri, Nov 7, 2014 at 12:21 PM, Yago Riveiro <ya...@gmail.com>
wrote:

> Andrey
>
> Can you point me to a tutorial or how-to that shows how to develop a
> custom UpdateProcessor class?
>
> —
> /Yago Riveiro

Re: Delete data from stored documents

Posted by Yago Riveiro <ya...@gmail.com>.
Andrey

Can you point me to a tutorial or how-to that shows how to develop a custom UpdateProcessor class?

—
/Yago Riveiro

On Fri, Nov 7, 2014 at 10:39 AM, andrey prokopenko <an...@gmail.com>
wrote:

> With out-of-the-box functionality, no. You have to develop a custom
> UpdateProcessor and add it to the update processor chain.

Re: Delete data from stored documents

Posted by andrey prokopenko <an...@gmail.com>.
With out-of-the-box functionality, no. You have to develop a custom
UpdateProcessor and add it to the update processor chain.
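The chain idea can be sketched in a few lines. This is a toy Python model, not Solr's Java API: each hypothetical processor is a function that takes a document and returns a (possibly modified) document, or `None` to drop it, and the final step stands in for writing to the index. It also illustrates the limitation Anurag points out: a processor only sees documents as they are sent, so data already in the index is untouched until those documents are re-sent. Before writing a custom class, it may be worth checking whether Solr's bundled `IgnoreFieldUpdateProcessorFactory` already covers the field-removal case.

```python
from typing import Callable, List, Optional

Doc = dict
Processor = Callable[[Doc], Optional[Doc]]


def ignore_fields(*names: str) -> Processor:
    """Hypothetical processor factory: strip the named fields from each doc."""
    drop = set(names)

    def process(doc: Doc) -> Optional[Doc]:
        return {k: v for k, v in doc.items() if k not in drop}

    return process


def run_chain(chain: List[Processor], doc: Doc, index: List[Doc]) -> None:
    """Pass one incoming doc through every processor, then 'index' it."""
    for proc in chain:
        result = proc(doc)
        if result is None:      # a processor may swallow the document
            return
        doc = result
    index.append(doc)           # stands in for the final indexing step


index: List[Doc] = [{"id": "old", "blob": "already stored"}]   # existing data
chain = [ignore_fields("blob")]
run_chain(chain, {"id": "new", "title": "t", "blob": "dropped"}, index)
print(index)
# [{'id': 'old', 'blob': 'already stored'}, {'id': 'new', 'title': 't'}]
```

In a real solrconfig.xml chain, the role of the final `index.append` is played by `solr.RunUpdateProcessorFactory`, which is normally the last processor listed.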

On Thu, Nov 6, 2014 at 3:19 PM, yriveiro <ya...@gmail.com> wrote:

> Hi,
>
> Is it possible to remove stored data from an index by deleting the unwanted fields
> from schema.xml and then running an optimize on the index?
>
> Thanks,
>
> /yago