Posted to solr-user@lucene.apache.org by eShard <zi...@yahoo.com> on 2014/01/30 21:40:49 UTC

Is there a way to get Solr to delete an uploaded document after it's been indexed?

Hi,
My crawler uploads all the documents to Solr for indexing, and they land in a
tomcat/temp folder.
Over time this folder grows so large that I run out of disk space.
So I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, they don't get indexed; too late, and
I run out of disk.
I'm still trying to find the right window...
So (and this is probably a long shot), I'm wondering if there's anything in
Solr that can delete these docs from /temp after they've been indexed...

Thank you,




--
View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-Solr-to-delete-an-uploaded-document-after-its-been-indexed-tp4114463.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is there a way to get Solr to delete an uploaded document after it's been indexed?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Well, it's your crawler that submits them, so the crawler should know
when to delete them.
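
A minimal sketch of that idea in Python, assuming the crawler has some
function that submits one file and reports whether Solr accepted it. The
name index_then_delete and the upload callable are hypothetical, not part
of Solr or of the original crawler; the point is only that the file is
removed after a confirmed submission and kept otherwise.

```python
import os
from typing import Callable


def index_then_delete(path: str, upload: Callable[[str], bool]) -> bool:
    """Submit one file and delete it only if the submission succeeded.

    `upload` is a hypothetical callable supplied by the crawler. It is
    assumed to return True only once Solr has acknowledged the document
    (and committed it, if that matters for your setup).
    """
    if not upload(path):
        # Keep the file so the crawler can retry it later.
        return False
    os.remove(path)
    return True
```

With this shape there is no window to tune: a file lives exactly as long
as it takes to get a positive answer back.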

If you want some sort of trigger from Solr, look at the postCommit hook
defined in solrconfig.xml. All that gives you is timing, though, not
which documents to deal with.
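
Since the hook only tells you when a commit happened, a cleanup script
triggered by it has to guess which files are safe. One rough sketch, in
Python, is to delete only files older than some cutoff. The function name
purge_older_than and the choice of cutoff are assumptions for
illustration; whether a file older than the cutoff has really been
indexed depends on your setup, so leave a generous safety margin.

```python
import os
from typing import List


def purge_older_than(folder: str, cutoff: float) -> List[str]:
    """Delete regular files in `folder` last modified before `cutoff`.

    `cutoff` is a Unix timestamp in seconds. Subdirectories and symlinks
    are left alone. Returns the sorted paths that were removed.
    """
    # Snapshot the listing first so we do not delete while iterating.
    with os.scandir(folder) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.path)
    return sorted(removed)
```

A script run after a commit could call this with something like
time.time() - 600, so that anything touched in the last ten minutes is
left in place.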

You could probably also plug into the UpdateRequestProcessor chain, where
you do have access to the document content.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

