Posted to solr-user@lucene.apache.org by smanad <sm...@gmail.com> on 2013/05/22 21:36:28 UTC

Scheduling DataImports

Hi, 

I am new to Solr and recently started exploring it for search/sort needs in
our webapp. 
I have a couple of questions, listed below. (I am using Solr 4.2.1 with the
default core, named collection1.)
1. We have a use case where we would like to index data every 10 minutes (on
average). What's the best way to schedule a data import every 10 minutes or
so? A cron job?
2. Also, we are indexing data returned from an API that returns different
cache TTLs. How can I re-index an entry after its TTL has expired? With some
process that polls for entries that are about to expire and issues a
data-import command?

Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Scheduling DataImports

Posted by smanad <sm...@gmail.com>.
Thanks for the reply. 

Regarding the second question, that is actually what I am looking for.

My use case is that my DIH runs against two HTTP data sources, api1 and api2,
which return different TTLs. I was thinking of saving this in a file,
something like:
url:api1, timestamp:100, expires: 60
url:api2, timestamp:101, expires: 30

Then a cron job would run every minute to see which entries expire in the
next 30 seconds. Here entry #2 would be expiring, so the job would re-index
that entry by running the DIH curl command for that entity.
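
The bookkeeping described above can be sketched in Python. This is only an
illustration under assumptions the thread does not state: timestamp and
expires are both taken to be in seconds, an entry is taken to expire at
timestamp + expires, and parse_entries and expiring_soon are hypothetical
helper names.

```python
import re

# Matches lines in the file format proposed above, for example:
#   url:api1, timestamp:100, expires: 60
LINE_RE = re.compile(r"url:\s*([^,\s]+),\s*timestamp:\s*(\d+),\s*expires:\s*(\d+)")


def parse_entries(text):
    """Return (entity, expiry_time) pairs, assuming expiry = timestamp + expires."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            name, timestamp, ttl = match.groups()
            entries.append((name, int(timestamp) + int(ttl)))
    return entries


def expiring_soon(entries, now, window=30):
    """Names of entries that expire at or before now + window (seconds)."""
    return [name for name, expiry in entries if expiry <= now + window]


sample = """url:api1, timestamp:100, expires: 60
url:api2, timestamp:101, expires: 30"""

entries = parse_entries(sample)
# With now=105 and a 30-second window, only api2 (expiry 131) qualifies;
# api1 expires at 160.
print(expiring_soon(entries, now=105))
```

One thing the sketch makes visible: if the job runs once a minute but only
looks 30 seconds ahead, an entry expiring 31 to 59 seconds after a run is not
picked up until it has already expired. Setting the window to at least the
polling interval would avoid that gap.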

Is there a better way of scheduling DIH imports automatically?

I also read about NRT (near real-time search). Is that related to this
problem at all?

Thanks, 
-M





Re: Scheduling DataImports

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On the first question, a cron job that hits the DIH trigger URL will probably
be the easiest way.
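
As a rough sketch of what a cron-driven trigger could look like in Python:
the host, port, and "/dataimport" handler path below are assumptions (common
defaults, but the handler path is whatever solrconfig.xml registers), the
core name is the one from the original question, and build_dih_url and
trigger_import are hypothetical helper names. clean=false is passed so that a
full-import does not wipe the existing index first; whether that is what you
want depends on your setup.

```python
import sys
import urllib.parse
import urllib.request

# Assumed values; adjust to match your installation and solrconfig.xml.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "collection1"
HANDLER = "dataimport"


def build_dih_url(command="full-import", entity=None, clean=False):
    """Build the DIH trigger URL for the given command."""
    params = {
        "command": command,
        "clean": "true" if clean else "false",
        "commit": "true",
    }
    if entity is not None:
        # Restrict the import to a single entity from the DIH config.
        params["entity"] = entity
    return f"{SOLR_BASE}/{CORE}/{HANDLER}?{urllib.parse.urlencode(params)}"


def trigger_import(entity=None, timeout=30):
    """Send the import request and return the HTTP status code."""
    url = build_dih_url(entity=entity)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


# Only fire the request when explicitly asked, so that importing this file
# has no side effects.
if __name__ == "__main__" and "--run" in sys.argv:
    print(trigger_import())
```

Saved under a hypothetical path such as /path/to/dih_trigger.py, it could be
scheduled every 10 minutes with a crontab entry like:

    */10 * * * * python3 /path/to/dih_trigger.py --run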

I'm not sure I understood the second question. How do you store or know when
the entries expire? And how do you pull those specific entries?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

