Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2009/07/09 16:27:16 UTC
DIH: URLDataSource and incremental indexing
I'm exploring other ways of getting data into Solr via
DataImportHandler than through a relational database, particularly the
URLDataSource.
I see the special commands for deleting by id and query as well as the
$hasMore/$nextUrl techniques, but I'm unclear on exactly how one would
go about designing a data source over HTTP that worked cleanly for
full importing and also for delta indexing.
For the sake of argument, suppose I have /data.xml[?since=<some timestamp>]
[&start=X&rows=Y], and it could return documents in Solr XML (or really
any basic format) updated since the given timestamp (or all records
if no since parameter is provided). The service could also return
which records to remove since that timestamp. Can I get there
from here using URLDataSource?
Have folks been doing this? If so, anyone care to share some basic
tips/tricks/examples?
Thanks,
Erik
Re: DIH: URLDataSource and incremental indexing
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
hi Erik,
DIH is designed to support this using a Transformer.
I am assuming that your API returns delta "deleted/modified/added" documents.
Always run a full-import with clean=false. Depending on the values
returned by the API, your transformer can use $deleteDocById for deletes,
etc.
$nextUrl and $hasMore can also be used to fetch the data page by page.
Again, these variables can be generated and put into the row by the
Transformer.
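To make that concrete, here is a rough sketch of what the data-config.xml could look like. The endpoint URL, the XML shape (/docs/doc), and the field names (id, title, status) are all hypothetical stand-ins for whatever your service actually returns; ${dataimporter.last_index_time} is the variable DIH fills in on delta runs:

```xml
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <!-- Hypothetical feed: returns records changed since the last import -->
    <entity name="feed"
            processor="XPathEntityProcessor"
            url="http://example.com/data.xml?since=${dataimporter.last_index_time}"
            forEach="/docs/doc"
            transformer="my.pkg.DeltaTransformer">
      <field column="id"     xpath="/docs/doc/id"/>
      <field column="title"  xpath="/docs/doc/title"/>
      <field column="status" xpath="/docs/doc/status"/>
    </entity>
  </document>
</dataConfig>
```

The transformer referenced there is where the $deleteDocById / $hasMore / $nextUrl values get injected into each row.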
we did this for one of our internal APIs (a message board) using a
javascript transformer. You can do it with a Java transformer as
well.
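As a standalone illustration of the row-mutation logic such a Java transformer would perform (this is just the core logic, not wired to Solr's Transformer API; the "status" field and the "deleted" value are hypothetical, and the special keys are the documented DIH commands):

```java
import java.util.HashMap;
import java.util.Map;

public class DeltaRowTransformer {

    // Documented DIH special commands
    public static final String DELETE_DOC_BY_ID = "$deleteDocById";
    public static final String HAS_MORE = "$hasMore";
    public static final String NEXT_URL = "$nextUrl";

    /**
     * Mutates a DIH row map the way a custom Transformer would:
     * rows whose (hypothetical) "status" field is "deleted" become
     * delete commands, and paging hints are added when the feed
     * indicates more data is available at nextUrl.
     */
    public static Map<String, Object> transformRow(Map<String, Object> row,
                                                   String nextUrl) {
        if ("deleted".equals(row.get("status"))) {
            // Tell DIH to delete this document instead of indexing it
            row.put(DELETE_DOC_BY_ID, row.get("id"));
        }
        if (nextUrl != null) {
            // Tell DIH to keep fetching from the next page
            row.put(HAS_MORE, "true");
            row.put(NEXT_URL, nextUrl);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "doc42");
        row.put("status", "deleted");
        transformRow(row, "/data.xml?since=2009-07-09&start=10&rows=10");
        System.out.println(row.get("$deleteDocById")); // doc42
        System.out.println(row.get("$hasMore"));       // true
    }
}
```

A real implementation would extend org.apache.solr.handler.dataimport.Transformer and do the same map mutation inside transformRow(Map, Context).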
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com