Posted to solr-user@lucene.apache.org by Venkata krishna <ve...@gmail.com> on 2014/06/12 16:22:45 UTC
Indexing Files Month by Month
Hi,
I am using Lucene/Solr and would like to use the Data Import Handler to index
files, but there are millions of files to import, so the indexing process will
take a long time. I have decided to import the files month by month, so could
you please suggest a way to import files on a month-by-month basis?
Thanks,
Venkata Krishna Tolusuri.
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Files-Month-by-Month-tp4141443.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing Files Month by Month
Posted by Erick Erickson <er...@gmail.com>.
Partition your files into month-sized folders and have DIH work on one
directory at a time.
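A rough sketch of that partitioning step, in case it helps. It assumes the
file's last-modified time (in UTC) decides which month a file belongs to, and
that file names are unique across the tree, since Files.move fails on a name
collision. MonthPartitioner and the yyyy-MM folder naming are made up for
illustration; nothing here is part of DIH or Solr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MonthPartitioner {

    // Groups every regular file under root by the year and month of its
    // last-modified time. Assumption: modification time is the date that
    // matters; swap in a date parsed from the file name if that fits better.
    static Map<YearMonth, List<Path>> groupByMonth(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(Files::isRegularFile)
                .collect(Collectors.groupingBy(
                    MonthPartitioner::monthOf,
                    TreeMap::new,
                    Collectors.toList()));
        }
    }

    // Returns the UTC year and month of the file's last-modified time.
    static YearMonth monthOf(Path file) {
        try {
            return YearMonth.from(
                Files.getLastModifiedTime(file).toInstant().atZone(ZoneOffset.UTC));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Moves each file into root/yyyy-MM/ so an importer can be pointed at
    // one month directory at a time.
    static void moveIntoMonthFolders(Path root) throws IOException {
        for (Map.Entry<YearMonth, List<Path>> entry : groupByMonth(root).entrySet()) {
            Path monthDir = Files.createDirectories(root.resolve(entry.getKey().toString()));
            for (Path file : entry.getValue()) {
                Files.move(file, monthDir.resolve(file.getFileName()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        moveIntoMonthFolders(Path.of(args[0]));
    }
}
```

After a run, each month has its own folder (for example 2014-06 under the
root), and the import can be run once per folder.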
What I'd do is move away from DIH and use SolrJ. That way
1> you can take full control over what you do
2> you can offload the heavy lifting of parsing the various files
(I'm assuming here that you're indexing PDFs, Word docs, etc)
to a bunch of clients.
Here are some code samples: http://searchhub.org/2012/02/14/indexing-with-solrj/
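To give a feel for the "take full control" and "bunch of clients" points, here
is a minimal, stdlib-only sketch of a driver that walks a range of months and
hands each one to a worker. It uses threads in a single JVM as a stand-in for
separate client machines, which is a simplification. MonthlyIndexRunner and the
indexOneMonth callback are hypothetical names; in a real client that callback
would parse the month's files and send the documents to Solr through SolrJ,
which is deliberately left out here.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class MonthlyIndexRunner {

    // Runs indexOneMonth for every month from first to last (inclusive) on a
    // fixed pool of workers and returns the per-month results in calendar
    // order. indexOneMonth is a hypothetical callback that returns the number
    // of documents it indexed for that month.
    static List<Integer> run(YearMonth first, YearMonth last, int workers,
                             Function<YearMonth, Integer> indexOneMonth)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (YearMonth m = first; !m.isAfter(last); m = m.plusMonths(1)) {
                final YearMonth month = m;
                Callable<Integer> task = () -> indexOneMonth.apply(month);
                pending.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> result : pending) {
                // get() throws ExecutionException if that month's task failed.
                counts.add(result.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder work: a real callback would index the month's files.
        List<Integer> counts = run(YearMonth.of(2014, 1), YearMonth.of(2014, 6), 4,
                month -> 0);
        System.out.println("Months processed: " + counts.size());
    }
}
```

Because each month is an independent task, a failed month can be rerun on its
own without touching the rest of the index.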
Or, if you really want to get wild, consider the MapReduceIndexerTool. That
requires some infrastructure though.
Best,
Erick