Posted to solr-user@lucene.apache.org by Venkata krishna <ve...@gmail.com> on 2014/06/12 16:22:45 UTC

Indexing Files Month by Month

Hi,

I am using Lucene/Solr and would like to use the Data Import Handler to index
files, but there are millions of files to import, so the indexing process will
take a long time. I have decided to import the files month by month. Could you
please suggest a way to import files on a month-by-month basis?

Thanks,

Venkata Krishna Tolusuri.



--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Files-Month-by-Month-tp4141443.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing Files Month by Month

Posted by Erick Erickson <er...@gmail.com>.
Partition your files into one folder per month and have DIH work on one
directory at a time.
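
A minimal sketch of that partitioning step, using only the Java standard
library. It assumes that a file's "month" is its last-modified time in UTC
(the thread does not say how files map to months), and the class and method
names are hypothetical. Each regular file directly under a source directory
is moved into a YYYY-MM subfolder of a target directory, so that an import
can then be pointed at one folder at a time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr or DIH.
final class MonthPartitioner {

    // Assumption: the month comes from the file's last-modified time, in UTC.
    static YearMonth monthOf(Path file) throws IOException {
        return YearMonth.from(
                Files.getLastModifiedTime(file).toInstant().atZone(ZoneOffset.UTC));
    }

    // Moves every regular file directly under sourceDir into targetRoot/YYYY-MM/.
    static void partition(Path sourceDir, Path targetRoot) throws IOException {
        List<Path> files;
        // Collect first so the directory stream is closed before files are moved.
        try (Stream<Path> entries = Files.list(sourceDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            // YearMonth.toString() yields names such as "2014-06".
            Path monthDir = targetRoot.resolve(monthOf(file).toString());
            Files.createDirectories(monthDir);
            Files.move(file, monthDir.resolve(file.getFileName()));
        }
    }

    // Usage: java MonthPartitioner <sourceDir> <targetRoot>
    public static void main(String[] args) throws IOException {
        partition(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the month should instead come from the file name or from metadata inside
the document, only monthOf would need to change.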

What I'd do is move away from DIH and use SolrJ. That way:
1> you can take full control over what you do
2> you can offload the heavy lifting of parsing the various files
    (I'm assuming here that you're indexing PDFs, Word docs, etc.)
    to a bunch of clients.
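
As a rough illustration of the "offload the parsing" idea, here is a sketch
of a client that processes one month's directory and fans the per-file work
out to a pool of worker threads. It uses only the Java standard library. The
class name, the method name, and the handler are all hypothetical: the
handler is a placeholder for whatever parses a file and sends the resulting
document to Solr (for example via SolrJ), which is deliberately not shown.
Several such clients could each be given a different month directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical client-side driver, not part of Solr or SolrJ.
final class MonthlyIndexRun {

    // Runs handler once per regular file under monthDir, using `workers` threads.
    // Returns the number of files handled; fails if any handler call throws.
    static int indexMonth(Path monthDir, int workers, Consumer<Path> handler)
            throws IOException, InterruptedException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(monthDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Path file : files) {
                // Placeholder: a real handler would parse the file and index it.
                tasks.add(() -> { handler.accept(file); return null; });
            }
            // invokeAll blocks until every task has completed.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    throw new IOException("handler failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return files.size();
    }
}
```

A call might look like
indexMonth(Paths.get("/data/by-month/2014-06"), 4, file -> parseAndIndex(file)),
where both the path and parseAndIndex are made up for the example.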

Here are some code samples: http://searchhub.org/2012/02/14/indexing-with-solrj/

Or, if you really want to get wild, consider the MapReduceIndexerTool. That
requires some infrastructure though.

Best,
Erick
