Posted to solr-user@lucene.apache.org by Peter Gabriel <za...@gmx.net> on 2009/11/13 16:26:48 UTC

scanning folders recursively / Tika

Hello.

I am working with Tika 0.5 and want to scan a folder tree of about 10 GB.
Is there a convenient way to scan folders recursively with an existing class, or do I have to write it myself?

Any tips on best practice?

Greetings, Peter
-- 
Download now for free: Internet Explorer 8 and Mozilla Firefox 3.5 -
safer, faster and simpler! http://portal.gmx.net/de/go/atbrowser

Re: scanning folders recursively / Tika

Posted by Glen Newton <gl...@gmail.com>.
Have one thread recurse depth-first down the directories, adding the
files it finds to a fixed-size queue.
Have many threads read off the queue and do the work.
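
A minimal sketch of that layout in plain Java, assuming nothing beyond the JDK. The class name FolderScanner, the scan method and the work callback are hypothetical names chosen for illustration; the callback is where the per-file work (for example a Tika parse) would go. Symbolic-link loops are not handled here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class FolderScanner {
    // Sentinel object telling a worker that no more files will arrive.
    private static final File POISON = new File("");

    /**
     * Walks root depth-first on the calling thread and hands every regular
     * entry to one of `workers` consumer threads through a bounded queue.
     */
    public static void scan(File root, int workers, int queueSize, Consumer<File> work)
            throws InterruptedException {
        BlockingQueue<File> queue = new ArrayBlockingQueue<>(queueSize);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Compare by reference: only the sentinel itself stops the loop.
                    for (File f = queue.take(); f != POISON; f = queue.take()) {
                        try {
                            work.accept(f); // hypothetical hook: parse or index the file here
                        } catch (RuntimeException e) {
                            System.err.println("Failed on " + f + ": " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        try {
            walk(root, queue);
        } finally {
            // One sentinel per worker so every thread terminates.
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);
            }
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    private static void walk(File dir, BlockingQueue<File> queue) throws InterruptedException {
        File[] entries = dir.listFiles(); // null if dir is unreadable or not a directory
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry, queue);   // depth-first recursion
            } else {
                queue.put(entry);     // blocks while the queue is full
            }
        }
    }
}
```

The bounded queue keeps the walker from running far ahead of the workers, so memory use stays flat regardless of how many files the tree holds. A call might look like FolderScanner.scan(new File("/data"), 8, 100, f -> System.out.println(f)), where the path, thread count and queue size are placeholders.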

-glen
http://zzzoot.blogspot.com/


Re: scanning folders recursively / Tika

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Peter - if you want, download the code from Lucene in Action 1 or 2; it includes directory traversal and indexing. The 2nd edition uses Tika.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


