Posted to java-user@lucene.apache.org by trupti mulajkar <ac...@sheffield.ac.uk> on 2006/04/07 16:20:35 UTC

lucene indexing

Hi,
Can anyone suggest how to split files using Lucene?
I am trying to index the TREC collection using lucene-1.4.3.
I want Lucene to read the multiple files within a single TREC file and create an
index accordingly.

cheers,
trupti mulajkar
MSc Advanced Computer Science




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene indexing

Posted by Grant Ingersoll <gs...@syr.edu>.
Lucene does not provide this out of the box.  You will have to write a 
program to do it and feed the results to Lucene. 
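
As a rough illustration of such a program, here is a minimal sketch that 
scans a TREC-style file line by line and cuts it into records. It assumes 
each record is delimited by <DOC> and </DOC> on lines of their own and 
carries a one-line <DOCNO> tag; check this against your copy of the 
collection. TrecSplitter and Record are hypothetical names, not Lucene 
classes. The Lucene calls appear only in a comment, written from memory of 
the 1.4.x API, so the sketch compiles without Lucene on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits one TREC-style file into
// its individual <DOC> ... </DOC> records.
class TrecSplitter {

    /** One record: the DOCNO identifier and the raw text between the DOC tags. */
    static final class Record {
        final String docno;
        final String body;
        Record(String docno, String body) { this.docno = docno; this.body = body; }
    }

    static List<Record> split(Reader in) throws IOException {
        List<Record> records = new ArrayList<Record>();
        BufferedReader reader = new BufferedReader(in);
        StringBuilder current = null;   // null while outside a <DOC> block
        String docno = null;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.equals("<DOC>")) {
                current = new StringBuilder();
                docno = null;
            } else if (trimmed.equals("</DOC>")) {
                if (current != null) {
                    records.add(new Record(docno, current.toString().trim()));
                }
                current = null;
            } else if (current != null) {
                // Assumption: the DOCNO tag opens and closes on a single line.
                if (trimmed.startsWith("<DOCNO>") && trimmed.endsWith("</DOCNO>")) {
                    docno = trimmed.substring("<DOCNO>".length(),
                            trimmed.length() - "</DOCNO>".length()).trim();
                }
                current.append(line).append('\n');
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<DOC>\n<DOCNO> A-1 </DOCNO>\n<TEXT>\nfirst\n</TEXT>\n</DOC>\n"
                      + "<DOC>\n<DOCNO> A-2 </DOCNO>\n<TEXT>\nsecond\n</TEXT>\n</DOC>\n";
        for (Record r : split(new StringReader(sample))) {
            System.out.println(r.docno + ": " + r.body.length() + " chars");
            // With Lucene 1.4.x available, each record could be indexed here, e.g.:
            //   Document d = new Document();
            //   d.add(Field.Keyword("docno", r.docno));
            //   d.add(Field.Text("body", r.body));
            //   writer.addDocument(d);   // writer is an open IndexWriter
        }
    }
}
```

Reading line by line keeps memory use flat even for large collection 
files, since only one record is buffered at a time.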

If I remember right, these files are in XML, so you can probably use SAX 
or a pull parser. 
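
A SAX-based version might look like the sketch below. It only works if the 
file is well-formed XML once wrapped in a single root element; a file of 
sibling <DOC> elements has no root of its own, so the sketch supplies one. 
As far as I know some TREC distributions are SGML rather than strict XML 
(unescaped & characters, for example), and the parser will reject those. 
TrecSaxSplitter and the ROOT wrapper element are hypothetical names chosen 
for illustration, and the DOC and DOCNO tag names are assumptions about the 
file layout.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: collects the DOCNO and the remaining text of
// every <DOC> element. docnos.get(i) belongs to texts.get(i).
class TrecSaxSplitter extends DefaultHandler {
    final List<String> docnos = new ArrayList<String>();
    final List<String> texts = new ArrayList<String>();
    private final StringBuilder text = new StringBuilder();
    private final StringBuilder docno = new StringBuilder();
    private boolean inDocno;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("DOC")) {
            // A new record starts: discard anything buffered so far.
            text.setLength(0);
            docno.setLength(0);
        } else if (qName.equals("DOCNO")) {
            inDocno = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so always append.
        if (inDocno) {
            docno.append(ch, start, length);
        } else {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("DOCNO")) {
            inDocno = false;
        } else if (qName.equals("DOC")) {
            // Record complete; this is where it could be handed to Lucene.
            docnos.add(docno.toString().trim());
            texts.add(text.toString().trim());
        }
    }

    static TrecSaxSplitter parse(InputStream trecFile) throws Exception {
        // Wrap the stream in a synthetic root element so the parser sees one document.
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
            Arrays.<InputStream>asList(
                new ByteArrayInputStream("<ROOT>".getBytes(StandardCharsets.UTF_8)),
                trecFile,
                new ByteArrayInputStream("</ROOT>".getBytes(StandardCharsets.UTF_8)))));
        TrecSaxSplitter handler = new TrecSaxSplitter();
        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        return handler;
    }
}
```

If the parser rejects the files, a plain text scan for the <DOC> markers is 
the usual fallback.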

I think a number of TREC participants, in the past, have used Lucene, so 
you may be able to find someone on the web who is generous enough to 
have shared their implementation.


-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 

