Posted to solr-user@lucene.apache.org by Veselin K <ve...@campbell-lange.net> on 2009/04/05 14:20:46 UTC

Schema for indexing PDF/Doc/XLS files

Hello,
I just got the Solr 1.4 example running under Jetty, and it works as expected.

I'm now trying to design my schema.
My goal is to index PDF/Doc/XLS files with the following fields:

0. ID number
1. Filename
2. File path
3. Modification date
4. File contents 
5. Number of pages

- Any tips on which field types I should use to get this data indexed?

- Is there a way to have Solr increment the ID number automatically
  each time a document is added to the index?

- Would I be able to extract the information above using just the
  Solr/Tika features? Or would I have to source all values except
  "file contents" myself and pass them to Solr when indexing?


Thank you very much.

Regards,
Veselin K