Posted to solr-user@lucene.apache.org by Veselin K <ve...@campbell-lange.net> on 2009/04/05 14:20:46 UTC
Schema for indexing PDF/Doc/XLS files
Hello,
I've just got the Solr 1.4 example running under Jetty, and it works as expected.
Now I'm trying to design my schema.
My goal is to index PDF, DOC and XLS files with the following fields:
0. ID number
1. Filename
2. File path
3. Modification date
4. File contents
5. Number of pages
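For reference, the field list above corresponds to a record like the one below. This is a minimal Python sketch only: the class name, the field names (`filename`, `path`, `modified`, `content`, `pages`) and the Python types are illustrative assumptions standing in for Solr field types, not anything Solr defines. The one Solr-specific detail is the date format, which Solr date fields expect as ISO 8601 in UTC with a trailing "Z".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical model of one indexed file. The Python types are assumptions
# standing in for whatever Solr field types the schema ends up using.
@dataclass
class IndexedFile:
    id: str                      # 0. unique key
    filename: str                # 1. e.g. "report.pdf"
    path: str                    # 2. full path on disk
    modified: datetime           # 3. modification date, assumed to be UTC
    content: str = ""            # 4. extracted text (full-text searchable)
    pages: Optional[int] = None  # 5. page count, if the format exposes one

    def to_solr_dict(self) -> dict:
        """Flatten to a plain dict, formatting the date as Solr expects."""
        doc = {
            "id": self.id,
            "filename": self.filename,
            "path": self.path,
            # Solr date syntax: 1995-12-31T23:59:59Z (UTC, trailing "Z")
            "modified": self.modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "content": self.content,
        }
        # Omit the page count entirely when it is unknown (e.g. for XLS).
        if self.pages is not None:
            doc["pages"] = self.pages
        return doc
```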
- Any tips on which field types I should use to index this data?
- Is there a way to have Solr increment the ID number automatically
each time a document is added to the index?
- Can I extract all of the information above using only the Solr/Tika
features, or would I have to source every value except the file
contents myself and pass them to Solr when indexing?
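If the values other than the file contents do have to be sourced on the client side, the filesystem already provides the filename, path and modification date. The Python sketch below (the helper name `file_fields` and the field names are hypothetical) collects them. In place of a Solr-side auto-increment counter, it derives a stable ID from a SHA-1 hash of the absolute path; that substitution is an assumption of this sketch, not a Solr feature. The page count and the contents would still have to come from an extractor such as Tika.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_fields(path: str) -> dict:
    """Collect the values that come from the filesystem, not the file body.

    Field names are hypothetical. "content" and a page count are not
    produced here; they would still need a text extractor such as Tika.
    """
    abs_path = os.path.abspath(path)
    mtime = os.path.getmtime(abs_path)  # seconds since the epoch
    return {
        # Stable ID (assumed scheme): SHA-1 hex digest of the absolute path,
        # so the same file always maps to the same ID.
        "id": hashlib.sha1(abs_path.encode("utf-8")).hexdigest(),
        "filename": os.path.basename(abs_path),
        "path": abs_path,
        # Solr date fields expect ISO 8601 in UTC with a trailing "Z".
        "modified": datetime.fromtimestamp(mtime, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
    }
```

Hashing the path rather than counting means the same file always gets the same ID, so, assuming `id` is the schema's uniqueKey, re-adding a changed file replaces its earlier document instead of creating a duplicate.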
Thank you very much.
Regards,
Veselin K