Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/06/28 14:00:40 UTC

[Lucene-java Wiki] Update of "LuceneFAQ" by NickBurch

The following page has been changed by NickBurch:
http://wiki.apache.org/lucene-java/LuceneFAQ

The comment on the change is:
Jakarta POI -> Apache POI, and note on using POI for visio files

------------------------------------------------------------------------------
  
  In order to index Word documents you need to first parse them to extract text that you want to index from them.  Here are some Word parsers that can help you with that:
  
- [http://jakarta.apache.org/poi/ Jakarta Apache POI] has an early development level Microsoft Word parser for versions of Word from Office 97, 2000, and XP.
+ [http://poi.apache.org/hwpf/ Apache POI] has an early development level Microsoft Word parser for versions of Word from Office 97, 2000, and XP.
- 
  
  ==== How can I index MS-Excel documents? ====
  
  In order to index Excel documents you need to first parse them to extract text that you want to index from them.  Here are some Excel parsers that can help you with that:
  
- [http://jakarta.apache.org/poi/ Jakarta Apache POI] has an excellent Microsoft Excel parser for versions of Excel from Office 97, 2000, and XP.  You can also modify Excel files with this tool.
+ [http://poi.apache.org/hssf/ Apache POI] has an excellent Microsoft Excel parser for versions of Excel from Office 97, 2000, and XP.  You can also modify Excel files with this tool.
- 
  
  ==== How can I index MS-Powerpoint documents? ====
  
- In order to index Powerpoint documents you need to first parse them to extract text that you want to index from them.  You can use the [http://jakarta.apache.org/poi/ Jakarta Apache POI], as it contains a parser for Powerpoint documents.
+ In order to index Powerpoint documents you need to first parse them to extract text that you want to index from them.  You can use the [http://poi.apache.org/hslf/ Apache POI], as it contains a parser for Powerpoint documents.
+ 
+ ==== How can I index MS-Visio documents? ====
+ 
+ In order to index Visio documents you need to first parse them to extract text that you want to index from them.  You can use the [http://poi.apache.org/hdgf/ Apache POI], as it contains a parser for Visio documents.
  
  
  ==== How can I index Email (from MS-Exchange or another IMAP server) ? ====