Posted to java-user@lucene.apache.org by sa...@cfl.rr.com on 2006/02/09 17:13:10 UTC

Anyone interested in taking over TextMining.org?

The TextMining.org website keeps getting hacked, and I don't have the
time to upgrade PostNuke to a more secure version. Also, for legal
reasons, I can't maintain the software. I am more than willing to
hand off the project to Lucene or someone else. It's under the Apache 2
license, so anyone can branch at any time and use any license they want.
However, if someone wants to take over and gets my seal of approval, I
will make the textmining.org home page redirect to your site.

It extracts text from Word documents pretty solidly. When there are
problems, they are caused by fast-saved files or by files with a .doc
extension that aren't actually Word documents (RTF, HTML). Unlike
POI, it supports Word 6.0/95 documents. There are many ways it could be
improved, but they are trivial changes in my opinion. The core logic is
solid and is used in commercial and government applications.
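
For anyone picking this up, one of those trivial improvements might be
to check a file's leading bytes before handing it to the extractor, so
that RTF or HTML files carrying a .doc extension are caught early. The
following is only a rough sketch and not code from the TextMining.org
library: the class name DocSniffer, the Kind enum, and the sniff method
are all hypothetical. It assumes binary Word files are stored in the
OLE2 compound-file container, so a match only shows the container type
and does not prove the file is a Word document.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of the TextMining.org code base.
public class DocSniffer {
    // Signature of the OLE2 compound-file container (assumed here to be
    // what binary .doc files use; other Office formats share it too).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    public enum Kind { OLE2, RTF, HTML, UNKNOWN }

    // Classify a file from its first few bytes (16 or more is plenty).
    public static Kind sniff(byte[] head) {
        if (head.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        // Fall back to a text check; ISO-8859-1 maps every byte to a char.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT);
        if (text.startsWith("{\\rtf")) {
            return Kind.RTF;
        }
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) {
            return Kind.HTML;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] rtfHead = "{\\rtf1\\ansi".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(rtfHead));
    }
}
```

A caller could read the first 16 bytes or so of each file, pass them to
sniff, and route anything that is not OLE2 to a different extractor or
skip it. Fast-saved files would still look like OLE2 here, so this does
nothing for that case.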

Send me an email directly if you are interested. 
