Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/05/26 12:39:18 UTC

[SOT] Tika + Hadoop

https://issues.apache.org/jira/browse/TIKA-433 might be of interest to anyone looking to extract text from Office documents, PDFs, etc. and then convert it into Mahout vectors.

-Grant