Posted to user@nutch.apache.org by Drew Kutcharian <dr...@venarc.com> on 2011/03/07 19:39:59 UTC
Looking for a Lucene Contractor
Hi Everyone,
We are looking for someone to help us build a similarity engine. Here are some preliminary specs for the project.
1) We want to be able to show similar posts when a user posts a new block of text. A good example of this is StackOverflow. When a user tries to ask a new question, the system displays similar questions.
2) This is for a messaging system, so indexing/analysis should preferably happen at the time of posting, not later.
3) The posts are going to be less than 1000 characters.
4) We anticipate having millions of posts, so the solution should use sharding to spread the indexes across many machines.
5) The solution can be delivered as a standalone Java SE application that runs from the command line; no web development is necessary.
6) We expect clean APIs.
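To make specs 1, 2 and 4 a bit more concrete, here is a minimal sketch in plain Java SE. It does not use Lucene: token-set Jaccard similarity stands in for real analysis and scoring, and in-memory maps stand in for index shards. The class name SimilaritySketch, its methods, and the id-modulo shard routing are all assumptions made for illustration, not part of the spec.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: in-memory "shards" and Jaccard scoring
// stand in for sharded Lucene indexes and Lucene's own similarity.
class SimilaritySketch {
    private final List<Map<Integer, Set<String>>> shards = new ArrayList<>();
    private int nextId = 0;

    SimilaritySketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, 0.0 if both are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    // Spec 2: analyze and index at posting time. Spec 4: route by id to a shard.
    int post(String text) {
        int id = nextId++;
        shards.get(Math.floorMod(id, shards.size())).put(id, tokenize(text));
        return id;
    }

    // Spec 1: query every shard, merge the hits, return the best ids first.
    List<Integer> similar(String text, int limit) {
        Set<String> query = tokenize(text);
        List<Map.Entry<Integer, Double>> scored = new ArrayList<>();
        for (Map<Integer, Set<String>> shard : shards) {
            for (Map.Entry<Integer, Set<String>> e : shard.entrySet()) {
                double score = jaccard(query, e.getValue());
                if (score > 0) {
                    scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
                }
            }
        }
        scored.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        SimilaritySketch engine = new SimilaritySketch(2);
        engine.post("How do I shard a Lucene index?");
        engine.post("Best pizza in town");
        System.out.println(engine.similar("sharding a Lucene index across machines", 5));
    }
}
```

In a real deliverable each shard would presumably be a Lucene index on its own machine, and the scan inside similar() would become a per-shard search whose results are merged; the sketch only shows the shape of the API.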
Thanks,
Drew