Posted to user@nutch.apache.org by Drew Kutcharian <dr...@venarc.com> on 2011/03/07 19:39:59 UTC

Looking for a Lucene Contractor

Hi Everyone,

We are looking for someone to help us build a similarity engine. Here are some preliminary specs for the project.

1) We want to be able to show similar posts when a user posts a new block of text. A good example of this is StackOverflow. When a user tries to ask a new question, the system displays similar questions.

2) This is for a messaging system, so indexing/analysis should preferably happen at the time of posting, not later.

3) The posts are going to be less than 1000 characters.

4) We anticipate having millions of posts, so the solution should consider techniques for sharding the indexes across many machines.

5) The solution can be delivered as a standalone Java SE solution that can be run from the command line; no web development is necessary.

6) We expect clean APIs.

Thanks,

Drew