Posted to openrelevance-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/02/13 17:36:10 UTC

Re: or-user perspective, teams, etc (was: Comments on ORP Wiki Additions ?)

Inline

On Feb 12, 2010, at 12:45 PM, Mark Bennett wrote:

> Hi Robert,
> 
> Discussing a few more of your terms, by "or-user" you mean folks that research search engine relevancy almost full time?  And then or-dev as the folks who write those tools?  Even counting TREC alums, academics and search engine vendor CTOs, that seems like a rather specialized, relatively small group?
> 
> And this in contrast to orP, folks who are specifically interested in those roles in our merry little band?
> 
> The "teams" aspect of TREC is an interesting one.  I'm mostly comfortable with letting that organically evolve into the opt-in model of ASF.  If academic teams or vendors want to participate, that's fine, if not, that's OK too (as far as I'm concerned).  Perhaps we should simply ask for disclosure when somebody wants to contribute.  There was some tension between TREC and commercial vendors in later years, as I understand it.  Maybe having an additional venue will catch some of their attention.

The ASF is about individuals.  Companies may fund individuals to work on the ASF, but at the end of the day sponsorship gets you a link on the sponsors page and shows that your values are in line w/ the ASF and that it is something a company wishes to show support for, but nothing else.  Thus, we don't need disclosure for anybody wanting to contribute.  Well, check that.  We need disclosure that they are in fact the individual they say they are, but even this is taken at face value.  The ASF strives to be vendor neutral.  We should evaluate the contributions solely on their merit.  It is the committers' job to make such an evaluation on behalf of the ASF and then it is the PMC's job to make sure said contributions, when part of an official release, are correct and legal to the best of our knowledge.

> 
> I'm probably coming at this from a slightly different angle, as I don't research "relevancy" full time, it's just one of the aspects of search engine tech that companies care about.  And of course I also find it personally interesting, though my expectations for it have changed over the years.
> 
> Some aspects of TREC that we might want to consider in the ASF model include:
> * What do we do when folks make outrageous claims about their engine's performance and claim ORP validation?  I DO think we want to allow folks to make some public claims, if they are justified, I'd just like to see some guidelines about disclosure and vetting.

We don't need to do anything.  It's not our job to police.  Our job is to produce tools for people to perform relevance assessments under the Apache license.  The marketplace will take care of exposing the fallacy of any ill-gotten results. 

Again, however, remember that while ORP may be useful in a broader sense, it is not a requirement.  ORP is set up to give people a way to talk about relevance in a completely open way.  If someone chooses to exaggerate claims or abuse it, that's their problem for being an idiot, not ours.

> * How do we "market" to the vendors and academics.  The opt-in model is not about coercion, but you can't opt in to something you don't know about.  :-)

We don't need to market ourselves at all.  If people find it of value, then they will come and pitch in.  If it's not of value, they won't.  Our focus, again, should be on producing competent code and good content which can be used to rigorously study relevance.

> 
> I've previously disclosed that I'm a search engine consultant, and our participation in ORP is often driven by what clients ask us about.  Some examples include:
> * Comparing engine A to engine B, after one or more engines have made relevancy claims
> * Checking a new search engine implementation, as part of a larger user acceptance process
> * Seeing the results of various relevancy tweaking techniques within the same engine; sometimes you can fix one problematic search and unknowingly break 10 others!
> * Initial validation against new content or content in multiple languages
> * Having some type of baseline relevancy test that can be included in unit testing / regression testing

All of these are great reasons for you to participate and many align with the goals of the project.  Above all else, O/S is about scratching your own itch.

> 
> I realize these are very different motives than an academic who's devoted 30 or 40 years to studying IR.  Compared to that level of detail these tests could be called a "drive-by".  :-)  As I've previously mentioned, convenient interaction with multiple engines is a paramount concern.

Patches welcome, but we'll likely have to be careful about the rights on that code if it requires libraries from proprietary vendors.  

> I've started the process of actually contributing some code that does this.  I assume I'd use the JIRA/patch route for this.

+1

Re: or-user perspective, teams, etc (was: Comments on ORP Wiki Additions ?)

Posted by Mark Bennett <mb...@ideaeng.com>.
* inline

On Sat, Feb 13, 2010 at 8:36 AM, Grant Ingersoll <gs...@apache.org> wrote:

> We don't need to do anything.  It's not our job to police.  ...
>
* Do you know if ASF has ever gone after anybody for making any outrageous
claim?

> ... convenient interaction with multiple engines is a paramount concern.
>
> Patches welcome, but we'll likely have to be careful about the rights on
> that code if it requires libraries from proprietary vendors.
>
* The good news is most engines provide an XML interface these days, so it's
mostly a matter of XSLT, which is mine.
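
To make the "XML interface" point concrete, here is a rough sketch of pulling ranked document ids out of two engines' XML responses and scoring them against the same relevance judgments. Everything in it is an assumption for illustration only: the two response formats are made up (real engines' schemas will differ), the function names are hypothetical, and it uses Python's standard-library XML parser in place of XSLT. Precision at k is just one common measure and is not something the thread settled on.

```python
import xml.etree.ElementTree as ET

# Hypothetical response formats, invented for this sketch.
ENGINE_A_XML = """
<response>
  <result>
    <doc id="d3"/><doc id="d1"/><doc id="d7"/>
  </result>
</response>
"""

ENGINE_B_XML = """
<hits>
  <hit><field name="id">d1</field></hit>
  <hit><field name="id">d9</field></hit>
  <hit><field name="id">d7</field></hit>
</hits>
"""


def ranked_ids_engine_a(xml_text):
    """Return doc ids in rank order from the made-up engine A format."""
    root = ET.fromstring(xml_text)
    return [doc.get("id") for doc in root.iter("doc")]


def ranked_ids_engine_b(xml_text):
    """Return doc ids in rank order from the made-up engine B format."""
    root = ET.fromstring(xml_text)
    return [field.text for field in root.findall("./hit/field[@name='id']")]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical relevance judgments for a single query.
    relevant = {"d1", "d7"}
    for name, ranked in [
        ("engine A", ranked_ids_engine_a(ENGINE_A_XML)),
        ("engine B", ranked_ids_engine_b(ENGINE_B_XML)),
    ]:
        print(name, ranked, precision_at_k(ranked, relevant, 3))
```

Once each engine's output is reduced to a plain ranked list of ids, the same scoring code can serve as a baseline check in a regression test, whichever engine produced the results.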