Posted to openrelevance-dev@lucene.apache.org by Omar Alonso <or...@yahoo.com> on 2009/10/16 23:49:01 UTC

OpenRelevance and crowdsourcing

Hello,

I would like to know if there is interest in trying some experiments on Mechanical Turk for the OpenRelevance project. I've done TREC and INEX on MTurk, and it is a good platform for trying relevance experiments.

Regards,

Omar


Re: OpenRelevance and crowdsourcing

Posted by Omar Alonso <or...@yahoo.com>.
Forgot to add this extra info.

Here is an example of a graded relevance evaluation experiment that I'm currently running: 

https://www.mturk.com/mturk/preview?groupId=5WPZ72HM8TVZZV1XGYG0

You can log in to MTurk using your Amazon account and do a few assignments just to get an idea of the kind of work.

o.

--- On Fri, 10/16/09, Grant Ingersoll <gs...@apache.org> wrote:

> From: Grant Ingersoll <gs...@apache.org>
> Subject: Re: OpenRelevance and crowdsourcing
> To: openrelevance-dev@lucene.apache.org
> Cc: openrelevance-user@lucene.apache.org
> Date: Friday, October 16, 2009, 3:38 PM
> Hi Omar,
> 
> It sounds interesting, can you elaborate more on what you
> had in mind?
> 
> A few questions come to mind:
> 
> 1. Cost associated w/ Turk.
> 2. What dataset would you use?
> 
> -Grant
> 
> On Oct 16, 2009, at 5:49 PM, Omar Alonso wrote:
> 
> > Hello,
> > 
> > I would like to know if there is interest in trying
> some experiments on Mechanical Turk for the OpenRelevance
> project. I've done TREC and INEX on MTurk, and it is a good
> platform for trying relevance experiments.
> > 
> > Regards,
> > 
> > Omar
> > 
> > 
> > 
> 
> 
> 


Re: OpenRelevance and crowdsourcing

Posted by Omar Alonso <or...@yahoo.com>.
> While I realize $100 isn't a lot, we simply don't have a
> budget for such experiments and the point of ORP is to be
> able to do this in the community.  I suppose we could ask
> the ASF board for the money, but I don't think we are ready
> for that anyway.  I very much have a "If you build it,
> they will come" mentality, so I know if we can just get
> bootstrapped with some data and some queries and a way to
> collect their judgments, we can get people interested.

I'm not defending MTurk, but it gives you a "world view" in terms of assessments, versus a specific community. You can run the test within the community, but you may also introduce bias into the experiment. There is a SIGIR paper by Ellen Voorhees where she shows different agreement levels between NIST and U. Waterloo assessors.
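
Just to illustrate what "agreement level" means here (this is only a toy sketch in Python, not part of any ORP code, and the judgment values are made up), chance-corrected agreement between two assessors can be computed with Cohen's kappa:

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two assessors' judgment lists."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a.keys() | freq_b.keys()) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical graded judgments: 0 = not relevant, 1 = relevant, 2 = highly relevant
assessor_1 = [2, 1, 0, 0, 1, 2, 0, 1]
assessor_2 = [2, 0, 0, 1, 1, 2, 0, 1]
print(cohens_kappa(assessor_1, assessor_2))  # about 0.62 for this toy data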

You can still do a closed HIT (Human Intelligence Task) that pays $0 and is by invitation only. You probably need to pay Amazon something for hosting the experiment, but that would reduce the cost dramatically. Of course, only the community would have access to it, not all workers on MTurk.

If you want to build everything yourselves, that is possible too. You can have a website that collects judgments for a set of query/document pairs.
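
For example (just a sketch with made-up topic and document ids, not an ORP decision), whatever collects the judgments could dump them in the standard TREC qrels format (topic, iteration, docno, relevance), so existing tools like trec_eval can consume them:

# Hypothetical judgments gathered from such a site: (topic_id, doc_id, grade)
judgments = [
    ("ORP-001", "doc-0017", 2),
    ("ORP-001", "doc-0042", 0),
    ("ORP-002", "doc-0017", 1),
]

# One whitespace-separated line per judgment: "topic iteration docno relevance"
with open("orp.qrels", "w") as out:
    for topic, doc, grade in judgments:
        out.write(f"{topic} 0 {doc} {grade}\n")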

The INEX folks do the assessments on a volunteer basis but it takes quite a bit of time.

In any case, MTurk or not MTurk, I have some spare cycles in case people are interested in trying ideas.

Regards,

o. 


Re: OpenRelevance and crowdsourcing

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 16, 2009, at 8:30 PM, Omar Alonso wrote:

> Sure.
>
> 1- We can start by paying between 2 and 5 cents per document/query  
> pair (or document/topic) on a short data set (say 200 docs). That  
> should be in the order of $25 (assuming 2 cents and 5 turkers per  
> assignment + AMZN fee).
>
> It also depends how many experiments one would like to run. My  
> suggestion would be to run 2 or 3 experiments with some small data  
> sets for say $100 to see what kind of response we get back and then  
> think about something else at large scale.

While I realize $100 isn't a lot, we simply don't have a budget for  
such experiments and the point of ORP is to be able to do this in the
community.  I suppose we could ask the ASF board for the money, but I  
don't think we are ready for that anyway.  I very much have a "If you  
build it, they will come" mentality, so I know if we can just get  
bootstrapped with some data and some queries and a way to collect  
their judgments, we can get people interested.

>
> I have some tips on how to run crowdsourcing for relevance  
> evaluation here: http://wwwcsif.cs.ucdavis.edu/~alonsoom/ExperimentDesign.pdf

Thanks!

>
> 2- If the goal is to have everything open source (gold set +  
> relevance judgments), we need to produce a new data set from  
> scratch. Also, what is the goal here? What is the domain? Enterprise  
> search? Ad-hoc retrieval?

Yes.  I think the primary goal of ORP is to give people within Lucene  
a way to judge relevance that doesn't require us to purchase datasets,  
just like the contrib/benchmarker gives us a way to talk about  
performance.  So, while it may evolve to be more, I'd be happy with a  
simple, fixed collection at this point.  Wikipedia is OK, but in my  
experience, there are often only a few good answers for a query to
begin with, so it's harder to judge recall, but that doesn't mean it  
isn't useful.

I know there are a lot of issues around curating a good collection,  
but I'd like to be pragmatic and just ask: what can we arrive at in a
reasonable amount of time that best approximates what someone doing,
say, genetic/biopharma research might do?  Just getting a raw dataset
like PubMed on a given day seems like a good first step, then we can  
work to clean it up and generate queries on it.
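
Something along these lines might be enough for that first step (a rough sketch only; it assumes NCBI's E-utilities endpoints and the [pdat] date field work as I remember them, so the parameters would need to be double-checked against the current documentation):

import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_ids_for_day(day):
    # esearch for everything with a publication date on the given day, e.g. "2009/10/16"
    query = urllib.parse.urlencode({
        "db": "pubmed", "term": f"{day}[pdat]",
        "retmax": 200, "retmode": "json",
    })
    with urllib.request.urlopen(f"{BASE}/esearch.fcgi?{query}") as resp:
        return json.load(resp)["esearchresult"]["idlist"]

def fetch_abstracts(ids):
    # efetch the matching records as plain-text abstracts
    query = urllib.parse.urlencode({
        "db": "pubmed", "id": ",".join(ids),
        "rettype": "abstract", "retmode": "text",
    })
    with urllib.request.urlopen(f"{BASE}/efetch.fcgi?{query}") as resp:
        return resp.read().decode("utf-8")

ids = pubmed_ids_for_day("2009/10/16")
print(fetch_abstracts(ids[:5]))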

>
> In summary, I would start with something small (English only,  
> Creative Commons or Wikipedia). Build a few experiments and see the  
> results. Then expand on data sets and also make it multilingual.

Agreed.  I'm not too worried about multilingual just yet, but it is a  
fun problem.

>
> o.
>
> --- On Fri, 10/16/09, Grant Ingersoll <gs...@apache.org> wrote:
>
>> From: Grant Ingersoll <gs...@apache.org>
>> Subject: Re: OpenRelevance and crowdsourcing
>> To: openrelevance-dev@lucene.apache.org
>> Cc: openrelevance-user@lucene.apache.org
>> Date: Friday, October 16, 2009, 3:38 PM
>> Hi Omar,
>>
>> It sounds interesting, can you elaborate more on what you
>> had in mind?
>>
>> A few questions come to mind:
>>
>> 1. Cost associated w/ Turk.
>> 2. What dataset would you use?
>>
>> -Grant
>>
>> On Oct 16, 2009, at 5:49 PM, Omar Alonso wrote:
>>
>>> Hello,
>>>
>>> I would like to know if there is interest in trying
>> some experiments on Mechanical Turk for the OpenRelevance
>> project. I've done TREC and INEX on MTurk, and it is a good
>> platform for trying relevance experiments.
>>>
>>> Regards,
>>>
>>> Omar
>>>
>>>
>>>
>>
>>
>>
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: OpenRelevance and crowdsourcing

Posted by Omar Alonso <or...@yahoo.com>.
Sure.

1- We can start by paying between 2 and 5 cents per document/query pair (or document/topic) on a short data set (say 200 docs). That should be in the order of $25 (assuming 2 cents and 5 turkers per assignment + AMZN fee). 
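
Spelling out the arithmetic (the 10% fee below is just an assumption; Amazon's actual commission would need to be checked):

docs = 200            # document/query pairs to judge
reward = 0.02         # dollars paid per judgment
workers_per_pair = 5  # redundant judgments collected per pair
amazon_fee = 0.10     # assumed commission rate on top of rewards

total = docs * reward * workers_per_pair * (1 + amazon_fee)
print(f"${total:.2f}")  # $22.00, i.e. in the order of $25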

It also depends how many experiments one would like to run. My suggestion would be to run 2 or 3 experiments with some small data sets for say $100 to see what kind of response we get back and then think about something else at large scale. 

I have some tips on how to run crowdsourcing for relevance evaluation here: http://wwwcsif.cs.ucdavis.edu/~alonsoom/ExperimentDesign.pdf

2- If the goal is to have everything open source (gold set + relevance judgments), we need to produce a new data set from scratch. Also, what is the goal here? What is the domain? Enterprise search? Ad-hoc retrieval? 

In summary, I would start with something small (English only, Creative Commons or Wikipedia). Build a few experiments and see the results. Then expand on data sets and also make it multilingual.

o.

--- On Fri, 10/16/09, Grant Ingersoll <gs...@apache.org> wrote:

> From: Grant Ingersoll <gs...@apache.org>
> Subject: Re: OpenRelevance and crowdsourcing
> To: openrelevance-dev@lucene.apache.org
> Cc: openrelevance-user@lucene.apache.org
> Date: Friday, October 16, 2009, 3:38 PM
> Hi Omar,
> 
> It sounds interesting, can you elaborate more on what you
> had in mind?
> 
> A few questions come to mind:
> 
> 1. Cost associated w/ Turk.
> 2. What dataset would you use?
> 
> -Grant
> 
> On Oct 16, 2009, at 5:49 PM, Omar Alonso wrote:
> 
> > Hello,
> > 
> > I would like to know if there is interest in trying
> some experiments on Mechanical Turk for the OpenRelevance
> project. I've done TREC and INEX on MTurk, and it is a good
> platform for trying relevance experiments.
> > 
> > Regards,
> > 
> > Omar
> > 
> > 
> > 
> 
> 
> 


Re: OpenRelevance and crowdsourcing

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Omar,

It sounds interesting, can you elaborate more on what you had in mind?

A few questions come to mind:

1. Cost associated w/ Turk.
2. What dataset would you use?

-Grant

On Oct 16, 2009, at 5:49 PM, Omar Alonso wrote:

> Hello,
>
> I would like to know if there is interest in trying some experiments  
> on Mechanical Turk for the OpenRelevance project. I've done TREC and  
> INEX on MTurk, and it is a good platform for trying relevance experiments.
>
> Regards,
>
> Omar
>
>
>


