Posted to openrelevance-user@lucene.apache.org by Mauro Dragoni <ma...@gmail.com> on 2012/10/04 16:59:53 UTC

Full document collection

Hi everyone,
I am looking for a complete document collection, freely available, containing:

* full document content
* queries
* relevance judgments

in order to test the enrichment of these documents with semantic
information coming from external knowledge bases.
Do you know of such a collection?

I found a link to the Ancestry.com Forum Dataset collection, but I am
not able to find the documents.
This collection would be very interesting for my experiments;
other collections are welcome too... :)

Thank you in advance for any help.

Mauro.

-- 
Dr. Mauro Dragoni
Ph.D. in Computer Science.
Post-Doc Researcher at Fondazione Bruno Kessler (FBK-IRST), Trento, Italy.

My Research Site: https://dkm.fbk.eu/index.php/Mauro_Dragoni
My Business Site: http://www.dragotechpro.com
My News-Project Site: http://www.dragotimes.com


Confidentiality Notice. This electronic mail transmission may contain
legally privileged and/or confidential information. Do not read this
if you are not the named recipient.
Any use, distribution, copying or disclosure by any other person is
strictly prohibited.
If you received this transmission in error, please notify the sender
and delete the original transmission and its attachments without
reading or saving it in any manner.

Re: Full document collection

Posted by David Potocnik <da...@gmail.com>.
"""
Two large datasets with tens of thousands of queries (30,000+/10,000)
were released.
136 features have been extracted for each query-url pair
"""
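
For anyone who wants to look at the data right away, here is a minimal sketch of reading one row of such a dataset. It assumes the files use the SVMlight-style layout common for learning-to-rank data, `<relevance> qid:<id> <feature>:<value> ...`; that layout, the names `LtrRow` and `parse_ltr_line`, and the sample row are assumptions for illustration, not taken from this thread.

```python
from typing import Dict, NamedTuple


class LtrRow(NamedTuple):
    relevance: int              # graded relevance judgment for the pair
    query_id: str               # identifier shared by all rows of one query
    features: Dict[int, float]  # feature index -> feature value


def parse_ltr_line(line: str) -> LtrRow:
    """Parse one '<relevance> qid:<id> <index>:<value> ...' line."""
    # Drop an optional trailing '# comment' before splitting on whitespace.
    tokens = line.split("#", 1)[0].split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a learning-to-rank line: {line!r}")
    relevance = int(tokens[0])
    query_id = tokens[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return LtrRow(relevance, query_id, features)


# Hypothetical row with only three features, made up for illustration.
row = parse_ltr_line("2 qid:10 1:0.5 2:3 136:0.25")
print(row.relevance, row.query_id, row.features[136])
```

Grouping the parsed rows by `query_id` then gives, for each query, the list of judged query-url pairs.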

Great! Thank you, I'll play with this asap.

d.

On 5 October 2012 10:37, Mauro Dragoni <ma...@gmail.com> wrote:
> Hi,
> if I understood your problem correctly, you should use Learning to
> Rank datasets.
> There are some very good datasets available, for example this one:
>
> http://research.microsoft.com/en-us/projects/mslr/
>
> Mauro.

Re: Full document collection

Posted by Mauro Dragoni <ma...@gmail.com>.
Hi,
if I understood your problem correctly, you should use Learning to
Rank datasets.
There are some very good datasets available, for example this one:

http://research.microsoft.com/en-us/projects/mslr/

Mauro.

On Thu, Oct 4, 2012 at 7:45 PM, David Potocnik <da...@gmail.com> wrote:
> Very interested in this as well!
>
> I am trying to optimize query parameters via reinforcement learning
> (how people interact with the results). Are there datasets for this?
>
> David Potocnik
> --middlemachine




Re: Full document collection

Posted by David Potocnik <da...@gmail.com>.
Very interested in this as well!

I am trying to optimize query parameters via reinforcement learning
(how people interact with the results). Are there datasets for this?

David Potocnik
--middlemachine
