Posted to dev@tika.apache.org by "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov> on 2017/08/09 16:55:34 UTC

Release of TREC Dynamic Domain: Polar Dataset

Hi,

We have released our dataset collected from 2015-16 in the Polar Domain, called
the TREC Dynamic Domain Polar dataset.

Researchers interested in a rich dataset collected across the Scientific and Deep web
can mine HTML pages, PDF files, images, video, audio, and other formats for
scientific insights.

The data is described here:

https://github.com/chrismattmann/trec-dd-polar

And available from the NSF Arctic Data Center here:

https://arcticdata.io/catalog/#view/doi:10.18739/A2280J

If you use the dataset in your work, please consider citing it:

@inproceedings{burgess2015trec,
  title={TREC Dynamic Domain: Polar Science.},
  author={Burgess, Annie Bryant and Mattmann, Chris and Totaro, Giuseppe and McGibbney, Lewis John and Ramirez, Paul M},
  booktitle={TREC},
  year={2015}
}

(our TREC paper, and/or the DOI from the actual dataset).

Enjoy!

Cheers,
Chris Mattmann



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 


Re: Release of TREC Dynamic Domain: Polar Dataset

Posted by Tommaso Teofili <to...@gmail.com>.
cool, thanks Chris for sharing.

Regards,
Tommaso

On Wed, 9 Aug 2017 at 18:56, Mattmann, Chris A (3010) <chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi,
>
> We have released our dataset collected from 2015-16 in the Polar Domain,
> called
> the TREC Dynamic Domain Polar dataset.
>
> Researchers interested in a rich dataset collected across the Scientific
> and Deep web can mine HTML pages, PDF files, images, video, audio, and
> other formats for scientific insights.
>
> The data is described here:
>
> https://github.com/chrismattmann/trec-dd-polar
>
> And available from the NSF Arctic Data Center here:
>
> https://arcticdata.io/catalog/#view/doi:10.18739/A2280J
>
> If you use the dataset in your work, please consider citing it:
>
> @inproceedings{burgess2015trec,
>   title={TREC Dynamic Domain: Polar Science.},
>   author={Burgess, Annie Bryant and Mattmann, Chris and Totaro, Giuseppe
> and McGibbney, Lewis John and Ramirez, Paul M},
>   booktitle={TREC},
>   year={2015}
> }
>
> (our TREC paper, and/or the DOI from the actual dataset).
>
> Enjoy!
>
> Cheers,
> Chris Mattmann