Posted to dev@nutch.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2012/07/17 21:24:20 UTC

Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Hey Folks,

Ruth Duerr is presenting at today's ESIP Federation and Discovery Hackathon:

http://commons.esipfed.org/node/424

The U.S. National Snow and Ice Data Center (NSIDC) is deploying Apache Nutch and 
Solr to support discovery of datasets (called "casting").

Really interesting stuff, and worth contacting Ruth and NSIDC if you're interested.
I'm strongly encouraging the NSIDC folks to contribute any updates or plugins
they are making to the software upstream here to the ASF.

Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by Ruth Duerr <rd...@nsidc.org>.
Hi Markus,

We are just starting this project.  The real goal is to be able to find new or updated data casts wherever they are on the web.  We haven't gotten there yet.  We have the concept of a broad but shallow crawl of the web to find interesting sites, followed by a deep crawl of the interesting sites found.

Ruth


On Jul 18, 2012, at 4:18 PM, Markus Jelsma <ma...@openindex.io> wrote:

> Hi Ian,
> 
> Thanks for sharing your work and experience. Do you use a fixed set of sites and data formats or extensions for data extraction or can you also discover new data casts on the web?
> 
> Cheers,
> 

RE: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by Markus Jelsma <ma...@openindex.io>.
Hi Ian,

Thanks for sharing your work and experience. Do you use a fixed set of sites and data formats or extensions for data extraction or can you also discover new data casts on the web?

Cheers,

 
 

Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Ian,

On Jul 18, 2012, at 10:01 AM, Ian Truslove wrote:

> Chris: message received - I signed up :)

Thanks for doing this!

> 
> As part of Ruth's Libre project (http://nsidc.org/libre/) we are using
> Nutch to find various types of XML data.  We're targeting our search at
> geospatial data, and more specifically cryospheric data, but the tools
> will remain more broadly applicable.  Specifically we are looking for ESIP
> data casts, collection casts, service casts, and ESIP Discovery OpenSearch
> services (all the specs are in
> http://wiki.esipfed.org/index.php/Discovery_Cluster).  These XML documents
> and services are characterizable through fairly simple means such as XML
> namespaces.
> 
> We are currently developing against the Nutch 1.4 tarball distribution
> (SVN HEAD was moving quicker than our configuration could keep up with)
> and plugging into a standalone Solr instance.
> 
> What we have done to date is do some basic configuration work, set the
> code up to play nice(-ish) with Eclipse, our internal SVN, and our
> CI/deployment system, and write some plugins to help us find our various
> XML docs.  We wrote a pair to extract and index the full raw XML content
> of the source document, extending the HtmlParseFilter and IndexingFilter
> respectively.  XML (and of course HTML too) are just wrapped within a
> CDATA section (and CDATA sections within the document are just removed),
> and indexed as a big text blob in Solr.  We can do naive text matching and
> are having success extracting the URLs of the data feeds we're after.
> 
> We also wrote a pair of plugins to keep track of the original index date
> of a document (the overarching use case is to determine documents that are
> newly found).  We used the ScoringFilter and IndexingFilter for those.
> 
> Planned work includes extracting data from the XML before indexing and
> using Solr fields more effectively, indexing GCMD keywords, simple spatial
> subsetting, and tweaking the ranking algorithms to do a broad search to
> identify good sites for deep data searches.
> 
> Thanks for the interest - it's been a fun project to work on so far, and
> I'm sure we'd be happy to talk more or provide more details.

Super awesome! 

Well if you get around to it, feel free to:

1. file JIRA issues at our JIRA issue tracker https://issues.apache.org/jira/browse/NUTCH identifying your changes, keeping each one as small, incremental, and easily revertible as possible.
2. create patch files and attach them to the JIRA issues that you create in #1
3. work with a committer here in Nutch to get your patches contributed. Unit tests and code that conforms to the rest of the Nutch style (e.g., no tabs) are good helpers. Doug Cutting used to say that if he could apply a few of your patches without modification, then you are well on track towards getting your code included in the project.

Thanks much! If you have any questions, let me or any of the other Nutch devs who hang out here know.

Cheers,
Chris



Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by Ian Truslove <ia...@nsidc.org>.
Chris: message received - I signed up :)

As part of Ruth's Libre project (http://nsidc.org/libre/) we are using
Nutch to find various types of XML data.  We're targeting our search at
geospatial data, and more specifically cryospheric data, but the tools
will remain more broadly applicable.  Specifically we are looking for ESIP
data casts, collection casts, service casts, and ESIP Discovery OpenSearch
services (all the specs are in
http://wiki.esipfed.org/index.php/Discovery_Cluster).  These XML documents
and services are characterizable through fairly simple means such as XML
namespaces.
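
As a rough illustration of the namespace check described above (not our plugin code, which is Java inside Nutch), here is a minimal Python sketch. The Atom and OpenSearch 1.1 namespace URIs are the publicly documented ones; the labels, the function names, and the choice to look only at the root element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Root-element namespace URI -> label. The URIs are the published Atom and
# OpenSearch 1.1 namespaces; the labels are illustrative only.
KNOWN_NAMESPACES = {
    "http://www.w3.org/2005/Atom": "atom-feed",
    "http://a9.com/-/spec/opensearch/1.1/": "opensearch-description",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if there is none
    or the text is not well-formed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    # ElementTree encodes a namespaced tag as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:root.tag.index("}")]
    return None


def classify(xml_text):
    """Map a document to a label based on its root namespace."""
    return KNOWN_NAMESPACES.get(root_namespace(xml_text), "unknown")
```

Anything that fails to parse, or whose root namespace is not in the table, simply comes back as "unknown", so ordinary HTML pages fall through without special handling.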

We are currently developing against the Nutch 1.4 tarball distribution
(SVN HEAD was moving quicker than our configuration could keep up with)
and plugging into a standalone Solr instance.

What we have done to date is some basic configuration work, setting the
code up to play nice(-ish) with Eclipse, our internal SVN, and our
CI/deployment system, and writing some plugins to help us find our various
XML docs.  We wrote a pair to extract and index the full raw XML content
of the source document, extending the HtmlParseFilter and IndexingFilter
respectively.  The XML (and of course HTML too) is just wrapped within a
CDATA section (and CDATA sections within the document are just removed),
and indexed as a big text blob in Solr.  We can do naive text matching and
are having success extracting the URLs of the data feeds we're after.
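
The CDATA wrapping step can be sketched roughly as follows. This is a Python illustration rather than the Java plugin itself, and it assumes that "removed" means the CDATA markers are stripped while the enclosed text is kept; the function name is hypothetical.

```python
import re

# Matches the opening and closing markers of a CDATA section.
CDATA_MARKERS = re.compile(r"<!\[CDATA\[|\]\]>")


def wrap_raw_content(raw):
    """Strip any CDATA markers already in the document (keeping their text),
    then wrap the whole document in a single CDATA section."""
    flattened = CDATA_MARKERS.sub("", raw)
    return "<![CDATA[" + flattened + "]]>"
```

Stripping the markers first matters because CDATA sections cannot nest: a stray "]]>" inside the document would otherwise close the wrapper early.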

We also wrote a pair of plugins to keep track of the original index date
of a document (the overarching use case is to determine documents that are
newly found).  We used the ScoringFilter and IndexingFilter for those.
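
The idea behind the original-index-date tracking can be sketched like this. It is a simplified Python illustration, not the ScoringFilter/IndexingFilter code: the per-URL dictionary stands in for whatever per-document metadata the crawler keeps, and the field name first_indexed and both function names are hypothetical.

```python
from datetime import datetime, timezone


def record_first_indexed(metadata, now=None):
    """Store a timestamp the first time a document is seen; on later calls
    leave the stored value untouched. Returns the stored ISO timestamp."""
    if "first_indexed" not in metadata:
        stamp = now or datetime.now(timezone.utc)
        metadata["first_indexed"] = stamp.isoformat()
    return metadata["first_indexed"]


def newly_found(store, since):
    """Return the URLs whose first-indexed date is on or after `since`."""
    return sorted(
        url
        for url, meta in store.items()
        if datetime.fromisoformat(meta["first_indexed"]) >= since
    )
```

Because the date is written once and carried forward on every re-crawl, a query for "first indexed after date X" finds only documents that are new, not ones that were merely refreshed.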

Planned work includes extracting data from the XML before indexing and
using Solr fields more effectively, indexing GCMD keywords, simple spatial
subsetting, and tweaking the ranking algorithms to do a broad search to
identify good sites for deep data searches.
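
One simple way the broad-then-deep idea could work, sketched in Python: count how many matching documents the broad pass found on each host and promote hosts above a threshold to the deep crawl. This is only an assumption about how the selection might be done, not something we have built; the function name and the min_hits threshold are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def pick_deep_crawl_hosts(matched_urls, min_hits=2):
    """Given URLs of interesting documents found by the broad, shallow pass,
    return the hosts with at least `min_hits` matches."""
    hits = Counter(urlparse(url).netloc for url in matched_urls)
    return sorted(host for host, count in hits.items() if count >= min_hits)
```

The resulting host list would then seed a second crawl with a larger depth limit restricted to those hosts.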

Thanks for the interest - it's been a fun project to work on so far, and
I'm sure we'd be happy to talk more or provide more details.

-Ian.



--
Ian Truslove
Senior Software Engineer
National Snow and Ice Data Center
University of Colorado
449 UCB,  Boulder, CO 80309









Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Markus,

Great question. I am CC'ing Ruth Duerr and Ian Truslove at NSIDC -- maybe they
can provide more information?

Ruth, Ian, please consider subscribing to dev@nutch.apache.org and/or user@nutch.apache.org
by sending blank emails to the addresses below, so you can follow along in the conversation:

dev-subscribe@nutch.apache.org
user-subscribe@nutch.apache.org

Thanks all!

Cheers,
Chris

On Jul 17, 2012, at 5:27 PM, Markus Jelsma wrote:

> Cool!
> 
> What are they exactly doing with Apache Nutch? And, more interesting, what non-standard stuff do they use?
> 
> Cheers
> 




RE: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

Posted by Markus Jelsma <ma...@openindex.io>.
Cool!

What exactly are they doing with Apache Nutch? And, more interestingly, what non-standard stuff do they use?

Cheers
 