Posted to dev@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/03/18 16:31:34 UTC

Differences 1.x and trunk

Hi all,

I'm trying to port the patch for https://issues.apache.org/jira/browse/NUTCH-963
to trunk after committing it to 1.3. There are of course a lot of differences, so
I need a little advice on how to proceed:

- instead of using CrawlDB and CrawlDatum we now need WebTableReader?
- trunk uses slf4j instead of commons logging now?
- a page is now represented by storage.WebPage?

Any more good advice on this one? I need it ;)

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Differences 1.x and trunk

Posted by Markus Jelsma <ma...@openindex.io>.
Thanks! I'll try to come up with a working patch in the next few weeks or so.

On Friday 18 March 2011 16:57:20 Andrzej Bialecki wrote:
> On 3/18/11 4:31 PM, Markus Jelsma wrote:
> > Hi all,
> > 
> > I'm trying to port the patch for
> > https://issues.apache.org/jira/browse/NUTCH-963 to trunk after
> > committing it to 1.3. There are of course a lot of differences, so I need a
> > little advice on how to proceed:
> > 
> > - instead of using CrawlDB and CrawlDatum we now need WebTableReader?
> 
> Actually you need to use StorageUtils to set up Mapper or Reducer
> contexts. See other tools, e.g. Fetcher or Generator.
> 
> > - trunk uses slf4j instead of commons logging now?
> 
> Yes.
> 
> > - a page is now represented by storage.WebPage?
> 
> Yes. When you prepare a Job you also need to specify what fields from
> WebPage you are interested in (and only these fields will be pulled in
> from the storage). This is all handled by StorageUtils methods.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Differences 1.x and trunk

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 3/18/11 4:31 PM, Markus Jelsma wrote:
> Hi all,
>
> I'm trying to port the patch for https://issues.apache.org/jira/browse/NUTCH-963
> to trunk after committing it to 1.3. There are of course a lot of differences, so
> I need a little advice on how to proceed:
>
> - instead of using CrawlDB and CrawlDatum we now need WebTableReader?

Actually you need to use StorageUtils to set up Mapper or Reducer 
contexts. See other tools, e.g. Fetcher or Generator.

> - trunk uses slf4j instead of commons logging now?

Yes.

> - a page is now represented by storage.WebPage?

Yes. When you prepare a Job you also need to specify what fields from 
WebPage you are interested in (and only these fields will be pulled in 
from the storage). This is all handled by StorageUtils methods.
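
As a rough illustration of the field-selection idea only: the sketch below is
not Nutch code and calls none of the StorageUtils methods. The class
FieldProjectionSketch, its Field enum and the put/read methods are all
hypothetical stand-ins, using plain Java collections in place of the real
storage backend. It just shows a reader that declares which fields it wants
and gets back nothing else.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, NOT Nutch code: models "only requested fields are loaded".
public class FieldProjectionSketch {

    // Hypothetical field names, chosen for illustration only.
    public enum Field { STATUS, FETCH_TIME, CONTENT }

    // Toy backing store: URL -> every field stored for that page.
    private final Map<String, Map<Field, String>> rows = new HashMap<>();

    public void put(String url, Field field, String value) {
        rows.computeIfAbsent(url, k -> new EnumMap<>(Field.class)).put(field, value);
    }

    // Returns a copy of the row that holds only the requested fields.
    public Map<Field, String> read(String url, Set<Field> wanted) {
        Map<Field, String> projected = new EnumMap<>(Field.class);
        Map<Field, String> full = rows.get(url);
        if (full == null) {
            return projected; // unknown URL: empty result
        }
        for (Field f : wanted) {
            String v = full.get(f);
            if (v != null) {
                projected.put(f, v);
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        FieldProjectionSketch store = new FieldProjectionSketch();
        store.put("http://example.com/", Field.STATUS, "fetched");
        store.put("http://example.com/", Field.FETCH_TIME, "1300462294");
        store.put("http://example.com/", Field.CONTENT, "<html>...</html>");

        // A tool that only needs status and fetch time asks for just those,
        // so the (potentially large) CONTENT field is never copied.
        Map<Field, String> page =
            store.read("http://example.com/", EnumSet.of(Field.STATUS, Field.FETCH_TIME));
        System.out.println(page.keySet());
    }
}
```

In trunk the equivalent declaration is made when the Job is prepared, so
Fetcher or Generator remain the places to look for the real calls.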

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com