Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/02/05 02:17:34 UTC

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

     [ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1934:
----------------------------------------
    Attachment: NUTCH-1934.patch

Patch for trunk.
Some early observations:
 * Existing Nutch tests pass locally
 * The way I have approached this is to make explicit casts to existing fetchQueue objects, as **FetcherThread** is now an independent class. In my test crawling I have come across no ClassCastExceptions (as of yet!!!); however, this is something we should remain vigilant about, e.g.
{code}
((FetchItemQueues) fetchQueues).getTotalSize()
{code}
 * We now have a pretty verbose constructor for **FetcherThread** (hey, what's new, it's the Nutch Fetcher.java); however, this one is verbose even by Nutch Fetcher.java standards.
{code}
  public FetcherThread(Configuration conf, AtomicInteger activeThreads, FetchItemQueues fetchQueues, 
      QueueFeeder feeder, AtomicInteger spinWaiting, AtomicLong lastRequestStart, Reporter reporter,
      AtomicInteger errors, String segmentName, boolean parsing, OutputCollector<Text, NutchWritable> output,
      boolean storingContent, AtomicInteger pages, AtomicLong bytes) {
{code}
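
One way the argument list might be shortened is to bundle the shared counters into a single holder object and pass that instead. The following is only a sketch under that assumption and is not part of the attached patch: the names FetcherCounters and SketchFetcherThread are hypothetical, and the Hadoop/Nutch types (Configuration, Reporter, FetchItemQueues and so on) are left out so the example stands alone.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the counters shared by all fetcher threads. */
final class FetcherCounters {
  final AtomicInteger activeThreads = new AtomicInteger();
  final AtomicInteger spinWaiting = new AtomicInteger();
  final AtomicInteger errors = new AtomicInteger();
  final AtomicInteger pages = new AtomicInteger();
  final AtomicLong lastRequestStart = new AtomicLong();
  final AtomicLong bytes = new AtomicLong();
}

/** Hypothetical stand-in for FetcherThread; not the class in the patch. */
final class SketchFetcherThread extends Thread {
  private final FetcherCounters counters;

  SketchFetcherThread(FetcherCounters counters, String segmentName) {
    // Use the segment name only to label the thread in this sketch.
    super("FetcherThread-" + segmentName);
    this.counters = counters;
  }

  @Override
  public void run() {
    counters.activeThreads.incrementAndGet();
    try {
      // Placeholder for the real fetch work: record one page of 1024 bytes.
      counters.lastRequestStart.set(System.currentTimeMillis());
      counters.pages.incrementAndGet();
      counters.bytes.addAndGet(1024L);
    } finally {
      // Always release the active-thread slot, even if fetching fails.
      counters.activeThreads.decrementAndGet();
    }
  }
}
```

With something like this, six of the fourteen constructor parameters would collapse into one, and every thread would still update the same AtomicInteger/AtomicLong instances. Whether that trade-off is worth an extra class is open to discussion.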

Some initial comments would be very helpful. Thanks

> Refactor Fetcher in trunk
> -------------------------
>
>                 Key: NUTCH-1934
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1934
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>         Attachments: NUTCH-1934.patch
>
>
> Put simply, [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java] is too big.
> This is kinda strange, as the size of this file sets it apart (I think) from every other class within Nutch. The others are reasonably well modularized and split into constituent classes which make sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)