Posted to dev@nutch.apache.org by lewis john mcgibbney <le...@apache.org> on 2017/09/06 20:57:43 UTC

Request for Review

Hi user@ and dev@,

As part of the Nutch Google Summer of Code effort this year, Omkar Reddy
and I have been working persistently throughout the summer months on the
Hadoop MapReduce API upgrade, i.e. NUTCH-2375: Upgrade the code base from
org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
We believe we are now at a stage where this code is stable and should be
opened for widespread community review. It is a large patch, so the more
eyes we can get on this the better. Upgrading MapReduce API usage in Nutch
is long overdue so this will be a significant addition to the Nutch project.

The proposed pull request can be found at [1]. Please report any outcomes
back to the issue tracker at [0].

Thank you
Lewis

N.B. Please note that the official version of Apache Hadoop supported by
Nutch master branch at this time is 2.7.2.

[0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
[1] https://github.com/apache/nutch/pull/188

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney

Re: Request for Review

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

A short status report on testing from my side:

- I successfully ran a small test crawl in local mode
  (only inject plus a few generate-fetch-parse-update cycles).

- Crawling in distributed mode (on a Hadoop cluster) fails:
  the generator does not generate any fetch lists:
    17/09/14 13:56:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...

  I've retried the generator with the current master: the failure is definitely
  related to the current NUTCH-2375 branch/PR. As far as I can see, it is caused by
  configuration variables that are not set properly; changes have been requested.


Best,
Sebastian



On 09/11/2017 08:06 AM, Omkar Reddy wrote:
> Hi,
> 
> Kenneth thank you for your appreciation. Please participate in the code review. As Lewis said the
> more eyes we get on this the better.
> 
> Sebastian please find the pull request here [0]. The code is stable with "ant clean runtime test"
> passing successfully. This is my first experience submitting a java patch at this scale. Please feel
> free to provide any suggestion. 
> 
> Everyone is welcome to test this code and review it.
> 
> Thanks,
> Omkar
> 
> [0] https://github.com/apache/nutch/pull/221 
> 
> On 11 September 2017 at 00:03, kenneth mcfarland <kennethpmcfarland@gmail.com> wrote:
> 
>     Nice work Omkar, thumbs up from a fellow student.
> 
>     On Sep 10, 2017 10:37 AM, "Omkar Reddy" <omkarreddy2008@gmail.com> wrote:
> 
> 
>         Hi Sebastian,
> 
>         While squashing the pull request there was some mistake and the commits were deleted. I will
>         send a new pull request and keep you posted in this thread.
> 
>         Thanks,
>         ~Omkar
> 
>         > On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:
>         >
>         > Hi,
>         >
>         > thanks, Omkar for your work!
>         >
>         > Just wanted to start testing, but looks like the pull request is lost.
>         >
>         > Thanks,
>         > Sebastian
>         >
>         >> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
>         >> Hi user@ and dev@,
>         >>
>         >> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
>         >> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
>         >> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
>         >> We believe we are now at a stage where this code is stable and should be opened for widespread
>         >> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
>         >> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
>         >> project.
>         >>
>         >> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
>         >> at [1].
>         >>
>         >> Thank you
>         >> Lewis
>         >>
>         >> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
>         >> time is 2.7.2.
>         >>
>         >> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
>         >> [1] https://github.com/apache/nutch/pull/188
>         >>
>         >> --
>         >> http://home.apache.org/~lewismc/
>         >> @hectorMcSpector
>         >> http://www.linkedin.com/in/lmcgibbney
>         >
> 
> 


Re: Request for Review

Posted by Omkar Reddy <om...@gmail.com>.
Hi,

Kenneth, thank you for your appreciation. Please participate in the code
review. As Lewis said, the more eyes we get on this the better.

Sebastian, please find the pull request here [0]. The code is stable, with
"ant clean runtime test" passing successfully. This is my first experience
submitting a Java patch at this scale. Please feel free to provide any
suggestions.

Everyone is welcome to test this code and review it.

Thanks,
Omkar

[0] https://github.com/apache/nutch/pull/221

On 11 September 2017 at 00:03, kenneth mcfarland <kennethpmcfarland@gmail.com> wrote:

> Nice work Omkar, thumbs up from a fellow student.
>
> On Sep 10, 2017 10:37 AM, "Omkar Reddy" <om...@gmail.com> wrote:
>
>>
>> Hi Sebastian,
>>
>> While squashing the pull request there was some mistake and the commits
>> were deleted. I will send a new pull request and keep you posted in this
>> thread.
>>
>> Thanks,
>> ~Omkar
>>
>> > On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:
>> >
>> > Hi,
>> >
>> > thanks, Omkar for your work!
>> >
>> > Just wanted to start testing, but looks like the pull request is lost.
>> >
>> > Thanks,
>> > Sebastian
>> >
>> >> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
>> >> Hi user@ and dev@,
>> >>
>> >> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
>> >> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
>> >> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
>> >> We believe we are now at a stage where this code is stable and should be opened for widespread
>> >> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
>> >> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
>> >> project.
>> >>
>> >> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
>> >> at [1].
>> >>
>> >> Thank you
>> >> Lewis
>> >>
>> >> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
>> >> time is 2.7.2.
>> >>
>> >> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
>> >> [1] https://github.com/apache/nutch/pull/188
>> >>
>> >> --
>> >> http://home.apache.org/~lewismc/
>> >> @hectorMcSpector
>> >> http://www.linkedin.com/in/lmcgibbney
>> >
>>
>

Re: Request for Review

Posted by kenneth mcfarland <ke...@gmail.com>.
Nice work Omkar, thumbs up from a fellow student.

On Sep 10, 2017 10:37 AM, "Omkar Reddy" <om...@gmail.com> wrote:

>
> Hi Sebastian,
>
> While squashing the pull request there was some mistake and the commits
> were deleted. I will send a new pull request and keep you posted in this
> thread.
>
> Thanks,
> ~Omkar
>
> > On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
> >
> > Hi,
> >
> > thanks, Omkar for your work!
> >
> > Just wanted to start testing, but looks like the pull request is lost.
> >
> > Thanks,
> > Sebastian
> >
> >> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
> >> Hi user@ and dev@,
> >>
> >> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
> >> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
> >> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
> >> We believe we are now at a stage where this code is stable and should be opened for widespread
> >> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
> >> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
> >> project.
> >>
> >> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
> >> at [1].
> >>
> >> Thank you
> >> Lewis
> >>
> >> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
> >> time is 2.7.2.
> >>
> >> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
> >> [1] https://github.com/apache/nutch/pull/188
> >>
> >> --
> >> http://home.apache.org/~lewismc/
> >> @hectorMcSpector
> >> http://www.linkedin.com/in/lmcgibbney
> >
>

Re: Request for Review

Posted by Omkar Reddy <om...@gmail.com>.
Hi Sebastian, 

While squashing the pull request, something went wrong and the commits were deleted. I will open a new pull request and keep you posted in this thread.

Thanks, 
~Omkar

> On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
> 
> Hi,
> 
> thanks, Omkar for your work!
> 
> Just wanted to start testing, but looks like the pull request is lost.
> 
> Thanks,
> Sebastian
> 
>> On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
>> Hi user@ and dev@,
>> 
>> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
>> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
>> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
>> We believe we are now at a stage where this code is stable and should be opened for widespread
>> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
>> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
>> project.
>> 
>> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
>> at [1].
>> 
>> Thank you
>> Lewis
>> 
>> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
>> time is 2.7.2.
>> 
>> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
>> [1] https://github.com/apache/nutch/pull/188
>> 
>> -- 
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
> 

Re: Request for Review

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

Thanks, Omkar, for your work!

I just wanted to start testing, but it looks like the pull request is lost.

Thanks,
Sebastian

On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:
> Hi user@ and dev@,
> 
> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
> We believe we are now at a stage where this code is stable and should be opened for widespread
> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
> project.
> 
> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
> at [1].
> 
> Thank you
> Lewis
> 
> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
> time is 2.7.2.
> 
> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
> [1] https://github.com/apache/nutch/pull/188
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney

