Posted to common-dev@hadoop.apache.org by Ayush Saxena <ay...@gmail.com> on 2022/01/06 05:04:56 UTC

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Moved to Dev lists.

Not sure about this though:
 when a PR is submitted to Nutch project it will run some MR job in Hadoop CI.

Whatever that PR requires should run as part of Nutch Infra. Why in Hadoop CI?
Our CI is already loaded with our own workloads.
If by any chance the above assertion gets a pass, then secondly we have very few people managing work related to CI and Infra. I don’t think most of them will have context or say in the Nutch project, nor the bandwidth to fix stuff if it gets broken.

Just my thoughts. Looped in the dev lists in case others have any feedback. As for the process, this would require consensus from the Hadoop PMC.

-Ayush

> On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org> wrote:
> 
> Hi general@,
> 
> Not sure if this is the correct mailing list. Please redirect me if there
> is a more suitable location. Thank you
> 
> I am PMC over on the Nutch project (https://nutch.apache.org). I would like
> to investigate whether we can build an integration testing capability for
> the project. This would involve running a Nutch integration test suite
> (collection of MR jobs) in a Hadoop CI environment. For example whenever a
> pull request is submitted to the Nutch project. This could easily be
> automated through Jenkins.
> 
> I’m not sure if this is something the Hadoop PMC would consider. Thank you
> for the consideration.
> 
> lewismc
> -- 
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Thank you for the response and for directing the conversation to the
correct places.
I may have misunderstood what ci-hadoop.apache.org actually is. We are
looking for a non-production Hadoop cluster which we can use to simulate
Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
is...
Instead, it looks like a pool of compute resources used to run Jenkins
CI/CD tasks for Hadoop and associated projects, rather than a place to test
things on top of Hadoop (and associated projects).
Any clarity on what ci-hadoop.apache.org actually is would be greatly
appreciated.
Thanks
lewismc

On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:

> Moved to Dev lists.
>
> Not sure about this though:
>  when a PR is submitted to Nutch project it will run some MR job in Hadoop
> CI.
>
> Whatever that PR requires should run as part of Nutch Infra. Why in Hadoop
> CI?
> Our CI is already loaded with our own workloads.
> If by any chance the above assertion gets a pass, then secondly we have
> very few people managing work related to CI and Infra. I don’t think most
> of them will have context or say in the Nutch project, nor the bandwidth
> to fix stuff if it gets broken.
>
> Just my thoughts. Looped in the dev lists in case others have any feedback.
> As for the process, this would require consensus from the Hadoop PMC.
>
> -Ayush
>
> > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org> wrote:
> >
> > Hi general@,
> >
> > Not sure if this is the correct mailing list. Please redirect me if there
> > is a more suitable location. Thank you
> >
> > I am PMC over on the Nutch project (https://nutch.apache.org). I would
> > like to investigate whether we can build an integration testing
> > capability for the project. This would involve running a Nutch
> > integration test suite (collection of MR jobs) in a Hadoop CI
> > environment. For example whenever a pull request is submitted to the
> > Nutch project. This could easily be automated through Jenkins.
> >
> > I’m not sure if this is something the Hadoop PMC would consider. Thank
> > you for the consideration.
> >
> > lewismc
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
>


-- 

*Lewis*
Dr. Lewis J. McGibbney Ph.D, B.Sc
*Skype*: lewis.john.mcgibbney

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by lewis john mcgibbney <le...@apache.org>.
Thank you for the information. I agree with you. I totally misunderstood
what ci-hadoop.a.o was.
Sorry for the noise.
lewismc

On Wed, Jan 5, 2022 at 21:56 Akira Ajisaka <aa...@apache.org> wrote:

> (Adding builds@)
>
> Hi Lewis,
>
> Nutch is already using ci-builds.apache.org, so I think Nutch can
> continue using it. ci-hadoop.apache.org provides almost the same
> functionality as ci-builds.apache.org and there is no non-production
> Hadoop cluster running there. Therefore moving to ci-hadoop does not make
> sense.
>
> Short history: In the past there were some Jenkins hosts that were labeled
> for Hadoop and its related projects. After the migration to CloudBees, the
> labeled hosts were moved under ci-hadoop.apache.org.
>
> Thanks,
> Akira
>
>
> On Thu, Jan 6, 2022 at 2:20 PM lewis john mcgibbney <le...@apache.org>
> wrote:
>
>> Thank you for the response and for directing the conversation to the
>> correct places.
>> I may have misunderstood what ci-hadoop.apache.org actually is. We are
>> looking for a non-production Hadoop cluster which we can use to simulate
>> Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
>> is...
>> Instead it looks like lots of compute resources used to perform Jenkins
>> CI/CD tasks for Hadoop and associated projects rather than test things
>> on-top of Hadoop (and associated projects).
>> Any clarity on what ci-hadoop.apache.org actually is would be greatly
>> appreciated.
>>
>> Let me also clarify my language: rather than have the integration tests run
>> on every PR, we could trigger the integration tests by tagging a GitHub
>> bot, e.g., "@nutchbot integration-test", similar to what is done with
>> Dependabot or conda-forge, for anyone familiar with those mechanisms.
>>
>> Thanks for any advice or comments.
>> lewismc
>>
>> On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:
>>
>> > Moved to Dev lists.
>> >
>> > Not sure about this though:
>> >  when a PR is submitted to Nutch project it will run some MR job in
>> > Hadoop CI.
>> >
>> > Whatever that PR requires should run as part of Nutch Infra. Why in
>> > Hadoop CI?
>> > Our CI is already loaded with our own workloads.
>> > If by any chance the above assertion gets a pass, then secondly we have
>> > very few people managing work related to CI and Infra. I don’t think most
>> > of them will have context or say in the Nutch project, nor the bandwidth
>> > to fix stuff if it gets broken.
>> >
>> > Just my thoughts. Looped in the dev lists in case others have any feedback.
>> > As for the process, this would require consensus from the Hadoop PMC.
>> >
>> > -Ayush
>> >
>> > > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org> wrote:
>> > >
>> > > Hi general@,
>> > >
>> > > Not sure if this is the correct mailing list. Please redirect me if
>> > > there is a more suitable location. Thank you
>> > >
>> > > I am PMC over on the Nutch project (https://nutch.apache.org). I
>> > > would like to investigate whether we can build an integration testing
>> > > capability for the project. This would involve running a Nutch
>> > > integration test suite (collection of MR jobs) in a Hadoop CI
>> > > environment. For example whenever a pull request is submitted to the
>> > > Nutch project. This could easily be automated through Jenkins.
>> > >
>> > > I’m not sure if this is something the Hadoop PMC would consider. Thank
>> > > you for the consideration.
>> > > lewismc
>> > > --
>> > > http://home.apache.org/~lewismc/
>> > > http://people.apache.org/keys/committer/lewismc
>> >
>>
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
> --
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by Akira Ajisaka <aa...@apache.org>.
(Adding builds@)

Hi Lewis,

Nutch is already using ci-builds.apache.org, so I think Nutch can continue
using it. ci-hadoop.apache.org provides almost the same functionality as
ci-builds.apache.org and there is no non-production Hadoop cluster running
there. Therefore moving to ci-hadoop does not make sense.

Short history: In the past there were some Jenkins hosts that were labeled
for Hadoop and its related projects. After the migration to CloudBees, the
labeled hosts were moved under ci-hadoop.apache.org.

Thanks,
Akira


On Thu, Jan 6, 2022 at 2:20 PM lewis john mcgibbney <le...@apache.org>
wrote:

> Thank you for the response and for directing the conversation to the
> correct places.
> I may have misunderstood what ci-hadoop.apache.org actually is. We are
> looking for a non-production Hadoop cluster which we can use to simulate
> Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
> is...
> Instead it looks like lots of compute resources used to perform Jenkins
> CI/CD tasks for Hadoop and associated projects rather than test things
> on-top of Hadoop (and associated projects).
> Any clarity on what ci-hadoop.apache.org actually is would be greatly
> appreciated.
>
> Let me also clarify my language: rather than have the integration tests run
> on every PR, we could trigger the integration tests by tagging a GitHub
> bot, e.g., "@nutchbot integration-test", similar to what is done with
> Dependabot or conda-forge, for anyone familiar with those mechanisms.
>
> Thanks for any advice or comments.
> lewismc
>
> On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Moved to Dev lists.
> >
> > Not sure about this though:
> >  when a PR is submitted to Nutch project it will run some MR job in
> > Hadoop CI.
> >
> > Whatever that PR requires should run as part of Nutch Infra. Why in
> > Hadoop CI?
> > Our CI is already loaded with our own workloads.
> > If by any chance the above assertion gets a pass, then secondly we have
> > very few people managing work related to CI and Infra. I don’t think most
> > of them will have context or say in the Nutch project, nor the bandwidth
> > to fix stuff if it gets broken.
> >
> > Just my thoughts. Looped in the dev lists in case others have any feedback.
> > As for the process, this would require consensus from the Hadoop PMC.
> >
> > -Ayush
> >
> > > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org> wrote:
> > >
> > > Hi general@,
> > >
> > > Not sure if this is the correct mailing list. Please redirect me if
> > > there is a more suitable location. Thank you
> > >
> > > I am PMC over on the Nutch project (https://nutch.apache.org). I would
> > > like to investigate whether we can build an integration testing
> > > capability for the project. This would involve running a Nutch
> > > integration test suite (collection of MR jobs) in a Hadoop CI
> > > environment. For example whenever a pull request is submitted to the
> > > Nutch project. This could easily be automated through Jenkins.
> > >
> > > I’m not sure if this is something the Hadoop PMC would consider. Thank
> > > you for the consideration.
> > >
> > > lewismc
> > > --
> > > http://home.apache.org/~lewismc/
> > > http://people.apache.org/keys/committer/lewismc
> >
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by Akira Ajisaka <aa...@apache.org>.
(Adding builds@)

Hi Lewis,

Nutch is already using ci-builds.apache.org, so I think Nutch can continue
using it. ci-hadoop.apache.org provides almost the same functionality as
ci-builds.apache.org and there is no non-production Hadoop cluster running
there. Therefore moving to ci-hadoop does not make sense.

Short history: In the past there were some jenkins hosts that were labeled
for Hadoop and its related projects. After the migration to cloudbees, the
labeled hosts are moved under ci-hadoop.apache.org.

Thanks,
Akira


On Thu, Jan 6, 2022 at 2:20 PM lewis john mcgibbney <le...@apache.org>
wrote:

> Thank you for the response and for directing the conversation to the
> correct places.
> I may have misunderstood what ci-hadoop.apache.org actually is. We are
> looking for a non-production Hadoop cluster which we can use to simulate
> Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
> is...
> Instead it looks like lots of compute resources used to perform Jenkins
> CI/CD tasks for Hadoop and associated projects rather than test things
> on-top of Hadoop (and associated projects).
> Any clarity on what ci-hadoop.apache.org actually is would be greatly
> appreciated.
>
> Let me also clarify my language: rather than have the integration tests run
> on every PR, we could trigger them by tagging a GitHub bot, e.g.
> "@nutchbot integration-test", similar to what is done with Dependabot or
> conda-forge, for anyone familiar with those mechanisms.
>
> Thanks for any advice or comments.
> lewismc
>
> On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:
>
> > Moved to Dev lists.
> >
> > Not sure about this though:
> >  when a PR is submitted to Nutch project it will run some MR job in
> Hadoop
> > CI.
> >
> > Whatever that PR requires should run as part of Nutch Infra. Why in
> Hadoop
> > CI?
> > Our CI is already loaded with our own workloads.
> > If by any chance the above assertion gets a pass, then secondly we have
> > very few people managing work related to CI and Infra. I don’t think
> > most of them will have context or a say in the Nutch project, nor the
> > bandwidth to fix stuff if it gets broken.
> >
> > Just my thoughts. Looped in the dev lists, if others have any feedback.
> As
> > for the process, this would require a consensus from the Hadoop PMC
> >
> > -Ayush
> >
> > > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org>
> > wrote:
> > >
> > > Hi general@,
> > >
> > > Not sure if this is the correct mailing list. Please redirect me if
> there
> > > is a more suitable location. Thank you
> > >
> > > I am PMC over on the Nutch project (https://nutch.apache.org). I would
> > like
> > > to investigate whether we can build an integration testing capability
> for
> > > the project. This would involve running a Nutch integration test suite
> > > (collection of MR jobs) in a Hadoop CI environment. For example
> whenever
> > a
> > > pull request is submitted to the Nutch project. This could easily be
> > > automated through Jenkins.
> > >
> > > I’m not sure if this is something the Hadoop PMC would consider. Thank
> > you
> > > for the consideration.
> > >
> > > lewismc
> > > --
> > > http://home.apache.org/~lewismc/
> > > http://people.apache.org/keys/committer/lewismc
> >
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by lewis john mcgibbney <le...@apache.org>.
Thank you for the response and for directing the conversation to the
correct places.
I may have misunderstood what ci-hadoop.apache.org actually is. We are
looking for a non-production Hadoop cluster which we can use to simulate
Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
is...
Instead it looks like lots of compute resources used to perform Jenkins
CI/CD tasks for Hadoop and associated projects rather than test things
on-top of Hadoop (and associated projects).
Any clarity on what ci-hadoop.apache.org actually is would be greatly
appreciated.

Let me also clarify my language: rather than have the integration tests run
on every PR, we could trigger them by tagging a GitHub bot, e.g.
"@nutchbot integration-test", similar to what is done with Dependabot or
conda-forge, for anyone familiar with those mechanisms.
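
As a rough sketch of the comment-trigger idea only: the handle "@nutchbot"
and the "integration-test" command are hypothetical (no such bot exists
yet), and the function name is made up for illustration. A webhook handler
receiving a PR comment could decide whether to start the tests like this:

```python
import re

# Hypothetical bot handle and command, taken from the example in this message.
# The pattern requires the command to stand alone on its own line.
TRIGGER = re.compile(
    r"^[ \t]*@nutchbot[ \t]+integration-test[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def should_run_integration_tests(comment_body: str) -> bool:
    """Return True if some line of the PR comment is exactly the trigger command."""
    return TRIGGER.search(comment_body) is not None
```

A comment such as "LGTM" followed by a line reading "@nutchbot
integration-test" would trigger a run; a comment that merely mentions
@nutchbot in passing would not. Where the job then runs is a separate
question.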

Thanks for any advice or comments.
lewismc

On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:

> Moved to Dev lists.
>
> Not sure about this though:
>  when a PR is submitted to Nutch project it will run some MR job in Hadoop
> CI.
>
> Whatever that PR requires should run as part of Nutch Infra. Why in Hadoop
> CI?
> Our CI is already loaded with our own workloads.
> If by any chance the above assertion gets a pass, then secondly we have
> very few people managing work related to CI and Infra. I don’t think most
> of them will have context or a say in the Nutch project, nor the
> bandwidth to fix stuff if it gets broken.
>
> Just my thoughts. Looped in the dev lists, if others have any feedback. As
> for the process, this would require a consensus from the Hadoop PMC
>
> -Ayush
>
> > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org>
> wrote:
> >
> > Hi general@,
> >
> > Not sure if this is the correct mailing list. Please redirect me if there
> > is a more suitable location. Thank you
> >
> > I am PMC over on the Nutch project (https://nutch.apache.org). I would
> like
> > to investigate whether we can build an integration testing capability for
> > the project. This would involve running a Nutch integration test suite
> > (collection of MR jobs) in a Hadoop CI environment. For example whenever
> a
> > pull request is submitted to the Nutch project. This could easily be
> > automated through Jenkins.
> >
> > I’m not sure if this is something the Hadoop PMC would consider. Thank
> you
> > for the consideration.
> >
> > lewismc
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
>


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc

Re: Possibility of using ci-hadoop.a.o for Nutch integration tests

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Thank you for the response and for directing the conversation to the
correct places.
I may have misunderstood what ci-hadoop.apache.org actually is. We are
looking for a non-production Hadoop cluster which we can use to simulate
Nutch jobs. I am not sure if this is what ci-hadoop.apache.org actually
is...
Instead it looks like lots of compute resources used to perform Jenkins
CI/CD tasks for Hadoop and associated projects rather than test things
on-top of Hadoop (and associated projects).
Any clarity on what ci-hadoop.apache.org actually is would be greatly
appreciated.
Thanks
lewismc

On Wed, Jan 5, 2022 at 9:05 PM Ayush Saxena <ay...@gmail.com> wrote:

> Moved to Dev lists.
>
> Not sure about this though:
>  when a PR is submitted to Nutch project it will run some MR job in Hadoop
> CI.
>
> Whatever that PR requires should run as part of Nutch Infra. Why in Hadoop
> CI?
> Our CI is already loaded with our own workloads.
> If by any chance the above assertion gets a pass, then secondly we have
> very few people managing work related to CI and Infra. I don’t think most
> of them will have context or a say in the Nutch project, nor the
> bandwidth to fix stuff if it gets broken.
>
> Just my thoughts. Looped in the dev lists, if others have any feedback. As
> for the process, this would require a consensus from the Hadoop PMC
>
> -Ayush
>
> > On 06-Jan-2022, at 7:02 AM, lewis john mcgibbney <le...@apache.org>
> wrote:
> >
> > Hi general@,
> >
> > Not sure if this is the correct mailing list. Please redirect me if there
> > is a more suitable location. Thank you
> >
> > I am PMC over on the Nutch project (https://nutch.apache.org). I would
> like
> > to investigate whether we can build an integration testing capability for
> > the project. This would involve running a Nutch integration test suite
> > (collection of MR jobs) in a Hadoop CI environment. For example whenever
> a
> > pull request is submitted to the Nutch project. This could easily be
> > automated through Jenkins.
> >
> > I’m not sure if this is something the Hadoop PMC would consider. Thank
> you
> > for the consideration.
> >
> > lewismc
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
>


-- 

*Lewis*
Dr. Lewis J. McGibbney Ph.D, B.Sc
*Skype*: lewis.john.mcgibbney
