You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Adam Szita <sz...@cloudera.com> on 2018/05/02 15:34:22 UTC
Re: ptest queue
I have a patch available for the voted version at
https://issues.apache.org/jira/browse/HIVE-19077. Let me know what you
think.
On 27 April 2018 at 15:55, Adam Szita <sz...@cloudera.com> wrote:
> Thanks to all for the responses.
> As I see it, option 3 is the winning one. Next week I'm going start
> working on this one then (unless any objections of course).
>
> Adam
>
> On 26 April 2018 at 05:48, Deepak Jaiswal <dj...@hortonworks.com>
> wrote:
>
>> +1 for option 3. Thanks Adam for taking this up again.
>>
>> Regards,
>> Deepak
>>
>> On 4/25/18, 4:54 PM, "Thejas Nair" <th...@gmail.com> wrote:
>>
>> Option 3 seems reasonable. I believe that used to be the state a while
>> back (maybe 12 months back or so).
>> When 2nd ptest for same jira runs, it checks if the latest patch has
>> already been run.
>>
>>
>> On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary <pv...@cloudera.com>
>> wrote:
>> > I would vote for version 3. It would solve the big patch problem,
>> and removes the unnecessary test runs too.
>> >
>> > Thanks,
>> > Peter
>> >
>> >> On Apr 25, 2018, at 11:01 AM, Adam Szita <sz...@cloudera.com>
>> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I had a patch (HIVE-19077) committed with the original aim being
>> the
>> >> prevention of wasting resources when running ptest on the same
>> patch
>> >> multiple times:
>> >> It is supposed to manage scenarios where a developer uploads
>> >> HIVE-XYZ.1.patch, that gets queued in jenkins, then before
>> execution
>> >> HIVE-XYZ.2.patch (for the same jira) is uploaded and that gets
>> queued also.
>> >> When the first patch starts to execute ptest will see that patch2
>> is the
>> >> latest patch and will use that. After some time the second queued
>> job will
>> >> also run on this very same patch.
>> >> This is just pointless and causes long queues to progress slowly.
>> >>
>> >> My idea was to remove these duplicates from the queue where I'd
>> only keep
>> >> the latest queued element if I see more queued entries for the
>> same jira
>> >> number. It's like when you go grocery shopping and you're already
>> in line
>> >> at cashier but you realise you also need e.g. milk. You go grab it
>> and join
>> >> the END of the queue. So I believe it's a fair punishment for
>> losing one's
>> >> spot in the queue for making amends on their patch.
>> >>
>> >> That said Deepak made me realise that for big patches this will be
>> very
>> >> cumbersome due to the need of constant rebasing to avoid conflicts
>> on patch
>> >> application.
>> >> I have three proposals now:
>> >>
>> >> 1: Leave this as it currently is (with HIVE-19077 committed) -
>> *only the
>> >> latest queued job will run of the same jira*
>> >> pros: no wasting resources to run the same patches more times,
>> 'scheduling'
>> >> is fair: if you amend you're patch you may loose your original
>> spot in the
>> >> queue
>> >> cons: big patches that are prone to conflicts will be hard to get
>> executed
>> >> in ptest, devs will have to wait more time for their ptest results
>> if they
>> >> amend their patches
>> >>
>> >> 2: *Add a safety switch* to this queue checking feature (currently
>> proposed
>> >> in HIVE-19077), deduplication can be switch off on request
>> >> pros: same as 1st, + ability to have more control on this
>> mechanism i.e.
>> >> turn it off for big/urgent patches
>> >> cons: big patches that use the swich might still waste resources,
>> also devs
>> >> might use safety switch inappropriately for their own evil benefit
>> :)
>> >>
>> >> 3: Deduplication the other way around - *only the first queued job
>> will run
>> >> of the same jira*, ptest server will keep record of patch names
>> and won't
>> >> execute a patch with a seen name and jira number again
>> >> pros: same patches will not be executed more times accidentally,
>> big
>> >> patches won't be a problem either, devs will get their ptest
>> result back
>> >> earlier even if more jobs are triggered for same jira/patch name
>> >> cons: scheduling is less fair: devs can reserve their spots in the
>> queue
>> >>
>> >>
>> >> (0: restore original: I'm strongly against this, ptest queue is
>> already too
>> >> big as it is, we have to at least try and decrease its size by
>> >> deduplicating jiras in it)
>> >>
>> >> I'm personally fine with any of the 1,2,3 methods listed above,
>> with my
>> >> favourites being 2 and 3.
>> >> Let me know which one you think is the right path to go down on.
>> >>
>> >> Thanks,
>> >> Adam
>> >>
>> >> On 20 April 2018 at 20:14, Eugene Koifman <
>> ekoifman@hortonworks.com> wrote:
>> >>
>> >>> Would it be possible to add patch name validation when it gets
>> added to
>> >>> the queue?
>> >>> Currently I think it fails when the bot gets to the patch if it’s
>> not
>> >>> named correctly.
>> >>> More common for branch patches
>> >>>
>> >>> On 4/20/18, 8:20 AM, "Zoltan Haindrich" <ki...@rxd.hu> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Some time ago the ptest queue worked the following way:
>> >>>
>> >>> * for some reason ATTACHMENT_ID was not set by the upstream
>> jira
>> >>> scanner
>> >>> tool; this triggered a feature in Jenkins: if for the same
>> ticket
>> >>> mutliple patches were uploaded; they didn't triggered new runs
>> >>> (because
>> >>> the parameters were the same)
>> >>> * this have become fixed at some point...around that time I
>> started
>> >>> getting multiple ptest executions for the same ticket -
>> because I've
>> >>> fixed a minor typo after submitting the first version of my
>> patch...
>> >>> * currently we also have a jenkins queue reader inside the
>> ptest
>> >>> job...which checks if the ticket is in the queue right now;
>> and if is
>> >>> it, it just exits...this logic kinda restores the earlier
>> behaviour;
>> >>> with the exception that if I upload a patch every day and the
>> queue is
>> >>> longer that 1day (like now); I will never get a ptest run :D
>> >>> * ...now here I come! I've just removed my patch from
>> yesterday;
>> >>> because
>> >>> I want a ptest run with my newest patch; and the only way to
>> force the
>> >>> above logic to do that....is by removing that attachment..
>> >>>
>> >>>
>> >>> So...could we go back to the state when the attachment_id was
>> ignored?
>> >>> I would recommend to remove the ATTACHMENT_ID from the jenkins
>> >>> parameters...
>> >>>
>> >>> cheers,
>> >>> Zoltan
>> >>>
>> >>> JenkinsQueueUtil.java:
>> >>> https://github.com/apache/hive/blob/f8a671d8cfe8a26d1d12c51f
>> 93207e
>> >>> c92577c796/testutils/ptest2/src/main/java/org/apache/hive/
>> >>> ptest/api/client/JenkinsQueueUtil.java#L82
>> >>>
>> >>>
>> >>>
>> >>>
>> >
>>
>>
>>
>>
>
Re: ptest queue
Posted by Deepak Jaiswal <dj...@hortonworks.com>.
Hi Adam,
Thanks for putting so much effort for making ptests better. Really appreciate this.
Regards,
Deepak
On 5/14/18, 11:47 AM, "Adam Szita" <sz...@cloudera.com> wrote:
This is now committed and has been deployed some hours ago - don't worry if
you see your job failed I resubmitted everything from the queue.
I will be keeping an eye on how ptest works after this change.
Thanks,
Adam
On 2 May 2018 at 17:34, Adam Szita <sz...@cloudera.com> wrote:
> I have a patch available for the voted version at
> https://issues.apache.org/jira/browse/HIVE-19077. Let me know what you
> think.
>
> On 27 April 2018 at 15:55, Adam Szita <sz...@cloudera.com> wrote:
>
>> Thanks to all for the responses.
>> As I see it, option 3 is the winning one. Next week I'm going start
>> working on this one then (unless any objections of course).
>>
>> Adam
>>
>> On 26 April 2018 at 05:48, Deepak Jaiswal <dj...@hortonworks.com>
>> wrote:
>>
>>> +1 for option 3. Thanks Adam for taking this up again.
>>>
>>> Regards,
>>> Deepak
>>>
>>> On 4/25/18, 4:54 PM, "Thejas Nair" <th...@gmail.com> wrote:
>>>
>>> Option 3 seems reasonable. I believe that used to be the state a
>>> while
>>> back (maybe 12 months back or so).
>>> When 2nd ptest for same jira runs, it checks if the latest patch has
>>> already been run.
>>>
>>>
>>> On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary <pv...@cloudera.com>
>>> wrote:
>>> > I would vote for version 3. It would solve the big patch problem,
>>> and removes the unnecessary test runs too.
>>> >
>>> > Thanks,
>>> > Peter
>>> >
>>> >> On Apr 25, 2018, at 11:01 AM, Adam Szita <sz...@cloudera.com>
>>> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I had a patch (HIVE-19077) committed with the original aim being
>>> the
>>> >> prevention of wasting resources when running ptest on the same
>>> patch
>>> >> multiple times:
>>> >> It is supposed to manage scenarios where a developer uploads
>>> >> HIVE-XYZ.1.patch, that gets queued in jenkins, then before
>>> execution
>>> >> HIVE-XYZ.2.patch (for the same jira) is uploaded and that gets
>>> queued also.
>>> >> When the first patch starts to execute ptest will see that patch2
>>> is the
>>> >> latest patch and will use that. After some time the second queued
>>> job will
>>> >> also run on this very same patch.
>>> >> This is just pointless and causes long queues to progress slowly.
>>> >>
>>> >> My idea was to remove these duplicates from the queue where I'd
>>> only keep
>>> >> the latest queued element if I see more queued entries for the
>>> same jira
>>> >> number. It's like when you go grocery shopping and you're already
>>> in line
>>> >> at cashier but you realise you also need e.g. milk. You go grab
>>> it and join
>>> >> the END of the queue. So I believe it's a fair punishment for
>>> losing one's
>>> >> spot in the queue for making amends on their patch.
>>> >>
>>> >> That said Deepak made me realise that for big patches this will
>>> be very
>>> >> cumbersome due to the need of constant rebasing to avoid
>>> conflicts on patch
>>> >> application.
>>> >> I have three proposals now:
>>> >>
>>> >> 1: Leave this as it currently is (with HIVE-19077 committed) -
>>> *only the
>>> >> latest queued job will run of the same jira*
>>> >> pros: no wasting resources to run the same patches more times,
>>> 'scheduling'
>>> >> is fair: if you amend you're patch you may loose your original
>>> spot in the
>>> >> queue
>>> >> cons: big patches that are prone to conflicts will be hard to get
>>> executed
>>> >> in ptest, devs will have to wait more time for their ptest
>>> results if they
>>> >> amend their patches
>>> >>
>>> >> 2: *Add a safety switch* to this queue checking feature
>>> (currently proposed
>>> >> in HIVE-19077), deduplication can be switch off on request
>>> >> pros: same as 1st, + ability to have more control on this
>>> mechanism i.e.
>>> >> turn it off for big/urgent patches
>>> >> cons: big patches that use the swich might still waste resources,
>>> also devs
>>> >> might use safety switch inappropriately for their own evil
>>> benefit :)
>>> >>
>>> >> 3: Deduplication the other way around - *only the first queued
>>> job will run
>>> >> of the same jira*, ptest server will keep record of patch names
>>> and won't
>>> >> execute a patch with a seen name and jira number again
>>> >> pros: same patches will not be executed more times accidentally,
>>> big
>>> >> patches won't be a problem either, devs will get their ptest
>>> result back
>>> >> earlier even if more jobs are triggered for same jira/patch name
>>> >> cons: scheduling is less fair: devs can reserve their spots in
>>> the queue
>>> >>
>>> >>
>>> >> (0: restore original: I'm strongly against this, ptest queue is
>>> already too
>>> >> big as it is, we have to at least try and decrease its size by
>>> >> deduplicating jiras in it)
>>> >>
>>> >> I'm personally fine with any of the 1,2,3 methods listed above,
>>> with my
>>> >> favourites being 2 and 3.
>>> >> Let me know which one you think is the right path to go down on.
>>> >>
>>> >> Thanks,
>>> >> Adam
>>> >>
>>> >> On 20 April 2018 at 20:14, Eugene Koifman <
>>> ekoifman@hortonworks.com> wrote:
>>> >>
>>> >>> Would it be possible to add patch name validation when it gets
>>> added to
>>> >>> the queue?
>>> >>> Currently I think it fails when the bot gets to the patch if
>>> it’s not
>>> >>> named correctly.
>>> >>> More common for branch patches
>>> >>>
>>> >>> On 4/20/18, 8:20 AM, "Zoltan Haindrich" <ki...@rxd.hu> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Some time ago the ptest queue worked the following way:
>>> >>>
>>> >>> * for some reason ATTACHMENT_ID was not set by the upstream
>>> jira
>>> >>> scanner
>>> >>> tool; this triggered a feature in Jenkins: if for the same
>>> ticket
>>> >>> mutliple patches were uploaded; they didn't triggered new runs
>>> >>> (because
>>> >>> the parameters were the same)
>>> >>> * this have become fixed at some point...around that time I
>>> started
>>> >>> getting multiple ptest executions for the same ticket -
>>> because I've
>>> >>> fixed a minor typo after submitting the first version of my
>>> patch...
>>> >>> * currently we also have a jenkins queue reader inside the
>>> ptest
>>> >>> job...which checks if the ticket is in the queue right now;
>>> and if is
>>> >>> it, it just exits...this logic kinda restores the earlier
>>> behaviour;
>>> >>> with the exception that if I upload a patch every day and the
>>> queue is
>>> >>> longer that 1day (like now); I will never get a ptest run :D
>>> >>> * ...now here I come! I've just removed my patch from
>>> yesterday;
>>> >>> because
>>> >>> I want a ptest run with my newest patch; and the only way to
>>> force the
>>> >>> above logic to do that....is by removing that attachment..
>>> >>>
>>> >>>
>>> >>> So...could we go back to the state when the attachment_id was
>>> ignored?
>>> >>> I would recommend to remove the ATTACHMENT_ID from the jenkins
>>> >>> parameters...
>>> >>>
>>> >>> cheers,
>>> >>> Zoltan
>>> >>>
>>> >>> JenkinsQueueUtil.java:
>>> >>> https://github.com/apache/hive/blob/f8a671d8cfe8a26d1d12c51f
>>> 93207e
>>> >>> c92577c796/testutils/ptest2/src/main/java/org/apache/hive/
>>> >>> ptest/api/client/JenkinsQueueUtil.java#L82
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >
>>>
>>>
>>>
>>>
>>
>
Re: ptest queue
Posted by Adam Szita <sz...@cloudera.com>.
This is now committed and has been deployed some hours ago - don't worry if
you see your job failed I resubmitted everything from the queue.
I will be keeping an eye on how ptest works after this change.
Thanks,
Adam
On 2 May 2018 at 17:34, Adam Szita <sz...@cloudera.com> wrote:
> I have a patch available for the voted version at
> https://issues.apache.org/jira/browse/HIVE-19077. Let me know what you
> think.
>
> On 27 April 2018 at 15:55, Adam Szita <sz...@cloudera.com> wrote:
>
>> Thanks to all for the responses.
>> As I see it, option 3 is the winning one. Next week I'm going start
>> working on this one then (unless any objections of course).
>>
>> Adam
>>
>> On 26 April 2018 at 05:48, Deepak Jaiswal <dj...@hortonworks.com>
>> wrote:
>>
>>> +1 for option 3. Thanks Adam for taking this up again.
>>>
>>> Regards,
>>> Deepak
>>>
>>> On 4/25/18, 4:54 PM, "Thejas Nair" <th...@gmail.com> wrote:
>>>
>>> Option 3 seems reasonable. I believe that used to be the state a
>>> while
>>> back (maybe 12 months back or so).
>>> When 2nd ptest for same jira runs, it checks if the latest patch has
>>> already been run.
>>>
>>>
>>> On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary <pv...@cloudera.com>
>>> wrote:
>>> > I would vote for version 3. It would solve the big patch problem,
>>> and removes the unnecessary test runs too.
>>> >
>>> > Thanks,
>>> > Peter
>>> >
>>> >> On Apr 25, 2018, at 11:01 AM, Adam Szita <sz...@cloudera.com>
>>> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I had a patch (HIVE-19077) committed with the original aim being
>>> the
>>> >> prevention of wasting resources when running ptest on the same
>>> patch
>>> >> multiple times:
>>> >> It is supposed to manage scenarios where a developer uploads
>>> >> HIVE-XYZ.1.patch, that gets queued in jenkins, then before
>>> execution
>>> >> HIVE-XYZ.2.patch (for the same jira) is uploaded and that gets
>>> queued also.
>>> >> When the first patch starts to execute ptest will see that patch2
>>> is the
>>> >> latest patch and will use that. After some time the second queued
>>> job will
>>> >> also run on this very same patch.
>>> >> This is just pointless and causes long queues to progress slowly.
>>> >>
>>> >> My idea was to remove these duplicates from the queue where I'd
>>> only keep
>>> >> the latest queued element if I see more queued entries for the
>>> same jira
>>> >> number. It's like when you go grocery shopping and you're already
>>> in line
>>> >> at cashier but you realise you also need e.g. milk. You go grab
>>> it and join
>>> >> the END of the queue. So I believe it's a fair punishment for
>>> losing one's
>>> >> spot in the queue for making amends on their patch.
>>> >>
>>> >> That said Deepak made me realise that for big patches this will
>>> be very
>>> >> cumbersome due to the need of constant rebasing to avoid
>>> conflicts on patch
>>> >> application.
>>> >> I have three proposals now:
>>> >>
>>> >> 1: Leave this as it currently is (with HIVE-19077 committed) -
>>> *only the
>>> >> latest queued job will run of the same jira*
>>> >> pros: no wasting resources to run the same patches more times,
>>> 'scheduling'
>>> >> is fair: if you amend you're patch you may loose your original
>>> spot in the
>>> >> queue
>>> >> cons: big patches that are prone to conflicts will be hard to get
>>> executed
>>> >> in ptest, devs will have to wait more time for their ptest
>>> results if they
>>> >> amend their patches
>>> >>
>>> >> 2: *Add a safety switch* to this queue checking feature
>>> (currently proposed
>>> >> in HIVE-19077), deduplication can be switch off on request
>>> >> pros: same as 1st, + ability to have more control on this
>>> mechanism i.e.
>>> >> turn it off for big/urgent patches
>>> >> cons: big patches that use the swich might still waste resources,
>>> also devs
>>> >> might use safety switch inappropriately for their own evil
>>> benefit :)
>>> >>
>>> >> 3: Deduplication the other way around - *only the first queued
>>> job will run
>>> >> of the same jira*, ptest server will keep record of patch names
>>> and won't
>>> >> execute a patch with a seen name and jira number again
>>> >> pros: same patches will not be executed more times accidentally,
>>> big
>>> >> patches won't be a problem either, devs will get their ptest
>>> result back
>>> >> earlier even if more jobs are triggered for same jira/patch name
>>> >> cons: scheduling is less fair: devs can reserve their spots in
>>> the queue
>>> >>
>>> >>
>>> >> (0: restore original: I'm strongly against this, ptest queue is
>>> already too
>>> >> big as it is, we have to at least try and decrease its size by
>>> >> deduplicating jiras in it)
>>> >>
>>> >> I'm personally fine with any of the 1,2,3 methods listed above,
>>> with my
>>> >> favourites being 2 and 3.
>>> >> Let me know which one you think is the right path to go down on.
>>> >>
>>> >> Thanks,
>>> >> Adam
>>> >>
>>> >> On 20 April 2018 at 20:14, Eugene Koifman <
>>> ekoifman@hortonworks.com> wrote:
>>> >>
>>> >>> Would it be possible to add patch name validation when it gets
>>> added to
>>> >>> the queue?
>>> >>> Currently I think it fails when the bot gets to the patch if
>>> it’s not
>>> >>> named correctly.
>>> >>> More common for branch patches
>>> >>>
>>> >>> On 4/20/18, 8:20 AM, "Zoltan Haindrich" <ki...@rxd.hu> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Some time ago the ptest queue worked the following way:
>>> >>>
>>> >>> * for some reason ATTACHMENT_ID was not set by the upstream
>>> jira
>>> >>> scanner
>>> >>> tool; this triggered a feature in Jenkins: if for the same
>>> ticket
>>> >>> mutliple patches were uploaded; they didn't triggered new runs
>>> >>> (because
>>> >>> the parameters were the same)
>>> >>> * this have become fixed at some point...around that time I
>>> started
>>> >>> getting multiple ptest executions for the same ticket -
>>> because I've
>>> >>> fixed a minor typo after submitting the first version of my
>>> patch...
>>> >>> * currently we also have a jenkins queue reader inside the
>>> ptest
>>> >>> job...which checks if the ticket is in the queue right now;
>>> and if is
>>> >>> it, it just exits...this logic kinda restores the earlier
>>> behaviour;
>>> >>> with the exception that if I upload a patch every day and the
>>> queue is
>>> >>> longer that 1day (like now); I will never get a ptest run :D
>>> >>> * ...now here I come! I've just removed my patch from
>>> yesterday;
>>> >>> because
>>> >>> I want a ptest run with my newest patch; and the only way to
>>> force the
>>> >>> above logic to do that....is by removing that attachment..
>>> >>>
>>> >>>
>>> >>> So...could we go back to the state when the attachment_id was
>>> ignored?
>>> >>> I would recommend to remove the ATTACHMENT_ID from the jenkins
>>> >>> parameters...
>>> >>>
>>> >>> cheers,
>>> >>> Zoltan
>>> >>>
>>> >>> JenkinsQueueUtil.java:
>>> >>> https://github.com/apache/hive/blob/f8a671d8cfe8a26d1d12c51f
>>> 93207e
>>> >>> c92577c796/testutils/ptest2/src/main/java/org/apache/hive/
>>> >>> ptest/api/client/JenkinsQueueUtil.java#L82
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >
>>>
>>>
>>>
>>>
>>
>