Posted to dev@hive.apache.org by Adam Szita <sz...@cloudera.com> on 2018/05/02 15:34:22 UTC

Re: ptest queue

I have a patch available for the voted version at
https://issues.apache.org/jira/browse/HIVE-19077. Let me know what you
think.
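
For anyone skimming the thread: the voted version is option 3 from the quoted discussion below, where the ptest server keeps a record of patch names per jira and skips a patch it has already run. Here is a minimal Java sketch of that idea. It assumes an in-memory set, and the names (SeenPatchRegistry, shouldRun) are hypothetical. They are not taken from the actual HIVE-19077 patch, which may work differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical illustration of option 3: remember which (jira, patch name)
 * pairs were already picked up and refuse to run them a second time.
 * Not the HIVE-19077 implementation.
 */
final class SeenPatchRegistry {
  // Keys are "jiraId/patchName"; the set is safe for concurrent jobs.
  private final Set<String> seen = ConcurrentHashMap.newKeySet();

  /**
   * Called when a queued job starts and has resolved the latest patch
   * attached to the jira. Returns true only the first time a given
   * jira/patch pair is offered; later calls for the same pair return false.
   */
  boolean shouldRun(String jiraId, String latestPatchName) {
    return seen.add(jiraId + "/" + latestPatchName);
  }
}
```

With HIVE-XYZ.1.patch and HIVE-XYZ.2.patch both queued, the first job resolves the latest patch (patch 2) and runs it. The second job resolves the same patch, gets false from shouldRun, and can exit right away.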

On 27 April 2018 at 15:55, Adam Szita <sz...@cloudera.com> wrote:

> Thanks to all for the responses.
> As I see it, option 3 is the winning one. Next week I'm going to start
> working on this one then (unless there are any objections, of course).
>
> Adam
>
> On 26 April 2018 at 05:48, Deepak Jaiswal <dj...@hortonworks.com>
> wrote:
>
>> +1 for option 3. Thanks Adam for taking this up again.
>>
>> Regards,
>> Deepak
>>
>> On 4/25/18, 4:54 PM, "Thejas Nair" <th...@gmail.com> wrote:
>>
>>     Option 3 seems reasonable. I believe that used to be the state a while
>>     back (maybe 12 months back or so).
>>     When 2nd ptest for same jira runs, it checks if the latest patch has
>>     already been run.
>>
>>
>>     On Wed, Apr 25, 2018 at 7:37 AM, Peter Vary <pv...@cloudera.com> wrote:
>>     > I would vote for version 3. It would solve the big patch problem
>>     > and remove the unnecessary test runs too.
>>     >
>>     > Thanks,
>>     > Peter
>>     >
>>     >> On Apr 25, 2018, at 11:01 AM, Adam Szita <sz...@cloudera.com> wrote:
>>     >>
>>     >> Hi all,
>>     >>
>>     >> I had a patch (HIVE-19077) committed with the original aim being the
>>     >> prevention of wasting resources when running ptest on the same patch
>>     >> multiple times:
>>     >> It is supposed to manage scenarios where a developer uploads
>>     >> HIVE-XYZ.1.patch, that gets queued in jenkins, then before execution
>>     >> HIVE-XYZ.2.patch (for the same jira) is uploaded and that gets queued
>>     >> also.
>>     >> When the first patch starts to execute, ptest will see that patch2 is
>>     >> the latest patch and will use that. After some time the second queued
>>     >> job will also run on this very same patch.
>>     >> This is just pointless and causes long queues to progress slowly.
>>     >>
>>     >> My idea was to remove these duplicates from the queue where I'd only
>>     >> keep the latest queued element if I see more queued entries for the
>>     >> same jira number. It's like when you go grocery shopping and you're
>>     >> already in line at the cashier but you realise you also need e.g. milk.
>>     >> You go grab it and join the END of the queue. So I believe losing one's
>>     >> spot in the queue is a fair punishment for amending one's patch.
>>     >>
>>     >> That said, Deepak made me realise that for big patches this will be
>>     >> very cumbersome due to the need for constant rebasing to avoid
>>     >> conflicts on patch application.
>>     >> I have three proposals now:
>>     >>
>>     >> 1: Leave this as it currently is (with HIVE-19077 committed) - *only
>>     >> the latest queued job will run of the same jira*
>>     >> pros: no resources wasted on running the same patches multiple times,
>>     >> 'scheduling' is fair: if you amend your patch you may lose your
>>     >> original spot in the queue
>>     >> cons: big patches that are prone to conflicts will be hard to get
>>     >> executed in ptest, devs will have to wait more time for their ptest
>>     >> results if they amend their patches
>>     >>
>>     >> 2: *Add a safety switch* to this queue checking feature (currently
>>     >> proposed in HIVE-19077), deduplication can be switched off on request
>>     >> pros: same as 1st, + ability to have more control over this mechanism,
>>     >> i.e. turn it off for big/urgent patches
>>     >> cons: big patches that use the switch might still waste resources,
>>     >> also devs might use the safety switch inappropriately for their own
>>     >> evil benefit :)
>>     >>
>>     >> 3: Deduplication the other way around - *only the first queued job
>>     >> will run of the same jira*, ptest server will keep a record of patch
>>     >> names and won't execute a patch with a seen name and jira number again
>>     >> pros: same patches will not be executed multiple times accidentally,
>>     >> big patches won't be a problem either, devs will get their ptest
>>     >> result back earlier even if more jobs are triggered for the same
>>     >> jira/patch name
>>     >> cons: scheduling is less fair: devs can reserve their spots in the
>>     >> queue
>>     >>
>>     >>
>>     >> (0: restore original: I'm strongly against this, ptest queue is
>>     >> already too big as it is, we have to at least try and decrease its
>>     >> size by deduplicating jiras in it)
>>     >>
>>     >> I'm personally fine with any of the 1,2,3 methods listed above, with
>>     >> my favourites being 2 and 3.
>>     >> Let me know which one you think is the right path to go down.
>>     >>
>>     >> Thanks,
>>     >> Adam
>>     >>
>>     >> On 20 April 2018 at 20:14, Eugene Koifman <ekoifman@hortonworks.com> wrote:
>>     >>
>>     >>> Would it be possible to add patch name validation when it gets added
>>     >>> to the queue?
>>     >>> Currently I think it fails when the bot gets to the patch if it’s not
>>     >>> named correctly.
>>     >>> This is more common for branch patches.
>>     >>>
>>     >>> On 4/20/18, 8:20 AM, "Zoltan Haindrich" <ki...@rxd.hu> wrote:
>>     >>>
>>     >>>    Hello,
>>     >>>
>>     >>>    Some time ago the ptest queue worked the following way:
>>     >>>
>>     >>>    * for some reason ATTACHMENT_ID was not set by the upstream jira
>>     >>>    scanner tool; this triggered a feature in Jenkins: if multiple
>>     >>>    patches were uploaded for the same ticket, they didn't trigger
>>     >>>    new runs (because the parameters were the same)
>>     >>>    * this was fixed at some point...around that time I started
>>     >>>    getting multiple ptest executions for the same ticket - because
>>     >>>    I had fixed a minor typo after submitting the first version of my
>>     >>>    patch...
>>     >>>    * currently we also have a jenkins queue reader inside the ptest
>>     >>>    job...which checks if the ticket is in the queue right now; and if
>>     >>>    it is, it just exits...this logic kinda restores the earlier
>>     >>>    behaviour; with the exception that if I upload a patch every day
>>     >>>    and the queue is longer than 1 day (like now), I will never get a
>>     >>>    ptest run :D
>>     >>>    * ...now here I come! I've just removed my patch from yesterday,
>>     >>>    because I want a ptest run with my newest patch; and the only way
>>     >>>    to force the above logic to do that....is by removing that
>>     >>>    attachment..
>>     >>>
>>     >>>
>>     >>>    So...could we go back to the state when the attachment_id was
>>     >>>    ignored?
>>     >>>    I would recommend removing the ATTACHMENT_ID from the jenkins
>>     >>>    parameters...
>>     >>>
>>     >>>    cheers,
>>     >>>    Zoltan
>>     >>>
>>     >>>    JenkinsQueueUtil.java:
>>     >>>    https://github.com/apache/hive/blob/f8a671d8cfe8a26d1d12c51f93207ec92577c796/testutils/ptest2/src/main/java/org/apache/hive/ptest/api/client/JenkinsQueueUtil.java#L82
>>     >>>
>>     >>>
>>     >>>
>>     >>>
>>     >
>>
>>
>>
>>
>
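
For contrast with the voted option 3, the behaviour described as option 1 in the quoted message above (keep only the latest queued entry per jira) could be sketched roughly as follows. This is an illustration only. The Entry type and the keepLatestPerJira method are hypothetical names, and the real queue check in JenkinsQueueUtil.java reads the Jenkins queue rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of option 1; not the code committed in HIVE-19077. */
final class QueueDeduplicator {

  /** Hypothetical queued job: a jira number plus the patch that triggered it. */
  static final class Entry {
    final String jiraId;
    final String patchName;

    Entry(String jiraId, String patchName) {
      this.jiraId = jiraId;
      this.patchName = patchName;
    }
  }

  /**
   * Takes the queue in order (oldest first) and returns a new list in which
   * each jira appears once, at the position of its most recently queued entry.
   */
  static List<Entry> keepLatestPerJira(List<Entry> queue) {
    Set<String> seenJiras = new HashSet<>();
    List<Entry> keptNewestFirst = new ArrayList<>();
    // Walk from the back so the newest entry for each jira is seen first.
    for (int i = queue.size() - 1; i >= 0; i--) {
      Entry entry = queue.get(i);
      if (seenJiras.add(entry.jiraId)) {
        keptNewestFirst.add(entry);
      }
    }
    // Restore oldest-first queue order for the surviving entries.
    Collections.reverse(keptNewestFirst);
    return keptNewestFirst;
  }
}
```

This shows the fairness trade-off from the thread: a jira that is amended keeps only its newest, later position, which is why frequently rebased big patches can struggle to reach the front.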

Re: ptest queue

Posted by Deepak Jaiswal <dj...@hortonworks.com>.
Hi Adam,

Thanks for putting so much effort into making ptest better. I really appreciate this.

Regards,
Deepak

On 5/14/18, 11:47 AM, "Adam Szita" <sz...@cloudera.com> wrote:

    This is now committed and was deployed some hours ago - don't worry if
    you see that your job failed; I resubmitted everything from the queue.
    I will be keeping an eye on how ptest works after this change.
    
    Thanks,
    Adam
    


Re: ptest queue

Posted by Adam Szita <sz...@cloudera.com>.
This is now committed and was deployed some hours ago - don't worry if
you see that your job failed; I resubmitted everything from the queue.
I will be keeping an eye on how ptest works after this change.

Thanks,
Adam
