Posted to dev@mxnet.apache.org by Alex Zai <az...@gmail.com> on 2018/11/29 22:38:19 UTC

Adding AMD CPU to CI

What are people's thoughts on having AMD machines tested on the CI? AMD
machines are now available on AWS.

Best,
Alex

Re: Adding AMD CPU to CI

Posted by Tianqi Chen <tq...@cs.washington.edu>.
I am not sure it is necessary: AMD CPUs also implement x86, so testing on
them would not give us additional information.

Tianqi

On Thu, Nov 29, 2018 at 3:35 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> +1
>
> On Thu, Nov 29, 2018 at 2:50 PM Seth, Manu <se...@amazon.com.invalid>
> wrote:
>
> > +1
> >
> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> >
> >     What are people's thoughts on having AMD machines tested on the CI?
> AMD
> >     machines are now available on AWS.
> >
> >     Best,
> >     Alex
> >
> >
> >
>

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
+1

On Thu, Nov 29, 2018 at 2:50 PM Seth, Manu <se...@amazon.com.invalid>
wrote:

> +1
>
> On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>
>     What are people's thoughts on having AMD machines tested on the CI? AMD
>     machines are now available on AWS.
>
>     Best,
>     Alex
>
>
>

Re: Adding AMD CPU to CI

Posted by Pedro Larroy <pe...@gmail.com>.
I think just adding AMD is not the right abstraction level. Testing and benchmarking with different CPU flags / -march settings (e.g. AVX2, SSE2) brings value in my opinion. Just testing another vendor of a compatible CPU doesn't.
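
As a rough illustration of testing by CPU flags rather than by vendor, here is a minimal sketch. It assumes a Linux-style /proc/cpuinfo text format; the variant names and their required flags are made up for the example and are not MXNet's actual build matrix.

```python
# Illustrative only: these variant names and flag lists are assumptions,
# not the project's real build configurations.
BUILD_VARIANTS = {
    "baseline-sse2": {"sse2"},
    "avx2": {"sse2", "avx", "avx2"},
    "avx2-f16c": {"sse2", "avx", "avx2", "f16c"},
    "avx512": {"sse2", "avx", "avx2", "avx512f"},
}

# Hand-written sample in /proc/cpuinfo style, heavily truncated.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 f16c\n"


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU listed in the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def runnable_variants(host_flags, variants=BUILD_VARIANTS):
    """List the build variants whose required flags the host provides."""
    return sorted(name for name, needed in variants.items() if needed <= host_flags)


print(runnable_variants(parse_cpu_flags(SAMPLE)))
```

On a real Linux host the contents of /proc/cpuinfo would be passed in instead of SAMPLE. The vendor never appears in the check; only the reported feature flags do.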

Pedro

> On 30. Nov 2018, at 19:32, kellen sunderland <ke...@gmail.com> wrote:
> 
> Damn, knew i should have double-checked!  Oh well it's also carbon neutral.
> 
> On Fri, Nov 30, 2018 at 10:27 AM Pedro Larroy <pe...@gmail.com>
> wrote:
> 
>> Agree with Tianqi and Hao. Adding AMD brings no value and increases
>> complexity and CI cost. The instruction sets are the same. For
>> benchmarking it might make sense though.
>> 
>> Pedro
>> 
>>> On 30. Nov 2018, at 18:19, Tianqi Chen <tq...@cs.washington.edu> wrote:
>>> 
>>> I still think it is overkill to add AMD CPU to the CI, given the
>> additional
>>> cost it could bring and little additional information we can get out from
>>> it.
>>> 
>>> A middle ground is to add AMD CPU to a nightly build or final sweep before
>>> release. If there is a case that we find that AMD CPU really makes a
>>> difference, then we add it to the CI
>>> 
>>> Tianqi
>>> 
>>>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hj...@gmail.com> wrote:
>>>> 
>>>> For CPUs, the supported instruction sets may also vary between the same
>>>> manufacturer's different product lines of the same generation
>> (Skylake-SP
>>>> versus Skylake).
>>>> For the same instruction set, the two manufacturers should both have a
>>>> working version of the hardware implementation. If any of the
>>>> implementations does not work, then the chip would not even be
>> considered
>>>> functioning properly.
>>>> If some AMD CPUs only support up to AVX2 instruction sets, they would
>> just
>>>> function in the same way as an Intel CPU that supports up to AVX2
>>>> instruction sets. The performance may vary, but the capability and
>> behavior
>>>> of the two chips would be the same when given the same machine code.
>>>> For AMD GPUs it's a totally different story, as AMD GPUs do not share
>> the
>>>> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if
>> we
>>>> do have support for them) would definitely add values.
>>>> Hao
>>>> 
>>>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <
>> anirudh2290@gmail.com
>>>>> 
>>>> wrote:
>>>> 
>>>>> Instruction set extensions support like AVX2, AVX512 etc. can vary
>>>> between
>>>>> AMD and Intel and there can also be a time lag between when Intel
>>>> supports
>>>>> it versus when AMD supports it.
>>>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>>>> GPUs and AWS also happens to have support for it.
>>>>> 
>>>>> Anirudh
>>>>> 
>>>>> 
>>>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>>>> <ma...@googlemail.com.invalid> wrote:
>>>>> 
>>>>>> I think it's worth a discussion to do a sanity check. While generally
>>>>> these
>>>>>> instructions are standardized, we also made the experience with ARM
>>>> that
>>>>>> the theory and reality sometimes don't match. Thus, it's always good
>> to
>>>>>> check.
>>>>>> 
>>>>>> In the next months we are going to refactor our slave creation
>>>> processes.
>>>>>> Chance Bair has been working on rewriting Windows slaves from scratch
>>>> (we
>>>>>> used images that haven't really been updated for 2 years - we still
>>>> don't
>>>>>> know what was done on them) and they're ready soon. In the following
>>>>>> months, we will also port our Ubuntu slaves to the new method (don't
>>>>> have a
>>>>>> timeline yet). Ideally, the integration of AMD instances will only be
>> a
>>>>>> matter of running the same pipeline on a different instance type. In
>>>> that
>>>>>> Case, it should not be a big deal.
>>>>>> 
>>>>>> If there are big differences, that's already a yellow flag for
>>>>>> compatibility, but that's unlikely. But in that case, we would have to
>>>>> make
>>>>>> a more thorough time analysis and whether it's worth the effort.
>> Maybe,
>>>>>> somebody else could also lend us a hand and help us with adding AMD
>>>>>> support.
>>>>>> 
>>>>>> -Marco
>>>>>> 
>>>>>> On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> f16c is also an instruction set supported by both brands' recent CPUs
>>>>>> just
>>>>>>> like x86, AVX, SSE etc., and any difference in behaviors (quite
>>>>>> impossible
>>>>>>> to happen or it will be a major defect) would most likely be caused
>>>> by
>>>>>> the
>>>>>>> underlying hardware implementation, so still, adding AMD instances is
>>>>> not
>>>>>>> adding much value here.
>>>>>>> Hao
>>>>>>> 
>>>>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Just looked at the mf16c work and wanted to mention Rahul clearly
>>>>> _was_
>>>>>>>> thinking about AMD users in that PR.
>>>>>>>> 
>>>>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>>>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> From my perspective we're developing a few features like mf16c
>>>> and
>>>>>>> MKLDNN
>>>>>>>>> integration specifically for Intel CPUs.  It wouldn't hurt to
>>>> make
>>>>>> sure
>>>>>>>>> those changes also run properly on AMD cpus.
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com
>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I'm a bit confused about why we need extra functionality tests
>>>>> just
>>>>>>> for
>>>>>>>>>> AMD
>>>>>>>>>> CPUs, aren't AMD CPUs supporting roughly the same instruction
>>>> sets
>>>>>> as
>>>>>>>> the
>>>>>>>>>> Intel ones? In the very impossible case that something working
>>>> on
>>>>>>> Intel
>>>>>>>>>> CPUs being not functioning on AMD CPUs (or vice versa), it would
>>>>>>> mostly
>>>>>>>>>> likely be related to the underlying hardware implementation of
>>>> the
>>>>>>> same
>>>>>>>>>> ISA, to which we definitely do not have a good solution. So I
>>>>> don't
>>>>>>>> think
>>>>>>>>>> performing extra tests on functional aspect of the system on AMD
>>>>>> CPUs
>>>>>>> is
>>>>>>>>>> adding any values.
>>>>>>>>>> Hao
>>>>>>>>>> 
>>>>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
>>>>>> <sethman@amazon.com.invalid
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> +1
>>>>>>>>>>> 
>>>>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>   What are people's thoughts on having AMD machines tested
>>>> on
>>>>>> the
>>>>>>>> CI?
>>>>>>>>>> AMD
>>>>>>>>>>>   machines are now available on AWS.
>>>>>>>>>>> 
>>>>>>>>>>>   Best,
>>>>>>>>>>>   Alex
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
Damn, knew I should have double-checked! Oh well, it's also carbon neutral.

On Fri, Nov 30, 2018 at 10:27 AM Pedro Larroy <pe...@gmail.com>
wrote:

> Agree with Tianqi and Hao. Adding AMD brings no value and increases
> complexity and CI cost. The instruction sets are the same. For
> benchmarking it might make sense though.
>
> Pedro
>
> > On 30. Nov 2018, at 18:19, Tianqi Chen <tq...@cs.washington.edu> wrote:
> >
> > I still think it is overkill to add AMD CPU to the CI, given the
> additional
> > cost it could bring and little additional information we can get out from
> > it.
> >
> > A middle ground is to add AMD CPU to a nightly build or final sweep before
> > release. If there is a case that we find that AMD CPU really makes a
> > difference, then we add it to the CI
> >
> > Tianqi
> >
> >> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hj...@gmail.com> wrote:
> >>
> >> For CPUs, the supported instruction sets may also vary between the same
> >> manufacturer's different product lines of the same generation
> (Skylake-SP
> >> versus Skylake).
> >> For the same instruction set, the two manufacturers should both have a
> >> working version of the hardware implementation. If any of the
> >> implementations does not work, then the chip would not even be
> considered
> >> functioning properly.
> >> If some AMD CPUs only support up to AVX2 instruction sets, they would
> just
> >> function in the same way as an Intel CPU that supports up to AVX2
> >> instruction sets. The performance may vary, but the capability and
> behavior
> >> of the two chips would be the same when given the same machine code.
> >> For AMD GPUs it's a totally different story, as AMD GPUs do not share
> the
> >> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if
> we
> >> do have support for them) would definitely add values.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <
> anirudh2290@gmail.com
> >>>
> >> wrote:
> >>
> >>> Instruction set extensions support like AVX2, AVX512 etc. can vary
> >> between
> >>> AMD and Intel and there can also be a time lag between when Intel
> >> supports
> >>> it versus when AMD supports it.
> >>> Also, in the future this setup may be useful in case MXNet supports AMD
> >>> GPUs and AWS also happens to have support for it.
> >>>
> >>> Anirudh
> >>>
> >>>
> >>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
> >>> <ma...@googlemail.com.invalid> wrote:
> >>>
> >>>> I think it's worth a discussion to do a sanity check. While generally
> >>> these
> >>>> instructions are standardized, we also made the experience with ARM
> >> that
> >>>> the theory and reality sometimes don't match. Thus, it's always good
> to
> >>>> check.
> >>>>
> >>>> In the next months we are going to refactor our slave creation
> >> processes.
> >>>> Chance Bair has been working on rewriting Windows slaves from scratch
> >> (we
> >>>> used images that haven't really been updated for 2 years - we still
> >> don't
> >>>> know what was done on them) and they're ready soon. In the following
> >>>> months, we will also port our Ubuntu slaves to the new method (don't
> >>> have a
> >>>> timeline yet). Ideally, the integration of AMD instances will only be
> a
> >>>> matter of running the same pipeline on a different instance type. In
> >> that
> >>>> Case, it should not be a big deal.
> >>>>
> >>>> If there are big differences, that's already a yellow flag for
> >>>> compatibility, but that's unlikely. But in that case, we would have to
> >>> make
> >>>> a more thorough time analysis and whether it's worth the effort.
> Maybe,
> >>>> somebody else could also lend us a hand and help us with adding AMD
> >>>> support.
> >>>>
> >>>> -Marco
> >>>>
> >>>> On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> f16c is also an instruction set supported by both brands' recent CPUs
> >>>> just
> >>>>> like x86, AVX, SSE etc., and any difference in behaviors (quite
> >>>> impossible
> >>>>> to happen or it will be a major defect) would most likely be caused
> >> by
> >>>> the
> >>>>> underlying hardware implementation, so still, adding AMD instances is
> >>> not
> >>>>> adding much value here.
> >>>>> Hao
> >>>>>
> >>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> >>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>
> >>>>>> Just looked at the mf16c work and wanted to mention Rahul clearly
> >>> _was_
> >>>>>> thinking about AMD users in that PR.
> >>>>>>
> >>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> >>>>>> kellen.sunderland@gmail.com> wrote:
> >>>>>>
> >>>>>>> From my perspective we're developing a few features like mf16c
> >> and
> >>>>> MKLDNN
> >>>>>>> integration specifically for Intel CPUs.  It wouldn't hurt to
> >> make
> >>>> sure
> >>>>>>> those changes also run properly on AMD cpus.
> >>>>>>>
> >>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com
> >> wrote:
> >>>>>>>
> >>>>>>>> I'm a bit confused about why we need extra functionality tests
> >>> just
> >>>>> for
> >>>>>>>> AMD
> >>>>>>>> CPUs, aren't AMD CPUs supporting roughly the same instruction
> >> sets
> >>>> as
> >>>>>> the
> >>>>>>>> Intel ones? In the very impossible case that something working
> >> on
> >>>>> Intel
> >>>>>>>> CPUs being not functioning on AMD CPUs (or vice versa), it would
> >>>>> mostly
> >>>>>>>> likely be related to the underlying hardware implementation of
> >> the
> >>>>> same
> >>>>>>>> ISA, to which we definitely do not have a good solution. So I
> >>> don't
> >>>>>> think
> >>>>>>>> performing extra tests on functional aspect of the system on AMD
> >>>> CPUs
> >>>>> is
> >>>>>>>> adding any values.
> >>>>>>>> Hao
> >>>>>>>>
> >>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> >>>> <sethman@amazon.com.invalid
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> +1
> >>>>>>>>>
> >>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>    What are people's thoughts on having AMD machines tested
> >> on
> >>>> the
> >>>>>> CI?
> >>>>>>>> AMD
> >>>>>>>>>    machines are now available on AWS.
> >>>>>>>>>
> >>>>>>>>>    Best,
> >>>>>>>>>    Alex
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

Re: Adding AMD CPU to CI

Posted by Pedro Larroy <pe...@gmail.com>.
Agree with Tianqi and Hao. Adding AMD brings no value and increases complexity and CI cost. The instruction sets are the same. For benchmarking it might make sense, though.

Pedro

> On 30. Nov 2018, at 18:19, Tianqi Chen <tq...@cs.washington.edu> wrote:
> 
> I still think it is overkill to add AMD CPU to the CI, given the additional
> cost it could bring and little additional information we can get out from
> it.
> 
> A middle ground is to add AMD CPU to a nightly build or final sweep before
> release. If there is a case that we find that AMD CPU really makes a
> difference, then we add it to the CI
> 
> Tianqi
> 
>> On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hj...@gmail.com> wrote:
>> 
>> For CPUs, the supported instruction sets may also vary between the same
>> manufacturer's different product lines of the same generation (Skylake-SP
>> versus Skylake).
>> For the same instruction set, the two manufacturers should both have a
>> working version of the hardware implementation. If any of the
>> implementations does not work, then the chip would not even be considered
>> functioning properly.
>> If some AMD CPUs only support up to AVX2 instruction sets, they would just
>> function in the same way as an Intel CPU that supports up to AVX2
>> instruction sets. The performance may vary, but the capability and behavior
>> of the two chips would be the same when given the same machine code.
>> For AMD GPUs it's a totally different story, as AMD GPUs do not share the
>> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if we
>> do have support for them) would definitely add values.
>> Hao
>> 
>> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <anirudh2290@gmail.com
>>> 
>> wrote:
>> 
>>> Instruction set extensions support like AVX2, AVX512 etc. can vary
>> between
>>> AMD and Intel and there can also be a time lag between when Intel
>> supports
>>> it versus when AMD supports it.
>>> Also, in the future this setup may be useful in case MXNet supports AMD
>>> GPUs and AWS also happens to have support for it.
>>> 
>>> Anirudh
>>> 
>>> 
>>> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>>> <ma...@googlemail.com.invalid> wrote:
>>> 
>>>> I think it's worth a discussion to do a sanity check. While generally
>>> these
>>>> instructions are standardized, we also made the experience with ARM
>> that
>>>> the theory and reality sometimes don't match. Thus, it's always good to
>>>> check.
>>>> 
>>>> In the next months we are going to refactor our slave creation
>> processes.
>>>> Chance Bair has been working on rewriting Windows slaves from scratch
>> (we
>>>> used images that haven't really been updated for 2 years - we still
>> don't
>>>> know what was done on them) and they're ready soon. In the following
>>>> months, we will also port our Ubuntu slaves to the new method (don't
>>> have a
>>>> timeline yet). Ideally, the integration of AMD instances will only be a
>>>> matter of running the same pipeline on a different instance type. In
>> that
>>>> Case, it should not be a big deal.
>>>> 
>>>> If there are big differences, that's already a yellow flag for
>>>> compatibility, but that's unlikely. But in that case, we would have to
>>> make
>>>> a more thorough time analysis and whether it's worth the effort. Maybe,
>>>> somebody else could also lend us a hand and help us with adding AMD
>>>> support.
>>>> 
>>>> -Marco
>>>> 
>>>> On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
>>>> wrote:
>>>> 
>>>>> f16c is also an instruction set supported by both brands' recent CPUs
>>>> just
>>>>> like x86, AVX, SSE etc., and any difference in behaviors (quite
>>>> impossible
>>>>> to happen or it will be a major defect) would most likely be caused
>> by
>>>> the
>>>>> underlying hardware implementation, so still, adding AMD instances is
>>> not
>>>>> adding much value here.
>>>>> Hao
>>>>> 
>>>>> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>>>>> kellen.sunderland@gmail.com> wrote:
>>>>> 
>>>>>> Just looked at the mf16c work and wanted to mention Rahul clearly
>>> _was_
>>>>>> thinking about AMD users in that PR.
>>>>>> 
>>>>>> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>>>>>> kellen.sunderland@gmail.com> wrote:
>>>>>> 
>>>>>>> From my perspective we're developing a few features like mf16c
>> and
>>>>> MKLDNN
>>>>>>> integration specifically for Intel CPUs.  It wouldn't hurt to
>> make
>>>> sure
>>>>>>> those changes also run properly on AMD cpus.
>>>>>>> 
>>>>>>> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com
>> wrote:
>>>>>>> 
>>>>>>>> I'm a bit confused about why we need extra functionality tests
>>> just
>>>>> for
>>>>>>>> AMD
>>>>>>>> CPUs, aren't AMD CPUs supporting roughly the same instruction
>> sets
>>>> as
>>>>>> the
>>>>>>>> Intel ones? In the very impossible case that something working
>> on
>>>>> Intel
>>>>>>>> CPUs being not functioning on AMD CPUs (or vice versa), it would
>>>>> mostly
>>>>>>>> likely be related to the underlying hardware implementation of
>> the
>>>>> same
>>>>>>>> ISA, to which we definitely do not have a good solution. So I
>>> don't
>>>>>> think
>>>>>>>> performing extra tests on functional aspect of the system on AMD
>>>> CPUs
>>>>> is
>>>>>>>> adding any values.
>>>>>>>> Hao
>>>>>>>> 
>>>>>>>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
>>>> <sethman@amazon.com.invalid
>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>    What are people's thoughts on having AMD machines tested
>> on
>>>> the
>>>>>> CI?
>>>>>>>> AMD
>>>>>>>>>    machines are now available on AWS.
>>>>>>>>> 
>>>>>>>>>    Best,
>>>>>>>>>    Alex
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Re: Adding AMD CPU to CI

Posted by Tianqi Chen <tq...@cs.washington.edu>.
I still think it is overkill to add AMD CPU to the CI, given the additional
cost it could bring and the little additional information we would get out
of it.

A middle ground is to add AMD CPU to a nightly build or a final sweep before
release. If we find a case where the AMD CPU really makes a difference, then
we add it to the CI.
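
A minimal sketch of that split, assuming the CI can pick instance types per trigger. The trigger names and instance labels here are hypothetical; the real CI configuration is not shown in this thread.

```python
# Hypothetical labels: "intel-cpu" and "amd-cpu" stand in for whatever
# instance types the CI actually provisions.
PER_PR_INSTANCES = ["intel-cpu"]
NIGHTLY_EXTRA_INSTANCES = ["amd-cpu"]


def instances_for(trigger):
    """Return the instance types a test run should use for a given trigger."""
    if trigger == "pull_request":
        # Keep the per-PR run small and cheap.
        return list(PER_PR_INSTANCES)
    if trigger in ("nightly", "pre_release"):
        # The less frequent sweeps also cover the extra hardware.
        return PER_PR_INSTANCES + NIGHTLY_EXTRA_INSTANCES
    raise ValueError(f"unknown trigger: {trigger!r}")
```

If a nightly run ever showed an AMD-only failure, moving "amd-cpu" into PER_PR_INSTANCES would promote it to the per-PR CI.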

Tianqi

On Thu, Nov 29, 2018 at 6:29 PM Hao Jin <hj...@gmail.com> wrote:

> For CPUs, the supported instruction sets may also vary between the same
> manufacturer's different product lines of the same generation (Skylake-SP
> versus Skylake).
> For the same instruction set, the two manufacturers should both have a
> working version of the hardware implementation. If any of the
> implementations does not work, then the chip would not even be considered
> functioning properly.
> If some AMD CPUs only support up to AVX2 instruction sets, they would just
> function in the same way as an Intel CPU that supports up to AVX2
> instruction sets. The performance may vary, but the capability and behavior
> of the two chips would be the same when given the same machine code.
> For AMD GPUs it's a totally different story, as AMD GPUs do not share the
> same instruction sets with the NVIDIA ones, thus testing on AMD GPUs(if we
> do have support for them) would definitely add values.
> Hao
>
> On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <anirudh2290@gmail.com
> >
> wrote:
>
> > Instruction set extensions support like AVX2, AVX512 etc. can vary
> between
> > AMD and Intel and there can also be a time lag between when Intel
> supports
> > it versus when AMD supports it.
> > Also, in the future this setup may be useful in case MXNet supports AMD
> > GPUs and AWS also happens to have support for it.
> >
> > Anirudh
> >
> >
> > On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
> > <ma...@googlemail.com.invalid> wrote:
> >
> > > I think it's worth a discussion to do a sanity check. While generally
> > these
> > > instructions are standardized, we also made the experience with ARM
> that
> > > the theory and reality sometimes don't match. Thus, it's always good to
> > > check.
> > >
> > > In the next months we are going to refactor our slave creation
> processes.
> > > Chance Bair has been working on rewriting Windows slaves from scratch
> (we
> > > used images that haven't really been updated for 2 years - we still
> don't
> > > know what was done on them) and they're ready soon. In the following
> > > months, we will also port our Ubuntu slaves to the new method (don't
> > have a
> > > timeline yet). Ideally, the integration of AMD instances will only be a
> > > matter of running the same pipeline on a different instance type. In
> that
> > > Case, it should not be a big deal.
> > >
> > > If there are big differences, that's already a yellow flag for
> > > compatibility, but that's unlikely. But in that case, we would have to
> > make
> > > a more thorough time analysis and whether it's worth the effort. Maybe,
> > > somebody else could also lend us a hand and help us with adding AMD
> > > support.
> > >
> > > -Marco
> > >
> > > On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
> > > wrote:
> > >
> > > > f16c is also an instruction set supported by both brands' recent CPUs
> > > just
> > > > like x86, AVX, SSE etc., and any difference in behaviors (quite
> > > impossible
> > > > to happen or it will be a major defect) would most likely be caused
> by
> > > the
> > > > underlying hardware implementation, so still, adding AMD instances is
> > not
> > > > adding much value here.
> > > > Hao
> > > >
> > > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > Just looked at the mf16c work and wanted to mention Rahul clearly
> > _was_
> > > > > thinking about AMD users in that PR.
> > > > >
> > > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > > > > kellen.sunderland@gmail.com> wrote:
> > > > >
> > > > > > From my perspective we're developing a few features like mf16c
> and
> > > > MKLDNN
> > > > > > integration specifically for Intel CPUs.  It wouldn't hurt to
> make
> > > sure
> > > > > > those changes also run properly on AMD cpus.
> > > > > >
> > > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com
> wrote:
> > > > > >
> > > > > >> I'm a bit confused about why we need extra functionality tests
> > just
> > > > for
> > > > > >> AMD
> > > > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction
> sets
> > > as
> > > > > the
> > > > > >> Intel ones? In the very impossible case that something working
> on
> > > > Intel
> > > > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> > > > mostly
> > > > > >> likely be related to the underlying hardware implementation of
> the
> > > > same
> > > > > >> ISA, to which we definitely do not have a good solution. So I
> > don't
> > > > > think
> > > > > >> performing extra tests on functional aspect of the system on AMD
> > > CPUs
> > > > is
> > > > > >> adding any values.
> > > > > >> Hao
> > > > > >>
> > > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> > > <sethman@amazon.com.invalid
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > +1
> > > > > >> >
> > > > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> > > > > >> >
> > > > > >> >     What are people's thoughts on having AMD machines tested
> on
> > > the
> > > > > CI?
> > > > > >> AMD
> > > > > >> >     machines are now available on AWS.
> > > > > >> >
> > > > > >> >     Best,
> > > > > >> >     Alex
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Adding AMD CPU to CI

Posted by Hao Jin <hj...@gmail.com>.
For CPUs, the supported instruction sets may also vary between different
product lines of the same manufacturer within the same generation (Skylake-SP
versus Skylake).
For the same instruction set, both manufacturers should have a working
hardware implementation. If either implementation does not work, the chip
would not even be considered to be functioning properly.
If some AMD CPUs only support instruction sets up to AVX2, they would
function in the same way as an Intel CPU that supports instruction sets up to
AVX2. The performance may vary, but the capability and behavior of the two
chips would be the same when given the same machine code.
For AMD GPUs it's a totally different story: AMD GPUs do not share the same
instruction sets as the NVIDIA ones, so testing on AMD GPUs (if we do have
support for them) would definitely add value.
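
To make the comparison concrete, here is a small sketch that splits two CPUs' feature sets into the shared part and the one-sided parts. The flag sets are hand-written and heavily truncated for illustration; they are not measured from real hardware.

```python
def compare_flags(flags_a, flags_b):
    """Split two CPU feature sets into shared and one-sided parts."""
    return {
        "shared": sorted(flags_a & flags_b),
        "only_a": sorted(flags_a - flags_b),
        "only_b": sorted(flags_b - flags_a),
    }


# Illustrative assumptions: one part that reports AVX-512 and one that
# stops at AVX2. Real CPUs report many more flags than this.
cpu_with_avx512 = {"sse2", "avx", "avx2", "f16c", "avx512f"}
cpu_up_to_avx2 = {"sse2", "avx", "avx2", "f16c"}

report = compare_flags(cpu_with_avx512, cpu_up_to_avx2)
for name, flags in report.items():
    print(name, flags)
```

Code that only uses the shared set is the case described above, where both chips should behave the same. Anything in the one-sided sets is where a code path could differ between machines, regardless of vendor.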
Hao

On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian <an...@gmail.com>
wrote:

> Instruction set extensions support like AVX2, AVX512 etc. can vary between
> AMD and Intel and there can also be a time lag between when Intel supports
> it versus when AMD supports it.
> Also, in the future this setup may be useful in case MXNet supports AMD
> GPUs and AWS also happens to have support for it.
>
> Anirudh
>
>
> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
> <ma...@googlemail.com.invalid> wrote:
>
> > I think it's worth a discussion to do a sanity check. While generally
> these
> > instructions are standardized, we also made the experience with ARM that
> > the theory and reality sometimes don't match. Thus, it's always good to
> > check.
> >
> > In the next months we are going to refactor our slave creation processes.
> > Chance Bair has been working on rewriting Windows slaves from scratch (we
> > used images that haven't really been updated for 2 years - we still don't
> > know what was done on them) and they're ready soon. In the following
> > months, we will also port our Ubuntu slaves to the new method (don't
> have a
> > timeline yet). Ideally, the integration of AMD instances will only be a
> > matter of running the same pipeline on a different instance type. In that
> > Case, it should not be a big deal.
> >
> > If there are big differences, that's already a yellow flag for
> > compatibility, but that's unlikely. But in that case, we would have to
> make
> > a more thorough time analysis and whether it's worth the effort. Maybe,
> > somebody else could also lend us a hand and help us with adding AMD
> > support.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
> > wrote:
> >
> > > f16c is also an instruction set supported by both brands' recent CPUs
> > just
> > > like x86, AVX, SSE etc., and any difference in behaviors (quite
> > impossible
> > > to happen or it will be a major defect) would most likely be caused by
> > the
> > > underlying hardware implementation, so still, adding AMD instances is
> not
> > > adding much value here.
> > > Hao
> > >
> > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > Just looked at the mf16c work and wanted to mention Rahul clearly
> _was_
> > > > thinking about AMD users in that PR.
> > > >
> > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > > > kellen.sunderland@gmail.com> wrote:
> > > >
> > > > > From my perspective we're developing a few features like mf16c and
> > > MKLDNN
> > > > > integration specifically for Intel CPUs.  It wouldn't hurt to make
> > sure
> > > > > those changes also run properly on AMD cpus.
> > > > >
> > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
> > > > >
> > > > >> I'm a bit confused about why we need extra functionality tests
> just
> > > for
> > > > >> AMD
> > > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets
> > as
> > > > the
> > > > >> Intel ones? In the very impossible case that something working on
> > > Intel
> > > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> > > mostly
> > > > >> likely be related to the underlying hardware implementation of the
> > > same
> > > > >> ISA, to which we definitely do not have a good solution. So I
> don't
> > > > think
> > > > >> performing extra tests on functional aspect of the system on AMD
> > CPUs
> > > is
> > > > >> adding any values.
> > > > >> Hao
> > > > >>
> > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> > <sethman@amazon.com.invalid
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > +1
> > > > >> >
> > > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> > > > >> >
> > > > >> >     What are people's thoughts on having AMD machines tested on
> > the
> > > > CI?
> > > > >> AMD
> > > > >> >     machines are now available on AWS.
> > > > >> >
> > > > >> >     Best,
> > > > >> >     Alex
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Adding AMD CPU to CI

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
Kellen, we run CI in us-west-2 (Oregon) :P Sorry, environment :(

-Marco

On Fri, Nov 30, 2018, 18:58 kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> +1 to nightly.
>
> Given the awesome results shown by Alex for AMD cpus I think MKLDNN
> actually would probably be something I'd use, even on my AMD machines.
> Kudos to Intel for releasing this lib which works great on their hardware,
> but still pretty well w/ AMD.  The upshot of MKLDNN supporting AMD to me is
> that it makes me much more likely to support it as the default PyPi package
> (discussed in another thread).  This is part of the reason I'd like to have
> a sanity test in CI somewhere for AMD hardware.
>
> Unrelated note: regarding global warming I actually partially chose
> eu-west-1 to host CI because it's carbon neutral.  The cost of the CI is
> significant, and although it's donated by AWS I'm glad the community is
> cognizant of that.
>

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
+1 to nightly.

Given the awesome results Alex showed for AMD CPUs, I think MKLDNN is
something I'd probably use even on my AMD machines. Kudos to Intel for
releasing this lib, which works great on their hardware but still pretty
well with AMD. The upshot of MKLDNN supporting AMD, to me, is that it makes
me much more likely to support it as the default PyPI package (discussed in
another thread). This is part of the reason I'd like to have a sanity test
in CI somewhere for AMD hardware.
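
A sanity test like this could start by confirming what the CI host actually
reports about itself. Below is a minimal sketch in Python, assuming a Linux
host that exposes /proc/cpuinfo; the function names and the list of required
flags are hypothetical and are not taken from the MXNet CI scripts.

```python
# Hypothetical sanity check; names and the required-flag list are illustrative.
REQUIRED_FLAGS = {"avx2", "f16c"}  # assumption: extensions the build relies on


def parse_cpuinfo(text):
    """Return (vendor_id, set of flags) from /proc/cpuinfo-style text."""
    vendor, flags = None, set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def missing_flags(text, required=REQUIRED_FLAGS):
    """Return the required extensions that the host does not report."""
    _, flags = parse_cpuinfo(text)
    return sorted(set(required) - flags)


def read_host_cpuinfo(path="/proc/cpuinfo"):
    """Read the host description (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a real host one would call missing_flags(read_host_cpuinfo()) at the
start of the job and fail early if the list is not empty. The vendor string
(AuthenticAMD or GenuineIntel) is only useful for logging which kind of
machine the suite ran on.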

Unrelated note: regarding global warming, I actually chose eu-west-1 to host
CI partly because it's carbon neutral. The cost of the CI is significant,
and although it's donated by AWS, I'm glad the community is cognizant of
that.

On Fri, Nov 30, 2018 at 9:54 AM Kumar, Vikas <vi...@amazon.com.invalid>
wrote:

> I concur. +1 for nightly for the pre-release suite.
>

Re: Adding AMD CPU to CI

Posted by "Kumar, Vikas" <vi...@amazon.com.INVALID>.
I concur. +1 for nightly for the pre-release suite.

On 11/30/18, 9:49 AM, "Tianqi Chen" <tq...@cs.washington.edu> wrote:

    +1 for nightly in the pre-release suite, but not in the CI that is triggered
    on every test. The best engineering practice is not to add things, but to
    remove things until there is nothing left to remove.
    
    In terms of MKLDNN, since it is an Intel product, I doubt optimizing for AMD
    CPUs is its goal, so adding CI to guard against backward compatibility is a
    bit overkill, especially since AMD CPU users would likely disable this
    feature and use the original CPU version of the project.
    
    At least we can contribute to reducing the carbon footprint and slowing down
    global warming :)
    
    Tianqi
    


Re: Adding AMD CPU to CI

Posted by Tianqi Chen <tq...@cs.washington.edu>.
+1 for nightly in the pre-release suite, but not in the CI that is triggered
on every test. The best engineering practice is not to add things, but to
remove things until there is nothing left to remove.

In terms of MKLDNN, since it is an Intel product, I doubt optimizing for AMD
CPUs is its goal, so adding CI to guard against backward compatibility is a
bit overkill, especially since AMD CPU users would likely disable this
feature and use the original CPU version of the project.

At least we can contribute to reducing the carbon footprint and slowing down
global warming :)

Tianqi

On Fri, Nov 30, 2018 at 9:38 AM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Regarding cost, yes, we could run this nightly, or simply have it run an
> existing test suite where that makes sense, rather than having it duplicate
> a suite.
>

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
Regarding cost, yes, we could run this nightly, or simply have it run an
existing test suite where that makes sense, rather than having it duplicate
a suite.
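
One way to picture this: the AMD host reuses an existing suite and is only
scheduled for the nightly trigger. The sketch below is purely illustrative;
the host labels, suite name and trigger names are made up and do not
correspond to the real Jenkins configuration.

```python
# Illustrative job table; host labels and suite names are hypothetical.
JOBS = [
    {"host": "intel-cpu", "suite": "unittest", "triggers": {"pr", "nightly"}},
    {"host": "amd-cpu", "suite": "unittest", "triggers": {"nightly"}},
]


def jobs_for(trigger):
    """Return (host, suite) pairs that should run for the given trigger."""
    return [(job["host"], job["suite"]) for job in JOBS if trigger in job["triggers"]]
```

With this table a pull request only pays for the Intel run, while the
nightly build runs the same suite on both kinds of host.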

On Fri, Nov 30, 2018 at 9:26 AM Kumar, Vikas <vi...@amazon.com.invalid>
wrote:

> I don't think there is any downside to this proposal. I think basic
> sanity CI testing on AMD processors will give an extra boost to our tests.
> This adds to developer productivity, and they have one less thing to worry
> about. Developers have spent time in the past manually testing on AMD
> processors, MKLDNN being the most recent instance. It's good to have
> those tests in the CI pipeline.
> All I see is benefit. If the $ cost is not too high for basic sanity
> testing, we should do this, unless some strong downside is called
> out.
>
> +1
>
>
> On 11/29/18, 5:37 PM, "Anirudh Subramanian" <an...@gmail.com>
> wrote:
>
>     Support for instruction set extensions like AVX2, AVX512, etc. can vary
>     between AMD and Intel, and there can also be a time lag between when
>     Intel supports an extension and when AMD supports it.
>     Also, in the future this setup may be useful in case MXNet supports AMD
>     GPUs and AWS also happens to have support for them.
>
>     Anirudh
>
>
>     On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>     <ma...@googlemail.com.invalid> wrote:
>
>     > I think it's worth a discussion to do a sanity check. While these
>     > instructions are generally standardized, our experience with ARM was
>     > that theory and reality sometimes don't match. Thus, it's always good
>     > to check.
>     >
>     > In the next months we are going to refactor our slave creation
> processes.
>     > Chance Bair has been working on rewriting Windows slaves from
> scratch (we
>     > used images that haven't really been updated for 2 years - we still
> don't
>     > know what was done on them) and they're ready soon. In the following
>     > months, we will also port our Ubuntu slaves to the new method (don't
> have a
>     > timeline yet). Ideally, the integration of AMD instances will only
> be a
>     > matter of running the same pipeline on a different instance type. In
> that
>     > Case, it should not be a big deal.
>     >
>     > If there are big differences, that's already a yellow flag for
>     > compatibility, but that's unlikely. But in that case, we would have
> to make
>     > a more thorough time analysis and whether it's worth the effort.
> Maybe,
>     > somebody else could also lend us a hand and help us with adding AMD
>     > support.
>     >
>     > -Marco
>     >
>     > On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
>     > wrote:
>     >
>     > > f16c is also an instruction set supported by both brands' recent
> CPUs
>     > just
>     > > like x86, AVX, SSE etc., and any difference in behaviors (quite
>     > impossible
>     > > to happen or it will be a major defect) would most likely be
> caused by
>     > the
>     > > underlying hardware implementation, so still, adding AMD instances
> is not
>     > > adding much value here.
>     > > Hao
>     > >
>     > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
>     > > kellen.sunderland@gmail.com> wrote:
>     > >
>     > > > Just looked at the mf16c work and wanted to mention Rahul
> clearly _was_
>     > > > thinking about AMD users in that PR.
>     > > >
>     > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
>     > > > kellen.sunderland@gmail.com> wrote:
>     > > >
>     > > > > From my perspective we're developing a few features like mf16c
> and
>     > > MKLDNN
>     > > > > integration specifically for Intel CPUs.  It wouldn't hurt to
> make
>     > sure
>     > > > > those changes also run properly on AMD cpus.
>     > > > >
>     > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com
> wrote:
>     > > > >
>     > > > >> I'm a bit confused about why we need extra functionality
> tests just
>     > > for
>     > > > >> AMD
>     > > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction
> sets
>     > as
>     > > > the
>     > > > >> Intel ones? In the very impossible case that something
> working on
>     > > Intel
>     > > > >> CPUs being not functioning on AMD CPUs (or vice versa), it
> would
>     > > mostly
>     > > > >> likely be related to the underlying hardware implementation
> of the
>     > > same
>     > > > >> ISA, to which we definitely do not have a good solution. So I
> don't
>     > > > think
>     > > > >> performing extra tests on functional aspect of the system on
> AMD
>     > CPUs
>     > > is
>     > > > >> adding any values.
>     > > > >> Hao
>     > > > >>
>     > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
>     > <sethman@amazon.com.invalid
>     > > >
>     > > > >> wrote:
>     > > > >>
>     > > > >> > +1
>     > > > >> >
>     > > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>     > > > >> >
>     > > > >> >     What are people's thoughts on having AMD machines
> tested on
>     > the
>     > > > CI?
>     > > > >> AMD
>     > > > >> >     machines are now available on AWS.
>     > > > >> >
>     > > > >> >     Best,
>     > > > >> >     Alex
>     > > > >> >
>     > > > >> >
>     > > > >> >
>     > > > >>
>     > > > >
>     > > >
>     > >
>     >
>
>
>

Re: Adding AMD CPU to CI

Posted by "Kumar, Vikas" <vi...@amazon.com.INVALID>.
I don't think there is any downside to this proposal. I think basic sanity testing on AMD processors in CI will give an extra boost to our tests. It adds to developer productivity, and developers have one less thing to worry about. In the past, developers have had to spend time manually testing on AMD processors, MKLDNN being the most recent instance. It's good to have those tests in the CI pipeline.
All I see is benefit. If the $ cost is not too high for basic sanity testing, we should do this unless some strong downside is called out.

+1
 

On 11/29/18, 5:37 PM, "Anirudh Subramanian" <an...@gmail.com> wrote:

    Instruction set extensions support like AVX2, AVX512 etc. can vary between
    AMD and Intel and there can also be a time lag between when Intel supports
    it versus when AMD supports it.
    Also, in the future this setup may be useful in case MXNet supports AMD
    GPUs and AWS also happens to have support for it.
    
    Anirudh
    
    
    On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
    <ma...@googlemail.com.invalid> wrote:
    
    > I think it's worth a discussion to do a sanity check. While generally these
    > instructions are standardized, we also made the experience with ARM that
    > the theory and reality sometimes don't match. Thus, it's always good to
    > check.
    >
    > In the next months we are going to refactor our slave creation processes.
    > Chance Bair has been working on rewriting Windows slaves from scratch (we
    > used images that haven't really been updated for 2 years - we still don't
    > know what was done on them) and they're ready soon. In the following
    > months, we will also port our Ubuntu slaves to the new method (don't have a
    > timeline yet). Ideally, the integration of AMD instances will only be a
    > matter of running the same pipeline on a different instance type. In that
    > Case, it should not be a big deal.
    >
    > If there are big differences, that's already a yellow flag for
    > compatibility, but that's unlikely. But in that case, we would have to make
    > a more thorough time analysis and whether it's worth the effort. Maybe,
    > somebody else could also lend us a hand and help us with adding AMD
    > support.
    >
    > -Marco
    >
    > On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
    > wrote:
    >
    > > f16c is also an instruction set supported by both brands' recent CPUs
    > just
    > > like x86, AVX, SSE etc., and any difference in behaviors (quite
    > impossible
    > > to happen or it will be a major defect) would most likely be caused by
    > the
    > > underlying hardware implementation, so still, adding AMD instances is not
    > > adding much value here.
    > > Hao
    > >
    > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
    > > kellen.sunderland@gmail.com> wrote:
    > >
    > > > Just looked at the mf16c work and wanted to mention Rahul clearly _was_
    > > > thinking about AMD users in that PR.
    > > >
    > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
    > > > kellen.sunderland@gmail.com> wrote:
    > > >
    > > > > From my perspective we're developing a few features like mf16c and
    > > MKLDNN
    > > > > integration specifically for Intel CPUs.  It wouldn't hurt to make
    > sure
    > > > > those changes also run properly on AMD cpus.
    > > > >
    > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
    > > > >
    > > > >> I'm a bit confused about why we need extra functionality tests just
    > > for
    > > > >> AMD
    > > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets
    > as
    > > > the
    > > > >> Intel ones? In the very impossible case that something working on
    > > Intel
    > > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
    > > mostly
    > > > >> likely be related to the underlying hardware implementation of the
    > > same
    > > > >> ISA, to which we definitely do not have a good solution. So I don't
    > > > think
    > > > >> performing extra tests on functional aspect of the system on AMD
    > CPUs
    > > is
    > > > >> adding any values.
    > > > >> Hao
    > > > >>
    > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
    > <sethman@amazon.com.invalid
    > > >
    > > > >> wrote:
    > > > >>
    > > > >> > +1
    > > > >> >
    > > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
    > > > >> >
    > > > >> >     What are people's thoughts on having AMD machines tested on
    > the
    > > > CI?
    > > > >> AMD
    > > > >> >     machines are now available on AWS.
    > > > >> >
    > > > >> >     Best,
    > > > >> >     Alex
    > > > >> >
    > > > >> >
    > > > >> >
    > > > >>
    > > > >
    > > >
    > >
    >
    


Re: Adding AMD CPU to CI

Posted by Anirudh Subramanian <an...@gmail.com>.
Support for instruction set extensions such as AVX2 and AVX512 can vary
between AMD and Intel, and there can also be a time lag between when Intel
supports an extension and when AMD supports it.
Also, this setup may be useful in the future if MXNet supports AMD GPUs and
AWS also happens to offer them.
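
To make the point about differing extension support concrete, here is a
minimal sketch of a check a CI host could run before the tests start. It is
only an illustration: it assumes a Linux host that exposes /proc/cpuinfo, and
the REQUIRED_FLAGS set and the function names are made up for this example,
not taken from the MXNet code base.

```python
from pathlib import Path

# Extensions this hypothetical CI step cares about, spelled the way Linux
# lists them on the "flags" line of /proc/cpuinfo.
REQUIRED_FLAGS = {"sse2", "avx", "avx2", "f16c", "avx512f"}


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line, as a set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags the CPU does not report, sorted by name."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("missing:", missing_flags(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo is not available on this platform")
```

Running this on both an Intel and an AMD instance and comparing the two
"missing" lists would show which extensions actually differ between them.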

Anirudh


On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
<ma...@googlemail.com.invalid> wrote:

> I think it's worth a discussion to do a sanity check. While generally these
> instructions are standardized, we also made the experience with ARM that
> the theory and reality sometimes don't match. Thus, it's always good to
> check.
>
> In the next months we are going to refactor our slave creation processes.
> Chance Bair has been working on rewriting Windows slaves from scratch (we
> used images that haven't really been updated for 2 years - we still don't
> know what was done on them) and they're ready soon. In the following
> months, we will also port our Ubuntu slaves to the new method (don't have a
> timeline yet). Ideally, the integration of AMD instances will only be a
> matter of running the same pipeline on a different instance type. In that
> Case, it should not be a big deal.
>
> If there are big differences, that's already a yellow flag for
> compatibility, but that's unlikely. But in that case, we would have to make
> a more thorough time analysis and whether it's worth the effort. Maybe,
> somebody else could also lend us a hand and help us with adding AMD
> support.
>
> -Marco
>
> On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com>
> wrote:
>
> > f16c is also an instruction set supported by both brands' recent CPUs
> just
> > like x86, AVX, SSE etc., and any difference in behaviors (quite
> impossible
> > to happen or it will be a major defect) would most likely be caused by
> the
> > underlying hardware implementation, so still, adding AMD instances is not
> > adding much value here.
> > Hao
> >
> > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> > > thinking about AMD users in that PR.
> > >
> > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > > kellen.sunderland@gmail.com> wrote:
> > >
> > > > From my perspective we're developing a few features like mf16c and
> > MKLDNN
> > > > integration specifically for Intel CPUs.  It wouldn't hurt to make
> sure
> > > > those changes also run properly on AMD cpus.
> > > >
> > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
> > > >
> > > >> I'm a bit confused about why we need extra functionality tests just
> > for
> > > >> AMD
> > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets
> as
> > > the
> > > >> Intel ones? In the very impossible case that something working on
> > Intel
> > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> > mostly
> > > >> likely be related to the underlying hardware implementation of the
> > same
> > > >> ISA, to which we definitely do not have a good solution. So I don't
> > > think
> > > >> performing extra tests on functional aspect of the system on AMD
> CPUs
> > is
> > > >> adding any values.
> > > >> Hao
> > > >>
> > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> <sethman@amazon.com.invalid
> > >
> > > >> wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> > > >> >
> > > >> >     What are people's thoughts on having AMD machines tested on
> the
> > > CI?
> > > >> AMD
> > > >> >     machines are now available on AWS.
> > > >> >
> > > >> >     Best,
> > > >> >     Alex
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: Adding AMD CPU to CI

Posted by Marco de Abreu <ma...@googlemail.com.INVALID>.
I think it's worth discussing a sanity check. While these instructions are
generally standardized, our experience with ARM showed that theory and
reality sometimes don't match. Thus, it's always good to check.

Over the next few months we are going to refactor our slave creation
processes. Chance Bair has been working on rewriting the Windows slaves from
scratch (we used images that haven't really been updated for 2 years, and we
still don't know what was done on them) and they will be ready soon. In the
following months, we will also port our Ubuntu slaves to the new method (we
don't have a timeline yet). Ideally, integrating AMD instances will only be a
matter of running the same pipeline on a different instance type. In that
case, it should not be a big deal.
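
As a rough sketch of what "the same pipeline on a different instance type"
could look like, the snippet below expands a list of instance types and test
suites into a flat job list. Everything in it is an assumption for
illustration: the instance types and suite names are placeholders, and
build_jobs is a hypothetical helper, not part of our actual CI scripts.

```python
from itertools import product

# Placeholder values: one Intel-based and one AMD-based AWS instance type,
# and two made-up suite names. None of these come from the real CI config.
INSTANCE_TYPES = ["c5.4xlarge", "m5a.4xlarge"]
TEST_SUITES = ["unittest_cpu", "unittest_cpu_mkldnn"]


def build_jobs(instance_types, suites):
    """Return one job description per (instance type, suite) pair."""
    return [
        {"instance_type": itype, "suite": suite, "name": f"{suite}@{itype}"}
        for itype, suite in product(instance_types, suites)
    ]


if __name__ == "__main__":
    for job in build_jobs(INSTANCE_TYPES, TEST_SUITES):
        print(job["name"])
```

With this shape, adding AMD coverage is a one-line change to the instance
type list, and restricting it to a nightly run or to a single suite is a
matter of filtering the resulting job list.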

If there are big differences, that's already a yellow flag for
compatibility, but that's unlikely. In that case, though, we would have to do
a more thorough time analysis and decide whether it's worth the effort. Maybe
somebody else could also lend us a hand and help us with adding AMD support.

-Marco

On Fri, Nov 30, 2018, 01:22 Hao Jin <hj...@gmail.com> wrote:

> f16c is also an instruction set supported by both brands' recent CPUs just
> like x86, AVX, SSE etc., and any difference in behaviors (quite impossible
> to happen or it will be a major defect) would most likely be caused by the
> underlying hardware implementation, so still, adding AMD instances is not
> adding much value here.
> Hao
>
> On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> > thinking about AMD users in that PR.
> >
> > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > kellen.sunderland@gmail.com> wrote:
> >
> > > From my perspective we're developing a few features like mf16c and
> MKLDNN
> > > integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> > > those changes also run properly on AMD cpus.
> > >
> > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
> > >
> > >> I'm a bit confused about why we need extra functionality tests just
> for
> > >> AMD
> > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as
> > the
> > >> Intel ones? In the very impossible case that something working on
> Intel
> > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> mostly
> > >> likely be related to the underlying hardware implementation of the
> same
> > >> ISA, to which we definitely do not have a good solution. So I don't
> > think
> > >> performing extra tests on functional aspect of the system on AMD CPUs
> is
> > >> adding any values.
> > >> Hao
> > >>
> > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <sethman@amazon.com.invalid
> >
> > >> wrote:
> > >>
> > >> > +1
> > >> >
> > >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> > >> >
> > >> >     What are people's thoughts on having AMD machines tested on the
> > CI?
> > >> AMD
> > >> >     machines are now available on AWS.
> > >> >
> > >> >     Best,
> > >> >     Alex
> > >> >
> > >> >
> > >> >
> > >>
> > >
> >
>

Re: Adding AMD CPU to CI

Posted by Hao Jin <hj...@gmail.com>.
f16c is also an instruction set supported by both vendors' recent CPUs, just
like x86, AVX, SSE, etc. Any difference in behavior (very unlikely, and it
would be a major defect if it happened) would most likely be caused by the
underlying hardware implementation, so adding AMD instances still does not
add much value here.
Hao

On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> thinking about AMD users in that PR.
>
> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > From my perspective we're developing a few features like mf16c and MKLDNN
> > integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> > those changes also run properly on AMD cpus.
> >
> > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
> >
> >> I'm a bit confused about why we need extra functionality tests just for
> >> AMD
> >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as
> the
> >> Intel ones? In the very impossible case that something working on Intel
> >> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
> >> likely be related to the underlying hardware implementation of the same
> >> ISA, to which we definitely do not have a good solution. So I don't
> think
> >> performing extra tests on functional aspect of the system on AMD CPUs is
> >> adding any values.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <se...@amazon.com.invalid>
> >> wrote:
> >>
> >> > +1
> >> >
> >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> >> >
> >> >     What are people's thoughts on having AMD machines tested on the
> CI?
> >> AMD
> >> >     machines are now available on AWS.
> >> >
> >> >     Best,
> >> >     Alex
> >> >
> >> >
> >> >
> >>
> >
>

Re: Adding AMD CPU to CI

Posted by Rahul Huilgol <ra...@gmail.com>.
+1
I do think it would be valuable to add an AMD step to our CI. As we
continue to improve performance, we might have to consider more
instructions that are faster but specific to one hardware architecture.
We are doing a lot of Intel-specific work, so it would be a good sanity
check that we continue to support AMD.
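
One way to picture the concern about hardware-specific instructions is a
dispatcher that picks the most specialised code path the CPU supports and
otherwise falls back to a portable one. The sketch below is only an
illustration under assumptions: the variant names are invented, and this is
not how MXNet or MKLDNN actually select kernels.

```python
# Ordered from most to least specialised. Variant names are hypothetical;
# the flag names follow the Linux /proc/cpuinfo spelling.
KERNEL_VARIANTS = [
    ("avx512", {"avx512f"}),
    ("avx2_f16c", {"avx2", "f16c"}),
    ("sse2", {"sse2"}),
]


def select_kernel(cpu_flags):
    """Return the first variant whose required flags are all present."""
    flags = set(cpu_flags)
    for name, needed in KERNEL_VARIANTS:
        if needed <= flags:
            return name
    # Portable fallback for CPUs that report none of the extensions above.
    return "generic"


if __name__ == "__main__":
    print(select_kernel({"sse2", "avx", "avx2", "f16c"}))
```

An AMD step in CI would exercise whichever branch an AMD CPU ends up on,
which is the branch that Intel-only testing never runs when the two vendors
report different flags.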


On Thu, Nov 29, 2018 at 4:03 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> thinking about AMD users in that PR.
>
> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > From my perspective we're developing a few features like mf16c and MKLDNN
> > integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> > those changes also run properly on AMD cpus.
> >
> > On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
> >
> >> I'm a bit confused about why we need extra functionality tests just for
> >> AMD
> >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as
> the
> >> Intel ones? In the very impossible case that something working on Intel
> >> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
> >> likely be related to the underlying hardware implementation of the same
> >> ISA, to which we definitely do not have a good solution. So I don't
> think
> >> performing extra tests on functional aspect of the system on AMD CPUs is
> >> adding any values.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <se...@amazon.com.invalid>
> >> wrote:
> >>
> >> > +1
> >> >
> >> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> >> >
> >> >     What are people's thoughts on having AMD machines tested on the
> CI?
> >> AMD
> >> >     machines are now available on AWS.
> >> >
> >> >     Best,
> >> >     Alex
> >> >
> >> >
> >> >
> >>
> >
>


-- 
Rahul Huilgol

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
Just looked at the mf16c work and wanted to mention Rahul clearly _was_
thinking about AMD users in that PR.

On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> From my perspective we're developing a few features like mf16c and MKLDNN
> integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> those changes also run properly on AMD cpus.
>
> On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:
>
>> I'm a bit confused about why we need extra functionality tests just for
>> AMD
>> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as the
>> Intel ones? In the very impossible case that something working on Intel
>> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
>> likely be related to the underlying hardware implementation of the same
>> ISA, to which we definitely do not have a good solution. So I don't think
>> performing extra tests on functional aspect of the system on AMD CPUs is
>> adding any values.
>> Hao
>>
>> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <se...@amazon.com.invalid>
>> wrote:
>>
>> > +1
>> >
>> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>> >
>> >     What are people's thoughts on having AMD machines tested on the CI?
>> AMD
>> >     machines are now available on AWS.
>> >
>> >     Best,
>> >     Alex
>> >
>> >
>> >
>>
>

Re: Adding AMD CPU to CI

Posted by kellen sunderland <ke...@gmail.com>.
From my perspective we're developing a few features like mf16c and MKLDNN
integration specifically for Intel CPUs.  It wouldn't hurt to make sure
those changes also run properly on AMD CPUs.

On Thu, Nov 29, 2018, 3:38 PM Hao Jin <hjjn.amzn@gmail.com wrote:

> I'm a bit confused about why we need extra functionality tests just for AMD
> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as the
> Intel ones? In the very impossible case that something working on Intel
> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
> likely be related to the underlying hardware implementation of the same
> ISA, to which we definitely do not have a good solution. So I don't think
> performing extra tests on functional aspect of the system on AMD CPUs is
> adding any values.
> Hao
>
> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <se...@amazon.com.invalid>
> wrote:
>
> > +1
> >
> > On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
> >
> >     What are people's thoughts on having AMD machines tested on the CI?
> AMD
> >     machines are now available on AWS.
> >
> >     Best,
> >     Alex
> >
> >
> >
>

Re: Adding AMD CPU to CI

Posted by Hao Jin <hj...@gmail.com>.
I'm a bit confused about why we need extra functionality tests just for AMD
CPUs. Don't AMD CPUs support roughly the same instruction sets as the Intel
ones? In the very unlikely case that something working on Intel CPUs does
not work on AMD CPUs (or vice versa), it would most likely be related to the
underlying hardware implementation of the same ISA, for which we definitely
do not have a good solution. So I don't think performing extra functional
tests on AMD CPUs adds any value.
Hao

On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu <se...@amazon.com.invalid>
wrote:

> +1
>
> On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:
>
>     What are people's thoughts on having AMD machines tested on the CI? AMD
>     machines are now available on AWS.
>
>     Best,
>     Alex
>
>
>

Re: Adding AMD CPU to CI

Posted by "Seth, Manu" <se...@amazon.com.INVALID>.
+1

On 11/29/18, 2:39 PM, "Alex Zai" <az...@gmail.com> wrote:

    What are people's thoughts on having AMD machines tested on the CI? AMD
    machines are now available on AWS.
    
    Best,
    Alex
    


Re: Adding AMD CPU to CI

Posted by Anirudh Subramanian <an...@gmail.com>.
+1

On Thu, Nov 29, 2018 at 2:38 PM Alex Zai <az...@gmail.com> wrote:

> What are people's thoughts on having AMD machines tested on the CI? AMD
> machines are now available on AWS.
>
> Best,
> Alex
>