You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mxnet.apache.org by Sam Skalicky <no...@github.com> on 2020/07/27 20:24:05 UTC

[apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Hi MXNet community,

Now that the 1.7.0 release development is closed (and in the midst of the release process), I wanted to start a discussion around another 1.x based release. There are many users that will continue to us 1.x for the foreseeable future while the community transitions to 2.x. Some examples are those who are using toolkits (ie. GluonCV/NLP/etc.) that are pinned to a 1.x version of MXNet. 

Are there features that we want to make available in a 1.8.0 release while the 2.0 transition is ongoing?

I would like to propose the following features in a 1.8.0 release:
- CUDA 11 and cuDNN 8.0
- Additional custom graph pass enhancements (#18779)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by bartekkuncer <no...@github.com.INVALID>.

Is fix for lacking mkldnn headers: https://github.com/apache/incubator-mxnet/pull/18608 included?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-733702596

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by m-ky <no...@github.com.INVALID>.

Can https://github.com/apache/incubator-mxnet/pull/19410 be merged?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-733697168

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Ofer Hasson <no...@github.com.INVALID>.

Thanks for that

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-757005757

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Fedor <no...@github.com.INVALID>.

@samskalicky @stu1130 @szha hello, please, can you tell any approximate plans for release 1.8.0 to pip? 

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-791409001

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by bgawrych <no...@github.com.INVALID>.

Can we add fix for ElemwiseSum too? : https://github.com/apache/incubator-mxnet/pull/19200

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-697159021

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

> * [Fix Bug] set upper bound infinity for conv, resize, upsampling, etc.
>   #18857
>   It will allow users to create convolutional layers with large number of filters and large spatial kernel.

Hey @wkcn can you create a PR to backport this change on v1.x?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-672271586

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com.INVALID>.

> @samskalicky is there a timeline for mac wheels being available on pip? I noticed the post0 release of 1.8 for Linux on 3/30, but Mac wheels are still missing.

@fhieber currently we only release Linux wheels as part of our release process. The Windows and Mac wheels are currently built by other community members (@yajiedesign for Windows, @szha for Mac) outside of the normal release process. At some point in the future we hope to integrate the building of these into the release process but where not there yet. 

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-813465016

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com.INVALID>.

The current status is here:

https://lists.apache.org/thread.html/r0bcc91647e8d199eb138e13644aa765167402e778227dbf7c6d50a84%40%3Cdev.mxnet.apache.org%3E

On Fri, Jan 8, 2021 at 2:16 AM Ofer Hasson <no...@github.com> wrote:

> Any timeline updates ?
> The tables at
> https://cwiki.apache.org/confluence/display/MXNET/1.8.0+Release+Plan+and+Status
> seems out of date
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-756673106>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AALYO2KACGRPLL42LA64HJTSY3LORANCNFSM4PJHMTFA>
> .
>


-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-756999827

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Thiam Khean Hah <no...@github.com>.

I would also like to see the "Duplicate subgraph input and output" fix in MXNet.18
Specifically, this PR https://github.com/apache/incubator-mxnet/pull/16131.

If I am not mistaken that PR had some test failures.
However, I have a different implementation that is passing.
Can we make this a goal for MXNet 1.8 as well ?

I can do the PR or fix the old one if we approve it for 1.8


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-675586998

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

> Any plan to fix MXNET_BACKWARD_DO_MIRROR, or to fix the docs since it is currently not supported by GLuon. I think MXNET_BACKWARD_DO_MIRROR is very important especially in object detection and sematic segmentation. Without that, even the backbone is resnet50, the im_per_gpu can only be set to 1 if the scale is 800 times 1333.

@kohillyang or @szha do you want to open a PR with this fix for v1.8?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-685258826

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

Just to summarize, we're all on board with including #18218 in v1.8. 

Just wanted to clarify that this PR makes the CI more stable, but doesnt help users with the same problem after they install MXNet on windows since thrush is a dependency that gets installed with whatever version of CUDA a user has setup on their machine. If we can document this issue too that would be helpful, but is separate from getting #18218 backported to v1.x.

So @stu1130 go ahead and create the backport PR and we'll track that for v1.8. Thanks!

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677832573

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by JackieWu <no...@github.com>.

Hi @samskalicky , I created a PR #18910 to backport it : )

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-672531959

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Jake Lee <no...@github.com>.

@leezu @samskalicky I agree to add a doc for this, will raise the PR later. But I would prefer to include the code so user don't need to patch it manually and for DJL team we don't forget to patch it unless the patch cause another problem

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677829977

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by kohillyang <no...@github.com>.

@szha  I found that training with mx.mod.Module setting MXNET_BACKWARD_DO_MIRROR to 1 takes more GPU memory than Gluon HybridBlock. Because if setting MXNET_BACKWARD_DO_MIRROR to 1, MXNET_USE_FUSION must be also set to 1 because it seems that relu has been fused. Does it mean that Gluon does not need MXNET_BACKWARD_DO_MIRROR? Or we can't generate Symbol from HybridBlock and must write a network with pure symbol API?

I test the memory consuming with the following codes:
```python
import mxnet as mx
import mxnet.autograd as ag


class NaiveDataset(object):
    def __len__(self):
        return 10000

    def __getitem__(self, idx):
        if idx % 2 ==0:
            label = mx.nd.zeros(shape=(1000, ))
            label[0] = 1
            return mx.nd.array(mx.nd.zeros(shape=(3, 224, 224))), label
        else:
            label = mx.nd.zeros(shape=(1000, ))
            label[1] = 1
            return mx.nd.array(mx.nd.ones(shape=(3, 224, 224))), label


def train_gluon_model_with_module():
    import os
    # os.environ["MXNET_BACKWARD_DO_MIRROR"]="1"
    # os.environ["MXNET_USE_FUSION"]="0"
    ctx_list = [mx.gpu(0)]
    from models.backbones.resnet._resnetv1b import resnet50_v1b
    net = resnet50_v1b(pretrained=False)
    # net = mx.gluon.model_zoo.vision.resnet50_v1(pretrained=False)
    net.initialize()
    _ = net(mx.nd.zeros(shape=(1, 3, 224, 224)))
    arg_params = {}
    aux_params = {}
    arg_params_collected = net.collect_params()
    for k in arg_params_collected:
        arg_params[k] = arg_params_collected[k].data(mx.cpu())
    for k in arg_params_collected:
        aux_params[k] = arg_params_collected[k].data(mx.cpu())

    data = mx.sym.var(name="data")
    sym = net(data)
    module = mx.mod.Module(sym, data_names=['data'], label_names=[], context=ctx_list)
    module.bind(data_shapes=[("data", (len(ctx_list) * 2, 3, 224, 224))])
    module.init_params(arg_params=arg_params, aux_params=aux_params, allow_missing=False, allow_extra=True)
    module.init_optimizer(force_init=True)
    train_loader = mx.gluon.data.DataLoader(dataset=NaiveDataset(), batch_size=100,
                                            num_workers=8, last_batch="discard", shuffle=True,
                                            thread_pool=False)
    for data_batch in train_loader:
        module_data_batch = mx.io.DataBatch(data=[data_batch[0], ], label=None)
        module.forward(module_data_batch, is_train=True)
        y_hat = module.get_outputs(merge_multi_context=True)
        label_list = mx.gluon.utils.split_and_load(data_batch[1], ctx_list=ctx_list, batch_axis=0)
        preds_list = mx.gluon.utils.split_and_load(y_hat[0], ctx_list=ctx_list, batch_axis=0)
        pred_grad_list = []
        for pred, label in zip(preds_list, label_list):  # type: mx.nd.NDArray, mx.nd.NDArray
            pred.attach_grad()
            label.attach_grad()
            with ag.record():
                pred_log_softmax = mx.nd.log_softmax(pred,  axis=1)
                loss = pred_log_softmax * label * -1
            loss.backward()
            pred_grad_list.append(pred.grad)
        pred_gradients = mx.nd.concatenate(pred_grad_list, axis=0)
        module.backward([pred_gradients])
        module.update()
        print(loss.sum().asnumpy())
        mx.nd.waitall()


def train_gluon_model_with_gluon():
    ctx_list = [mx.gpu(0)]
    net = mx.gluon.model_zoo.vision.resnet50_v1(pretrained=False)
    net.initialize()
    net.collect_params().reset_ctx(ctx_list)
    net.hybridize(static_alloc=True)
    trainer = mx.gluon.Trainer(
        net.collect_params(),  # fix batchnorm, fix first stage, etc...
        'sgd',
        {
            'learning_rate':1e-2
         },
    )

    train_loader = mx.gluon.data.DataLoader(dataset=NaiveDataset(), batch_size=100,
                                            num_workers=8, last_batch="discard", shuffle=True,
                                            thread_pool=False)
    for data_batch in train_loader:
        data_list = mx.gluon.utils.split_and_load(data_batch[0], ctx_list=ctx_list, batch_axis=0)
        label_list = mx.gluon.utils.split_and_load(data_batch[1], ctx_list=ctx_list, batch_axis=0)
        losses = []
        for data, label in zip(data_list, label_list):  # type: mx.nd.NDArray, mx.nd.NDArray
            with ag.record():
                y_hat = net(data)
                pred_log_softmax = mx.nd.log_softmax(y_hat,  axis=1)
                loss = pred_log_softmax * label * -1
            losses.append(loss)
        ag.backward(losses)
        trainer.step(1)
        print(loss.sum().asnumpy())
        mx.nd.waitall()


if __name__ == '__main__':
    # train_gluon_model_with_module()
    train_gluon_model_with_gluon()

```

By default train_gluon_model_with_module and train_gluon_model_with_gluon need almost same GPU memory, but if  set MXNET_BACKWARD_DO_MIRROR to 1 and set MXNET_USE_FUSION to 0, train_gluon_model_with_module will fail and raise a OOM exception. 

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-691611097

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Serge Panev <no...@github.com>.

@samskalicky that works for me and the TensorRT related PRs I am backporting.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-678082133

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Ofer Hasson <no...@github.com.INVALID>.

Any timeline updates ?
The tables at https://cwiki.apache.org/confluence/display/MXNET/1.8.0+Release+Plan+and+Status
seems out of date

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-756673106

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Leonard Lausen <no...@github.com>.

@stu1130 the PR you reference updates the CI configuration to workaround a bug in NVidia CUDA. Unfortunately NVidia will not fix the bug in CUDA 10. Just backporting the PR may not help users building MXNet from source. It may be more helpful to add a paragraph to the installation docs to recommend windows users to build with CUDA 11 or to manually patch their CUDA 10 version to backport the thrust changes included in CUDA 11. But ideally NVidia would just fix the bugs in CUDA 10 as well, given this bug affects all CUDA 10 users on Windows (not just MXNet): https://github.com/thrust/thrust/issues/1090

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677820040

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Neutron3529 <no...@github.com.INVALID>.

> > @samskalicky It seems that my merged BUG fix (for `KLDivLoss`, #18423 ) is not included in 1.8.0-rc1.
> 
> Hi @Neutron3529 it looks like #18423 was merged to the master branch. The v1.8.0 release (and rc0/rc1) branched off of the v1.x branch and since the PR was not backported it is not included in this release.

How to merge the PR to 1.x?
I manually opened a new PR(https://github.com/apache/incubator-mxnet/pull/19356).
Is it enough to fix the bug(if all checks pasts)?


-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-709636407

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Xingjian Shi <no...@github.com>.

Add the numpy bug fix of pad operator: https://github.com/apache/incubator-mxnet/pull/19044

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-688061685

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sheng Zha <no...@github.com.INVALID>.

Mac 1.8 wheels should be available now.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-813520884

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Jake Lee <no...@github.com>.

I would like to include the [BatchNorm performance improvement PR](https://github.com/apache/incubator-mxnet/pull/18676) for axis != 1 to 1.8

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-664642323

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com.INVALID>.

Hi @lgg currently no date for pip. we're still working on making the pip packaging compliant with ASF. we'll send out an update on dev@ when we have more info.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-791739770

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Leonard Lausen <no...@github.com>.

I'd like to include packaging change https://github.com/apache/incubator-mxnet/pull/19053 which ensures that no GPL dependencies are packaged by the static build scripts for distribution use-cases.

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-683932854

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by JackieWu <no...@github.com>.

- [Fix Bug] set upper bound infinity for conv, resize, upsampling, etc.
https://github.com/apache/incubator-mxnet/pull/18857
It will allow users to create convolutional layers with large number of filters and large spatial kernel.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-668973386

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com.INVALID>.

Hi @bgawrych, #19200 is already merged into the v1.8.x branch. We're planning to tag the release candidate on Friday and start the vote on Saturday 9/26 so your PR should be included.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-697160358

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by kohillyang <no...@github.com>.

Any plan to fix MXNET_BACKWARD_DO_MIRROR, or to fix the docs since it is currently not supported by GLuon. I think MXNET_BACKWARD_DO_MIRROR is very important especially in object detection and sematic segmentation. Without that, even the backbone is resnet50, the im_per_gpu can only be set to 1 if the scale is 800 times 1333.  

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-685232706

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Leonard Lausen <no...@github.com.INVALID>.

@bartekkuncer I think so: https://github.com/apache/incubator-mxnet/commit/2aa2702880207df2665c290b4416712b6d9fceaa

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-733882599

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

Hi @pengzhao-intel we havent set any dates for the 1.8 release yet. Thanks for bringing this up. I have been holding off while waiting on the vote on general@ to complete for the v1.7 release. But we should start to formalize the 1.8 plans now that we're moving forward with backporting PRs and have consensus from the community on having another 1.x release before 2.0. 

I would like to propose that we have a feature freeze 1 month from today on September 18th. Does that work for those that intend to submit PRs for the v1.8 release @DickJC123 @stu1130 @Kh4L and others?

Once we decide on a feature freeze date we'll work backwards from there and i'll formally setup the release plan with @ChaiBapchya and @josephevans who also will be co-managing the v1.8 release with me. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-678041884

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sheng Zha <no...@github.com>.

> > Any plan to fix MXNET_BACKWARD_DO_MIRROR, or to fix the docs since it is currently not supported by GLuon. I think MXNET_BACKWARD_DO_MIRROR is very important especially in object detection and sematic segmentation. Without that, even the backbone is resnet50, the im_per_gpu can only be set to 1 if the scale is 800 times 1333.
> 
> @kohillyang or @szha do you want to open a PR with this fix for v1.8?

I agree that this is an important feature though we may not make it in time for 1.8.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-689127698

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by kpuatamazon <no...@github.com>.

@fhieber wants #19099 supporting the intgemm library so that https://github.com/awslabs/sockeye version 2 (based on MXNet 1.x) can be installed with `pip`.  

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-688439143

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Michael Yahalom <no...@github.com.INVALID>.

> I think we have been supporting mac and windows so far and it's just that @yajiedesign and I are part of the release process. I'm currently working on producing the mac wheels for 1.8.
> 
> By the way, I believe @access2rohit is currently making the CD for mac available in the main repo in #19957.

@yajiedesign hey any idea when 1.8 wheels for Windows?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-830778182

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Felix Hieber <no...@github.com.INVALID>.

> > @samskalicky is there a timeline for mac wheels being available on pip? I noticed the post0 release of 1.8 for Linux on 3/30, but Mac wheels are still missing.
> 
> 
> 
> @fhieber currently we only release Linux wheels as part of our release process. The Windows and Mac wheels are currently built by other community members (@yajiedesign for Windows, @szha for Mac) outside of the normal release process. At some point in the future we hope to integrate the building of these into the release process but where not there yet. 

I see, thanks. It would be helpful to make this clear on the website where selection of MacOs/Python/{CPU,GPU} still shows the pip option. 
What is the recommendation for downstream projects like Sockeye what to build against by default in their CI systems, only Linux?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-813467441

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Serge Panev <no...@github.com>.

These PR related to Partition API changes for the Gluon support could be also added: 

- Partition API adding and deleting new params to Block and Symbol  https://github.com/apache/incubator-mxnet/pull/18405
- Add backward Type inference to main NN operators https://github.com/apache/incubator-mxnet/pull/18378
- Add better partial args/aux handling in symbol optimize_for https://github.com/apache/incubator-mxnet/pull/18350
- Fix FInferShape for some ops to support partial type inference https://github.com/apache/incubator-mxnet/pull/18348
- Change include to relative in nvvm_to_onnx.cc https://github.com/apache/incubator-mxnet/pull/18249

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-664634047

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sheng Zha <no...@github.com.INVALID>.

I think we have been supporting mac and windows so far and it's just that @yajiedesign and I are part of the release process. I'm currently working on producing the mac wheels for 1.8.

By the way, I believe @access2rohit is currently making the CD for mac available in the main repo in #19957.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-813484096

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by kohillyang <no...@github.com>.

@samskalicky @szha Thank you very much. I created an issue for this <https://github.com/apache/incubator-mxnet/issues/19133>.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-691619993

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Neutron3529 <no...@github.com.INVALID>.

@samskalicky It seems that my merged BUG fix (for `KLDivLoss`, #18423 ) is not included in 1.8.0-rc1.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-707592100

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

Thanks @sxjscience and @kpuatamazon, looking forward to those additions to the 1.8.0 release. Can you please give an ETA for those backport PRs? Will you still be able to have those ready by September 18th code freeze date?

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-689004918

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Jake Lee <no...@github.com>.

I would like to include one more PR #18218 which helps use to build windows mxnet.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-676931837

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Joe Evans <no...@github.com>.

I'd also like to include updated ONNX support in 1.8 release with PR #19017 

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-686252921

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sheng Zha <no...@github.com>.

Github milestone which seems to be a suitable tool for tracking release progress. I created one here and added all mentioned items that we can try out: https://github.com/apache/incubator-mxnet/milestone/5

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-678818307

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Nikolay Ulmasov <no...@github.com.INVALID>.

Can someone review this simple one please - a back-port of bug fix reported first for v1.6:  https://github.com/apache/incubator-mxnet/pull/19095

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-700998763

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Felix Hieber <no...@github.com.INVALID>.

@samskalicky is there a timeline for mac wheels being available on pip? I noticed the post0 release of 1.8 for Linux on 3/30, but Mac wheels are still missing.

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-813012286

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Leonard Lausen <no...@github.com>.

I'd like to include a fix for NaiveEngine::PushAsync https://github.com/apache/incubator-mxnet/pull/19122

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-691347641

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Dick Carter <no...@github.com>.

A major feature of CUDA 11 and cuDNN 8.0 is support for the new A100 GPU and its TensorFloat-32 (TF32) mode of computation.  I would like to include PR https://github.com/apache/incubator-mxnet/pull/18694, "Unittest tolerance handling improvements", which allows MXNet to use TF32 effectively.  The PR also makes sensible adjustments to the unittest tolerances based on device context and dtype, ensuring A100 compatibility with our unittest suite.

With cuDNN 8.0 also comes compatibility with CUDA Graph Capture- I would like to include a PR (near complete, but not yet submitted) that enables CUDA Graph use.  This will permit MXNet to bypass much of the CPU preparation for launching identical kernel sequences, as are commonly seen in many deep learning training and inferencing environments.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-668969957

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by bartekkuncer <no...@github.com.INVALID>.

Please include FindMKL.cmake fix: https://github.com/apache/incubator-mxnet/pull/19152 and update of mkldnn to v1.7.0: https://github.com/apache/incubator-mxnet/pull/19560

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-734350768

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Leonard Lausen <no...@github.com>.

@samskalicky generally MXNet supports the last 2 CUDA versions, so MXNet 1.8 would support CUDA 11 and CUDA 10.2

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677821816

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by bartekkuncer <no...@github.com.INVALID>.

Please include fix for GCC8 with oneDNN v1.6.4: https://github.com/apache/incubator-mxnet/pull/19251


-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-703544920

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com.INVALID>.

> @samskalicky It seems that my merged BUG fix (for `KLDivLoss`, #18423 ) is not included in 1.8.0-rc1.

Hi @Neutron3529 it looks like #18423 was merged to the master branch. The v1.8.0 release (and rc0/rc1) were forked off of the v1.x branch and since the PR was not backported it is not included in this release. 

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-707833796

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sheng Zha <no...@github.com>.

@kohillyang mirror and fusion are two orthogonal techniques that help save memory and Gluon (or specifically CachedOp) still needs to implement it. I can help elaborate more once you open a tracking issue.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-691613775

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by PatricZhao <no...@github.com>.

@samskalicky  what's the timeline of 1.8? 

Does anyone work on update for the wiki?
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677999463

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

@leezu There is a CUDA 11 update planned for 1.8. @DickJC123 is planning to do this work. Would we want #18218 anyway?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-677821099

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by Sam Skalicky <no...@github.com>.

@kohillyang thanks for the update. Would you mind opening a separate issue to track debugging? If you do end up opening a PR with a fix please post back here so we can track that for the release. Thank you!

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-691613137

Re: [apache/incubator-mxnet] [RFC] v1.8.0 release (#18800)

Posted by bartekkuncer <no...@github.com>.

I would like to include update of oneDNN version to v1.6: https://github.com/apache/incubator-mxnet/pull/18867.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18800#issuecomment-669778611