Posted to dev@mxnet.apache.org by "Srivastava, Rohit Kumar" <sr...@buckeyemail.osu.edu> on 2019/04/26 23:33:29 UTC

[RFC] Support for creation of Large Tensors in MXNet

Dear Community,

Currently MXNet supports creation of Tensors containing up to 2^32 elements. However, there are cases where tensors containing over 5 billion elements are required.
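
As a concrete illustration of the current ceiling (a hypothetical,
standalone C++ sketch, not MXNet code): with a 32-bit index type, the
element count silently wraps once a shape crosses 2^32 elements.

```
#include <cstdint>
#include <iostream>

int main() {
    // A 2^17 x 2^16 shape holds 2^33 elements, past the 2^32 ceiling.
    int64_t shape[2] = {1LL << 17, 1LL << 16};

    int64_t total64 = shape[0] * shape[1];            // 8589934592
    int32_t total32 = static_cast<int32_t>(total64);  // wraps to 0 here

    std::cout << total32 << " vs " << total64 << std::endl;
    return 0;
}
```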

We plan to support creation of large tensors in MXNet. A design proposal is ready for review:
https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support

We would appreciate any feedback from the community.

Thank you!

Rohit

Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Pedro Larroy <pe...@gmail.com>.
From what I know, I don't think this will be feasible, given that many
functions, such as those in OpenBLAS or CUDA, operate directly on memory
pointers. It's different from an ideal scenario where access goes through
an STL-like iterator that can be mocked.
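
To make the constraint concrete, here is a sketch of a typical BLAS-style
call (generic CBLAS, not a specific MXNet code path): the kernel receives
bare pointers and integer strides, so there is no seam where an
iterator-style mock could intercept element access.

```
#include <cblas.h>

// Standard CBLAS GEMM: every operand is a raw pointer plus integer
// leading dimensions. The library walks memory directly, so element
// access cannot be intercepted without changing the interface itself.
void dense_matmul(const float* A, const float* B, float* C,
                  int m, int n, int k) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k,
                1.0f, A, k,   // lda = k (row-major, no transpose)
                      B, n,   // ldb = n
                0.0f, C, n);  // ldc = n
}
```

Note also that the dimension arguments in this reference interface are
plain int, which is exactly the 32-bit per-dimension limit discussed later
in this thread.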

Pedro.



On Wed, May 29, 2019 at 3:32 PM Marco de Abreu <ma...@gmail.com> wrote:
>
> Hi Pedro,
>
> thank you for elaborating on the situation. I agree that it's a difficult
> topic and appreciate that it's considered important. Your test strategy
> sounds like a good idea.
>
> The first thing that came to my mind was to add a subclass that inherits
> from Vector (on mobile, so I can't link, but I think you know which class
> I mean). In there, we could handle these calls gracefully.
>
> For now I agree that it's not something that can be done quickly but needs
> proper planning. I just talked to Lin, and we agreed on merging the
> end-to-end tests for now. The condition is that they have to be replaced
> by July 15th or by the next minor release that includes large tensor
> support, whichever comes first.
>
> Thanks everyone for their input on this topic!
>
> Best regards,
> Marco
>
>
Pedro Larroy <pe...@gmail.com> wrote on Wed, May 29, 2019, at 23:56:
>
> > Hi Marco
> >
> > I think this is a very good point that we both have made before, and
> > it's good that we bring up the topic again: the unit test suite is
> > currently heavy, costly, too slow to give feedback during development,
> > and doesn't run on embedded devices.
> >
> > The problem here is that we don't know how to mock array access in
> > C++, or even if this is possible. And wrapping array access in
> > heavier classes or C++ iterators is too much of a departure from the
> > way operators work in MXNet.
> >
> > I agree that we definitely need to look into trimming down the test
> > suite and moving these heavier tests into nightly, which to my
> > understanding is the right place for end-to-end tests like these.
> >
> > I think there are two action items we need to decide on in order to
> > move this forward.
> >
> > 1- Do we want to eventually have a heavy end-to-end test for features
> > like large tensors? I think so. In this case, say, allocating a tensor
> > with more than 2^32 elements. Is this too much for nightly? If so, a
> > heavy test suite could run before stable releases on an appropriately
> > resourced machine. I think heavy tests are appropriate for nightly but
> > not for the unit test run / PR validation.
> >
> > 2- We should add C++ unit tests and regression tests where possible, as
> > is the case with the bug that Rohit and Lin detected in the
> > ReverseIndex function [1]. Reviewers could ask for those C++ tests
> > when appropriate.
> >
> > So, with the current state of things, I don't think we should block the
> > PR by asking for a refactor of all memory accesses.
> >
> > Pedro.
> >
> >
> > [1]
> >
> > This seems to be a C++ function that operates on the indices used for
> > reversing tensors along a dimension and doesn't allocate memory (the
> > arrays have size 10), and the bug was an integer overflow on the
> > return type (int vs index_t). This can be tested in a unit test.
> >
> >
> > https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op-inl.h#L1953
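> >
> > As a minimal sketch of such a test (the helper below is a hypothetical
> > stand-in for the real function, not the actual MXNet code):
> >
> > ```
> > #include <cassert>
> > #include <cstdint>
> >
> > using index_t = int64_t;
> >
> > // Hypothetical index helper: if its return type were 'int', multiplying
> > // two large strides would overflow even though nothing big is allocated.
> > index_t FlatIndex(index_t row, index_t stride) {
> >     return row * stride;
> > }
> >
> > int main() {
> >     // 2^16 * 2^16 = 2^32: overflows int, fits index_t. No large
> >     // allocation is needed, so this runs as a plain unit test.
> >     assert(FlatIndex(1LL << 16, 1LL << 16) == (1LL << 32));
> >     return 0;
> > }
> > ```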
> >
> > On Tue, May 28, 2019 at 6:56 AM Marco de Abreu <ma...@gmail.com>
> > wrote:
> > >
> > > Hey everyone,
> > >
> > > this topic also came up in one of the recent PRs:
> > >
> > > https://github.com/apache/incubator-mxnet/pull/15048#discussion_r288114883
> > >
> > > My concern around this topic is the testability of large tensors.
> > > Generally I'd like to debate whether we have to run full end-to-end
> > > tests with arrays that are multiple gigabytes in size, or whether there
> > > are smarter and less resource-intensive ways to approach this
> > > challenge. It's true that our CI has quite a lot of resources (and if
> > > we're talking about CPU tests, it's almost infinite due to virtual
> > > memory and swapping). Since I think that generally it's good not to
> > > have too many constraints on the testing environment, I'd like to
> > > propose a different approach. Here's a copy of my comment from the PR:
> > >
> > > When I'm talking about layers in this case, I'm talking about the
> > > physical execution on a machine and not about machine learning
> > > operator layers.
> > >
> > > Specifically, in this case I'm talking about mocking memory accesses.
> > > Reads and writes would access a virtual array that doesn't have its
> > > full capacity in physical memory but instead consists of procedurally
> > > generated data.
> > >
> > > In various languages there's the concept of iterators
> > > https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
> > > where a function can be invoked that will deliver a stream of entries.
> > > If you called ```.toList()```, that would create a physically mapped
> > > array (which we wouldn't want). Instead, you could call
> > > ```.get(1243952)```, which would iterate the array but allocate only
> > > for that particular entry rather than the full size. To support writes,
> > > you could have custom lookups that allow setting certain values in a
> > > sparse fashion. The iterator function could be something like
> > > ```
> > > #include <cstdint>
> > > #include <unordered_map>
> > >
> > > // Sparse mock of a huge array: explicitly written entries live in a
> > > // map; every other read falls back to procedurally generated data.
> > > std::unordered_map<int64_t, double> _internalDict;
> > >
> > > double getValue(int64_t index) {
> > >     auto it = _internalDict.find(index);
> > >     if (it != _internalDict.end()) return it->second;
> > >     return static_cast<double>(index);  // replace with a generator function
> > > }
> > >
> > > void setValue(int64_t index, double value) { _internalDict[index] = value; }
> > > ```
> > >
> > > With this concept you'd be able to test arbitrarily big arrays as long
> > > as the test data is sparse (the size of the dict would depend on the
> > > number of specifically set entries). Since the purpose of these large
> > > array tests is not to test dense data (I assume it wouldn't make a
> > > difference between sparse and dense besides performance), this type of
> > > generator model would allow us to execute these tests efficiently.
> > >
> > > The only things we'd have to mock at this point are the getters and
> > > setters that read from and write to physical memory. Those calls would
> > > be redirected to the sketch above.
> > >
> > >
> > > I'm aware that we do not support this type of behaviour out of the
> > > box, but I think we're getting to a point where we should take a step
> > > back and reconsider the approaches we take in such resource-heavy
> > > cases. Of course, we wouldn't be testing the physical memory access
> > > with my method, but I think we can assume that memory accesses work
> > > properly, or we'd notice in quite a few other cases.
> > >
> > > Best regards,
> > > Marco
> > >
> > > On Sat, May 25, 2019 at 12:15 PM Lv, Tao A <ta...@intel.com> wrote:
> > >
> > > > Hi Lin,
> > > >
> > > > Yes, MKL supports that. Please refer to
> > > > https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface
> > > > for details.
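> > > >
> > > > As a rough sketch of the ILP64 interface from the caller side (the
> > > > build and link flags in the comment are illustrative assumptions,
> > > > not a verified link line):
> > > >
> > > > ```
> > > > // Illustrative: compile with -DMKL_ILP64 and link the ILP64 MKL
> > > > // libraries. Under ILP64, MKL_INT is 64-bit, so each GEMM dimension
> > > > // can exceed 2^31; under LP64 it is 32-bit.
> > > > #include <mkl.h>
> > > >
> > > > void big_gemm(const float* A, const float* B, float* C,
> > > >               MKL_INT m, MKL_INT n, MKL_INT k) {
> > > >     cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
> > > >                 m, n, k, 1.0f, A, k, B, n, 0.0f, C, n);
> > > > }
> > > > ```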
> > > >
> > > > I also did some work in that direction. Please see the PRs below for
> > > > MXNet and mshadow, respectively.
> > > > https://github.com/apache/incubator-mxnet/pull/13723
> > > > https://github.com/dmlc/mshadow/pull/365
> > > >
> > > > Feel free to let me know if there is anything I can help with.
> > > >
> > > > Thanks,
> > > > -tao
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Lin Yuan [mailto:apeforest@gmail.com]
> > > > Sent: Saturday, May 25, 2019 1:36 AM
> > > > To: dev@mxnet.incubator.apache.org; Lv, Tao A <ta...@intel.com>
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> > > >
> > > > Hi Sheng,
> > > >
> > > > Thanks for the nice suggestions. To summarize the current status and
> > > > future plan of this project:
> > > >
> > > > There were some missing operators from #11742 that did not support
> > > > large tensors. Thanks to Rohit's help, those missing operators have
> > > > been completed, and tests have been added to the nightly pipeline for
> > > > the MXNet 1.5 release (currently on GPU only; they will be added to
> > > > CPU once issue
> > > > https://github.com/apache/incubator-mxnet/issues/14980 is resolved).
> > > >
> > > > The next phases of this project are:
> > > > (1) Run operator profiling to identify the operators that have a
> > > > performance regression after turning on the int64 compiler flag
> > > > (2) Mitigate the performance regressions in the operators collected
> > > > from (1)
> > > > (3) Turn on the int64 compilation flag by default (the target is
> > > > completion by release 1.6)
> > > > (4) Support int64 for each dimension of the tensor. This can be
> > > > carried out in parallel with (1) to (3). The current limitation,
> > > > AFAIK, is the cblas_gemm libraries, which use int32 for each
> > > > dimension; many matrix operators in MXNet call cblas_gemm through
> > > > mshadow.
> > > > @Lv, Tao A <ta...@intel.com> Does the Intel MKL CBLAS library
> > > > support int64 for each dimension? Thanks!
> > > >
> > > > Best,
> > > >
> > > > Lin
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, May 18, 2019 at 9:05 PM Sheng Zha <zh...@apache.org> wrote:
> > > >
> > > > > Thanks for clarifying. This seems like a duplicate of [1] (though
> > > > > there wasn't any feedback there). I think everyone already agrees
> > > > > on the goal.
> > > > >
> > > > > > Currently, we assume the max size of each dimension.
> > > > >
> > > > > I agree with Tao that int64_t would be necessary given that it's
> > > > > common to flatten and reshape ndarrays.
> > > > >
> > > > > To help avoid repeating discussion and to make this discussion more
> > > > > productive, here is some of the relevant context that I'm aware of:
> > > > > - The first part of the proposed change was merged in #11742, which
> > > > > caused #14496, i.e. performance degradation in transpose and
> > > > > imdecode. The full scope is still unclear.
> > > > > - A compilation flag was added in #14570 so that people can
> > > > > explicitly opt in to the support without impacting others using the
> > > > > default setting.
> > > > >
> > > > > Given the context, since the goal is to support large tensors by
> > > > > default without performance impact, I hope more investigation could
> > > > > accompany this proposal, covering:
> > > > > - The problem: list the parts (e.g. operators) whose performance is
> > > > > impacted by changing the index type, and the amount of slow-down.
> > > > > - The solution for addressing the slow-down.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > -sz
> > > > >
> > > > > [1]
> > > > >
> > > > > https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E
> > > > >
> > > > > On 2019/05/19 02:43:39, "Srivastava, Rohit Kumar" <
> > > > > srivastava.141@buckeyemail.osu.edu> wrote:
> > > > > > Hi Tao,
> > > > > >     The existing MXNet implementation doesn't support large
> > > > > > tensors. MXNet NDArray creation for tensors with more than 2^32
> > > > > > elements is only supported by enabling a build flag for now. The
> > > > > > purpose of this thread is to have the community provide feedback
> > > > > > on the design cwiki for *Large Tensor Support* in MXNet. The
> > > > > > intention is to make large tensor support a default feature in
> > > > > > MXNet (in the future) without any performance impact, so
> > > > > > consumers do not have to build it from source.
> > > > > >
> > > > > > -Rohit
> > > > > >
> > > > > > On 5/18/19, 5:59 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > > > >
> > > > > >     Hi Rohit,
> > > > > >
> > > > > >     The existing MKL-DNN and its integration in MXNet should
> > > > > > already support *large tensors*, meaning the total number of
> > > > > > elements (Prod(shape)) can exceed INT_MAX. Feel free to let me
> > > > > > know if you find any issues when using MKL-DNN operators with
> > > > > > large tensors.
> > > > > >
> > > > > >     For large dimension sizes (shape[x]), MKL-DNN is going to
> > > > > > add support in its 1.0 release, which will come out in the middle
> > > > > > of the year. But I'm not sure whether MXNet plans to support
> > > > > > that.
> > > > > >
> > > > > >     Thanks,
> > > > > >     -tao
> > > > > >
> > > > > >     -----Original Message-----
> > > > > >     From: Srivastava, Rohit Kumar [mailto:srivastava.141@buckeyemail.osu.edu]
> > > > > >     Sent: Sunday, May 19, 2019 7:23 AM
> > > > > >     To: dev@mxnet.incubator.apache.org
> > > > > >     Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> > > > > >
> > > > > >     Hi Tao,
> > > > > >         There are already a couple of operators implemented in
> > > > > > MXNet that currently support Tensors with more than ~4.5 billion
> > > > > > elements. In the meantime, core MXNet can move ahead with
> > > > > > providing initial support for such large tensors so MXNet
> > > > > > customers can start using them.
> > > > > >
> > > > > >     Good to hear MKL-DNN will provide support for such cases. Do
> > > > > > you have a timeline for when this feature will be released?
> > > > > >
> > > > > >     -Rohit
> > > > > >
> > > > > >     On 4/29/19, 7:18 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > > > >
> > > > > >         Thank you, Lin! I would expect the current MKL-DNN
> > > > > > implementation already supports the scenario you mentioned here.
> > > > > > This can be verified by this issue:
> > > > > > https://github.com/apache/incubator-mxnet/issues/13451
> > > > > >
> > > > > >         But as I said before, since we support flatten and
> > > > > > reshape operators, it's possible for users to convert a tensor
> > > > > > with a large element count into a tensor with a large dimension
> > > > > > size. That could cause issues there.
> > > > > >
> > > > > >         To cover more cases, MKL-DNN is going to support INT64
> > > > > > dimension sizes in its upcoming 1.0 major release.
> > > > > >
> > > > > >         -tao
> > > > > >
> > > > > >         -----Original Message-----
> > > > > >         From: Lin Yuan [mailto:apeforest@gmail.com]
> > > > > >         Sent: Tuesday, April 30, 2019 12:56 AM
> > > > > >         To: dev@mxnet.incubator.apache.org
> > > > > >         Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> > > > > >
> > > > > >         Tao,
> > > > > >
> > > > > >         - what's the max size of dimensionality? Which data type
> > > > > > is used to define dimensionality (ndims)?
> > > > > >         We assume the max size of dimensionality is relatively
> > > > > > small. Hence the `int` data type is used to define ndim.
> > > > > >
> > > > > >         - what's the max size of each dimension? Which data type
> > > > > > is used to define dimension size (shape[x])?
> > > > > >         Currently, we assume the max size of each dimension is
> > > > > > not going to exceed 2^31 in real applications. Hence the data
> > > > > > type is `int32_t`.
> > > > > >
> > > > > >         - what's the max size of total elements? Which data type
> > > > > > is used to define element size (Prod(shape))?
> > > > > >         We assume the total number of elements in a tensor can be
> > > > > > larger than 2^32 in some applications, such as the Deep Graph
> > > > > > Library. We use the data type `int64_t` to represent the total
> > > > > > element size. Currently, due to performance regressions in some
> > > > > > operators (such as transpose), we use a compiler flag to set this
> > > > > > data type to `int32_t` by default. Once we have ways to mitigate
> > > > > > the performance regressions, we will set the default data type to
> > > > > > `int64_t`, which is part of the effort in this project that Rohit
> > > > > > proposed.
> > > > > >
> > > > > >         What is the plan in MKL-DNN to support large tensors? We
> > > > > > may want to coordinate the progress, since many operators use the
> > > > > > MKL-DNN implementation on CPU now.
> > > > > >
> > > > > >         Many Thanks,
> > > > > >
> > > > > >         Lin
> > > > > >
> > > > > >         On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A <ta...@intel.com> wrote:
> > > > > >
> > > > > >         > Thank you for bringing this topic to dev, Rohit.
> > > > > >         >
> > > > > >         > Regarding large tensor, can you articulate:
> > > > > >         > - what's the max size of dimensionality? Which data
> > > > > >         > type is used to define dimensionality (ndims)?
> > > > > >         > - what's the max size of each dimension? Which data
> > > > > >         > type is used to define dimension size (shape[x])?
> > > > > >         > - what's the max size of total elements? Which data
> > > > > >         > type is used to define element size (Prod(shape))?
> > > > > >         >
> > > > > >         > For me, any of these three can be *large*.
> > > > > >         >
> > > > > >         > -----Original Message-----
> > > > > >         > From: Srivastava, Rohit Kumar
> > > > > >         > [mailto:srivastava.141@buckeyemail.osu.edu]
> > > > > >         > Sent: Saturday, April 27, 2019 7:33 AM
> > > > > >         > To: dev@mxnet.incubator.apache.org
> > > > > >         > Subject: [RFC] Support for creation of Large Tensors in MXNet
> > > > > >         >
> > > > > >         > Dear Community,
> > > > > >         >
> > > > > >         > Currently MXNet supports creation of Tensors containing
> > > > > >         > up to 2^32 elements. However, there are cases where
> > > > > >         > tensors containing over 5 billion elements are required.
> > > > > >         >
> > > > > >         > We plan to support creation of large tensors in MXNet.
> > > > > >         > A design proposal is ready for review:
> > > > > >         > https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
> > > > > >         >
> > > > > >         > We would appreciate any feedback from the community.
> > > > > >         >
> > > > > >         > Thank you!
> > > > > >         >
> > > > > >         > Rohit
> > > > > >         >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> >

Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Marco de Abreu <ma...@gmail.com>.
Hi Pedro,

thank you for elaborating the situation. I agree that it's a difficult
topic and appreciate that it's considered important. Your test strategy
sounds like a good idea.

The first thing that came to my mind was to add a subclass that inherits
off Vector (on mobile, so can't link, but I think you know which class I
mean). In there, we could handle these calls gracefully.

For now I agree that it's not something that can be done quickly but needs
proper planning. I just talked to Lin and we agreed on merging the end to
end tests for now. The condition is that they have to be replaced by 15th
of July or the next minor release that includes large tensor support;
whichever comes first.

Thanks everyone for their input on this topic!

Best regards,
Marco


Pedro Larroy <pe...@gmail.com> schrieb am Mi., 29. Mai 2019,
23:56:

> Hi Marco
>
> I think this is a very good point that we both have made before, and
> is good that we bring up the topic again, as currently the unit test
> suite is heavy, costly and takes too long to get feedback for
> development and doesn't run on embedded.
>
> The problem here is that we don't know how to mock array access in
> C++, or even if this is possible.. And wrapping array access in
> heavier classes or C++ iterators is too much of a departure from the
> way operators work in MXNet.
>
> I agree that we definitely need to look into leaning out the test
> suite and putting these heavier tests in nightly, which to my
> understanding is the right place for these end to end tests.
>
> I think there are two action items that we need to decide on how to
> move this forward.
>
> 1- Do we want to eventually have a heavy end-to-end test for some
> features like large tensor? I think so. In this case say allocating a
> tensor bigger than 2^32. Is this too much for nightly? If so there
> could be a heavy test suite to run before stable releases in an
> appropriately resourced machine. I think heavy tests should be
> appropriate for nightly but not for unit tests run / PR validation.
>
> 2- We should ad C++ unit tests and regression tests when possible as
> it's the case in the bug that Rohit and Lin detected in the
> ReverseIndex function. [1].  Reviewers could ask for those C++ tests
> when appropiate.
>
> So with the current state of things I don't think we should block the
> PR asking to refactor all memory accesses.
>
> Pedro.
>
>
> [1]
>
> This seems to be a C++ function that operates on the indexes for
> reversing tensors along a dimension and doesn't allocate memory (the
> arrays have size 10.), and the bug was an integer overflow on the
> return type (int vs index_t). This can be tested in a unit test.
>
>
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op-inl.h#L1953
>
> On Tue, May 28, 2019 at 6:56 AM Marco de Abreu <ma...@gmail.com>
> wrote:
> >
> > Hey everyone,
> >
> > this topic was also subject in one of the recent PRs:
> >
> https://github.com/apache/incubator-mxnet/pull/15048#discussion_r288114883
> >
> > My concern around this topic is the testability of large tensors.
> Generally
> > I'd like to debate whether we have to run full end-to-end tests with
> arrays
> > that are multiple Gigabyte in size or if there aren't any smarter and
> less
> > ressource-intensitve ways to approach this challenge. It's true that our
> CI
> > has quite a lot of resources (and if we're talking about CPU tests, it's
> > almost infinite due to virtual memory and swapping). Since I think that
> > generally it's good to not have too many constraints on the testing
> > environment, I'd like to propose a different approach. Here's a copy from
> > my comment in the PR:
> >
> > When I'm talking about layers in this case, I'm talking about the
> physical
> > execution on a machine and not about machine learning operator layers.
> >
> > Specifically in this case I'm talking about mocking memory accesses. Read
> > and writes would access a virtual array array that doesn't have the full
> > capacity on physical memory but instead consists of procedually generated
> > data.
> >
> > In various languages there's the concept of Iterators
> >
> https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
> > where a function can be invoked that will deliver a stream of entries. If
> > you would call ```.toList()``` that would then create a physically mapped
> > array (which we wouldn't want). Instead, you could call
> ```.get(1243952)```
> > which would iterate the array but not allocate for the full size but only
> > for that particular entry. So support writes, you could have custom
> lookups
> > that would allow setting certain values in a sparse fashion. The iterator
> > function could be something like
> > ```
> > float64 getValue(int64 index)
> > {
> >     dictValue = _internalDict.get(index);
> >     if (dictValue != null)
> >         return dictValue;
> >     else
> >         return index; //Replace with generator function
> > }
> >
> > setValue(int64 index, float64 value)
> > {
> >     _internalDict.set(index, value)
> > }
> > ```
> >
> > With this concept you'd be able to test infinitely big arrays as long as
> > you only use sparse data to test (the size of the dict would depend on
> the
> > number of specifically set entries). Since the purpose of these large
> array
> > tests is not to test dense data (it wouldn't make a difference between
> > sparse and dense besides performance I assume), this type of generator
> > model would allow us to execute these tests in an efficient way.
> >
> > The only thing we'd have to mock at this point are the getters and
> setters
> > that try to read/write from/to physical memory. The calls would be
> > redirected to the pseudo-code above.
> >
> >
> > I'm aware that we do not support this type of behaviour out of the box,
> but
> > I think that we're getting to a point where we should take a step back
> and
> > reconsider the approaches we would like to do in such resource-heavy
> cases.
> > Of course, we wouldn't be testing the physical memory access with my
> > method, but I think that we can assume that memory accesses are properly
> > working or otherwise we'd notice in quite a few other cases.
> >
> > Best regards,
> > Marco
> >
> > On Sat, May 25, 2019 at 12:15 PM Lv, Tao A <ta...@intel.com> wrote:
> >
> > > Hi Lin,
> > >
> > > Yes, MKL supports that. Please refer to
> > >
> https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface
> > > for details.
> > >
> > > I also did some work towards that direction. Please see below PRs for
> > > MXNet and mshadow respectively.
> > > https://github.com/apache/incubator-mxnet/pull/13723
> > > https://github.com/dmlc/mshadow/pull/365
> > >
> > > Feel free to let me know if anything I can help.
> > >
> > > Thanks,
> > > -tao
> > >
> > >
> > > -----Original Message-----
> > > From: Lin Yuan [mailto:apeforest@gmail.com]
> > > Sent: Saturday, May 25, 2019 1:36 AM
> > > To: dev@mxnet.incubator.apache.org; Lv, Tao A <ta...@intel.com>
> > > Cc: dev@mxnet.apache.org
> > > Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> > >
> > > Hi Sheng,
> > >
> > > Thanks for the nice suggestions. To summarize the current status and
> > > future plan of this project:
> > >
> > > There were some missing operators from #11742 that did not support
> large
> > > tensors. Thanbks to Rohit's help, those missing operators have been
> > > completed and tests added to nightly pipeline in MXNet 1.5 release
> > > (currently on GPU only and will be added to CPU once issue
> > > https://github.com/apache/incubator-mxnet/issues/14980 is resolved) .
> > >
> > > The next phases of this project are:
> > > (1) Run operator profling to identify the operators that have
> performance
> > > regression after turning on int64 compiler flag
> > > (2) Mitigate the performance regressions in the operators collected
> from
> > > (1)
> > > (3) Turn on int64 compilation flag by default (the target completion is
> > > release 1.6)
> > > (4) Support int64 for each dimension of the tensor. This can be
> carried on
> > > in parallel with (1) to (3). The currently limitation AFAIK is the
> > > cblas_gemm libraries which uses int32 for each dimension and a lot of
> > > matrix operators in MXNet is calling cblas_gemm in mshadow.
> > > @Lv, Tao A <ta...@intel.com> Does Intel MKL Cblas library support
> > > int64 for each dimension? Thanks!
> > >
> > > Best,
> > >
> > > Lin
> > >
> > >
> > >
> > >
> > > On Sat, May 18, 2019 at 9:05 PM Sheng Zha <zh...@apache.org> wrote:
> > >
> > > > Thanks for clarifying. This seems like a duplicate of [1] (though
> > > > there wasn't any feedback there). I think everyone already agrees on
> the
> > > goal.
> > > >
> > > > > Currently, we assume the max size of each dimension.
> > > >
> > > > I agree with Tao that int64_t would be necessary given that it's
> > > > common to flatten and reshape ndarrays.
> > > >
> > > > To help avoid repeating discussion and to make this discussion more
> > > > productive, here are some of the relevant context that I'm aware of:
> > > > - The first part of the proposed change was merged in #11742 which
> > > > caused #14496, i.e. performance degredation in transpose and
> imdecode.
> > > > The full scope is still unclear.
> > > > - A compilation flag was added in #14570 so that people can
> explicitly
> > > > opt in for the support without impacting others using the default
> > > setting.
> > > >
> > > > Given the context, since the goal is to support large tensor by
> > > > default without performance impact, I hope more investigation could
> > > > accompany this proposal that covers:
> > > > - The problem: list the parts (e.g. operators) whose performance is
> > > > impacted by changing the index type, and the amount of slow-down.
> > > > - The solution for addressing the slow-down.
> > > >
> > > > Thanks.
> > > >
> > > > -sz
> > > >
> > > > [1]
> > > >
> https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992
> > > > fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > On 2019/05/19 02:43:39, "Srivastava, Rohit Kumar" <
> > > > srivastava.141@buckeyemail.osu.edu> wrote:
> > > > > Hi Tao,
> > > > >     Existing MXNet implementation doesn't support large tensors.
> > > > > MXNet
> > > > NDArray creation for tensors of sizes larger than 2^32 is only
> > > > supported by enabling a build flag for now. The purpose of this
> thread
> > > > is to have the community provide feedback on the design cwiki for
> > > > *Large Tensor Support* in MXNet. The intension is to make large
> tensor
> > > > support as default feature in MXNet (in future) w/o any performance
> > > > impact so consumers do not have to build it from source.
> > > > >
> > > > > -Rohit
> > > > >
> > > > > On 5/18/19, 5:59 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > > >
> > > > >     Hi Rohit,
> > > > >
> > > > >     The existing MKL-DNN and its integration in MXNet should
> already
> > > > support *large tensor* which means the total number of elements
> > > > (Prod(shape)) can exceed INT_MAX. Feel free to me know if you find
> any
> > > > issue when using MKL-DNN operators with large tensors.
> > > > >
> > > > >     For large dimension size (shape[x]), MKL-DNN is going to
> support
> > > > > in
> > > > its 1.0 release and will be released at the middle of year. But I'm
> > > > not sure if MXNet has plan to support that.
> > > > >
> > > > >     Thanks,
> > > > >     -tao
> > > > >
> > > > >     -----Original Message-----
> > > > >     From: Srivastava, Rohit Kumar [mailto:
> > > > srivastava.141@buckeyemail.osu.edu]
> > > > >     Sent: Sunday, May 19, 2019 7:23 AM
> > > > >     To: dev@mxnet.incubator.apache.org
> > > > >     Subject: Re: [RFC] Support for creation of Large Tensors in
> > > > > MXNet
> > > > >
> > > > >     Hi Tao,
> > > > >         There are already couple of operators implemented in MXNet
> > > > > that
> > > > are currently supporting Tensors with size over ~4.5 billion. In the
> > > > meantime core MXNet can move ahead with providing initial support for
> > > > such large tensors so MXNet customers can start using it.
> > > > >
> > > > >     Good to hear MKLDNN will provide support for such cases. Do you
> > > > > have
> > > > a timeline as to when this feature will be released ?
> > > > >
> > > > >     -Rohit
> > > > >
> > > > >     On 4/29/19, 7:18 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > > >
> > > > >         Thank you Lin! I would expect the current MKL-DNN
> > > > > implementation
> > > > already supports the scenario you mentioned here. Can be verified by
> > > > this
> > > > issue: https://github.com/apache/incubator-mxnet/issues/13451
> > > > >
> > > > >         But as I said before, since we support flatten or reshape
> > > > operators, so it's possible for users to convert a tensor with large
> > > > element size to a tensor with large dimension size. It possibly will
> > > > cause issue there.
> > > > >
> > > > >         To cover more cases, MKL-DNN is going to support INT64
> > > > > dimension
> > > > size in its coming 1.0 major release.
> > > > >
> > > > >         -tao
> > > > >
> > > > >         -----Original Message-----
> > > > >         From: Lin Yuan [mailto:apeforest@gmail.com]
> > > > >         Sent: Tuesday, April 30, 2019 12:56 AM
> > > > >         To: dev@mxnet.incubator.apache.org
> > > > >         Subject: Re: [RFC] Support for creation of Large Tensors in
> > > > > MXNet
> > > > >
> > > > >         Tao,
> > > > >
> > > > >         - what's the max size of dimensionality? Which data type is
> > > > > used
> > > > to define dimensionality (ndims)?
> > > > >         We assume the max size of dimensionality is relatively
> small.
> > > > Hence `int` data type is used to define ndim
> > > > >
> > > > >         - what's the max size of each dimension? Which data type is
> > > > > used
> > > > to define dimension size (shape[x])?
> > > > >         Currently, we assume the max size of each dimension is not
> > > > > going
> > > > to exceed
> > > > >         2^31 in real applications. Hence the data type is `int32_t`
> > > > >
> > > > >         - what's the max size of total elements? Which data type is
> > > > > used
> > > > to define element size (Prod(shape))?
> > > > >         We assume the total number of elements in a tensor can be
> > > > > larger
> > > > than 2^32 in some applications such as deep graph library. We use the
> > > > data type `int64_t` to represent the total element size. Currently
> due
> > > > to performance regression in some operators (such as transpose), we
> > > > used a compiler flag to set this data type to `int32_t` by default.
> > > > Once we have ways to mitigate the performance regression, we will set
> > > > the default data type to `int64_t`, which is part of the effort in
> > > > this project that Rohit proposed.
> > > > >
> > > > >         What is the plan in MKLDNN to support large tensors? We may
> > > > > want
> > > > to coordinate the progress since many operators are using MKLDNN
> > > > implementation in CPU now.
> > > > >
> > > > >         Many Thanks,
> > > > >
> > > > >         Lin
> > > > >
> > > > >         On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A
> > > > > <ta...@intel.com>
> > > > wrote:
> > > > >
> > > > >         > Thank you for bringing this topic to dev, Rohit.
> > > > >         >
> > > > >         > Regarding large tensor, can you articulate:
> > > > >         > - what's the max size of dimensionality? Which data type
> > > > > is
> > > > used to
> > > > >         > define dimensionality (ndims)?
> > > > >         > - what's the max size of each dimension? Which data type
> > > > > is
> > > > used to
> > > > >         > define dimension size (shape[x])?
> > > > >         > - what's the max size of total elements? Which data type
> > > > > is
> > > > used to
> > > > >         > define element size (Prod(shape))?
> > > > >         >
> > > > >         > For me, any of these three can be *large*.
> > > > >         >
> > > > >         > -----Original Message-----
> > > > >         > From: Srivastava, Rohit Kumar
> > > > >         > [mailto:srivastava.141@buckeyemail.osu.edu]
> > > > >         > Sent: Saturday, April 27, 2019 7:33 AM
> > > > >         > To: dev@mxnet.incubator.apache.org
> > > > >         > Subject: [RFC] Support for creation of Large Tensors in
> MXNet
> > > > >         >
> > > > >         > Dear Community,
> > > > >         >
> > > > >         > Currently MXNet supports creation of Tensors containing
> up
> > > > > to
> > > > 2^32
> > > > >         > elements. However there are cases where tensors of size
> > > > > over 5
> > > > billion
> > > > >         > is required
> > > > >         >
> > > > >         > We plan to support creation of large tensors on MXNet. A
> > > > design
> > > > >         > proposal is ready for review:
> > > > >         >
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
> > > > >         >
> > > > >         > We will appreciate any help and feedbacks from the
> community.
> > > > >         >
> > > > >         > Thank you!
> > > > >         >
> > > > >         > Rohit
> > > > >         >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
>

Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Pedro Larroy <pe...@gmail.com>.
Hi Marco

I think this is a very good point that we both have made before, and
is good that we bring up the topic again, as currently the unit test
suite is heavy, costly and takes too long to get feedback for
development and doesn't run on embedded.

The problem here is that we don't know how to mock array access in
C++, or even if this is possible.. And wrapping array access in
heavier classes or C++ iterators is too much of a departure from the
way operators work in MXNet.

I agree that we definitely need to look into leaning out the test
suite and putting these heavier tests in nightly, which to my
understanding is the right place for these end to end tests.

I think there are two action items that we need to decide on how to
move this forward.

1- Do we want to eventually have a heavy end-to-end test for some
features like large tensor? I think so. In this case say allocating a
tensor bigger than 2^32. Is this too much for nightly? If so there
could be a heavy test suite to run before stable releases in an
appropriately resourced machine. I think heavy tests should be
appropriate for nightly but not for unit tests run / PR validation.

2- We should ad C++ unit tests and regression tests when possible as
it's the case in the bug that Rohit and Lin detected in the
ReverseIndex function. [1].  Reviewers could ask for those C++ tests
when appropiate.

So with the current state of things I don't think we should block the
PR asking to refactor all memory accesses.

Pedro.


[1]

This seems to be a C++ function that operates on the indexes for
reversing tensors along a dimension and doesn't allocate memory (the
arrays have size 10.), and the bug was an integer overflow on the
return type (int vs index_t). This can be tested in a unit test.

https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/matrix_op-inl.h#L1953

On Tue, May 28, 2019 at 6:56 AM Marco de Abreu <ma...@gmail.com> wrote:
>
> Hey everyone,
>
> this topic was also subject in one of the recent PRs:
> https://github.com/apache/incubator-mxnet/pull/15048#discussion_r288114883
>
> My concern around this topic is the testability of large tensors. Generally
> I'd like to debate whether we have to run full end-to-end tests with arrays
> that are multiple Gigabyte in size or if there aren't any smarter and less
> ressource-intensitve ways to approach this challenge. It's true that our CI
> has quite a lot of resources (and if we're talking about CPU tests, it's
> almost infinite due to virtual memory and swapping). Since I think that
> generally it's good to not have too many constraints on the testing
> environment, I'd like to propose a different approach. Here's a copy from
> my comment in the PR:
>
> When I'm talking about layers in this case, I'm talking about the physical
> execution on a machine and not about machine learning operator layers.
>
> Specifically in this case I'm talking about mocking memory accesses. Read
> and writes would access a virtual array array that doesn't have the full
> capacity on physical memory but instead consists of procedually generated
> data.
>
> In various languages there's the concept of Iterators
> https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
> where a function can be invoked that will deliver a stream of entries. If
> you would call ```.toList()``` that would then create a physically mapped
> array (which we wouldn't want). Instead, you could call ```.get(1243952)```
> which would iterate the array but not allocate for the full size but only
> for that particular entry. So support writes, you could have custom lookups
> that would allow setting certain values in a sparse fashion. The iterator
> function could be something like
> ```
> float64 getValue(int64 index)
> {
>     dictValue = _internalDict.get(index);
>     if (dictValue != null)
>         return dictValue;
>     else
>         return index; //Replace with generator function
> }
>
> setValue(int64 index, float64 value)
> {
>     _internalDict.set(index, value)
> }
> ```
>
> With this concept you'd be able to test infinitely big arrays as long as
> you only use sparse data to test (the size of the dict would depend on the
> number of specifically set entries). Since the purpose of these large array
> tests is not to test dense data (it wouldn't make a difference between
> sparse and dense besides performance I assume), this type of generator
> model would allow us to execute these tests in an efficient way.
>
> The only thing we'd have to mock at this point are the getters and setters
> that try to read/write from/to physical memory. The calls would be
> redirected to the pseudo-code above.
>
>
> I'm aware that we do not support this type of behaviour out of the box, but
> I think that we're getting to a point where we should take a step back and
> reconsider the approaches we would like to do in such resource-heavy cases.
> Of course, we wouldn't be testing the physical memory access with my
> method, but I think that we can assume that memory accesses are properly
> working or otherwise we'd notice in quite a few other cases.
>
> Best regards,
> Marco
>
> On Sat, May 25, 2019 at 12:15 PM Lv, Tao A <ta...@intel.com> wrote:
>
> > Hi Lin,
> >
> > Yes, MKL supports that. Please refer to
> > https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface
> > for details.
> >
> > I also did some work towards that direction. Please see below PRs for
> > MXNet and mshadow respectively.
> > https://github.com/apache/incubator-mxnet/pull/13723
> > https://github.com/dmlc/mshadow/pull/365
> >
> > Feel free to let me know if anything I can help.
> >
> > Thanks,
> > -tao
> >
> >
> > -----Original Message-----
> > From: Lin Yuan [mailto:apeforest@gmail.com]
> > Sent: Saturday, May 25, 2019 1:36 AM
> > To: dev@mxnet.incubator.apache.org; Lv, Tao A <ta...@intel.com>
> > Cc: dev@mxnet.apache.org
> > Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> >
> > Hi Sheng,
> >
> > Thanks for the nice suggestions. To summarize the current status and
> > future plan of this project:
> >
> > There were some missing operators from #11742 that did not support large
> > tensors. Thanbks to Rohit's help, those missing operators have been
> > completed and tests added to nightly pipeline in MXNet 1.5 release
> > (currently on GPU only and will be added to CPU once issue
> > https://github.com/apache/incubator-mxnet/issues/14980 is resolved) .
> >
> > The next phases of this project are:
> > (1) Run operator profling to identify the operators that have performance
> > regression after turning on int64 compiler flag
> > (2) Mitigate the performance regressions in the operators collected from
> > (1)
> > (3) Turn on int64 compilation flag by default (the target completion is
> > release 1.6)
> > (4) Support int64 for each dimension of the tensor. This can be carried on
> > in parallel with (1) to (3). The currently limitation AFAIK is the
> > cblas_gemm libraries which uses int32 for each dimension and a lot of
> > matrix operators in MXNet is calling cblas_gemm in mshadow.
> > @Lv, Tao A <ta...@intel.com> Does Intel MKL Cblas library support
> > int64 for each dimension? Thanks!
> >
> > Best,
> >
> > Lin
> >
> >
> >
> >
> > On Sat, May 18, 2019 at 9:05 PM Sheng Zha <zh...@apache.org> wrote:
> >
> > > Thanks for clarifying. This seems like a duplicate of [1] (though
> > > there wasn't any feedback there). I think everyone already agrees on the
> > goal.
> > >
> > > > Currently, we assume the max size of each dimension.
> > >
> > > I agree with Tao that int64_t would be necessary given that it's
> > > common to flatten and reshape ndarrays.
> > >
> > > To help avoid repeating discussion and to make this discussion more
> > > productive, here are some of the relevant context that I'm aware of:
> > > - The first part of the proposed change was merged in #11742 which
> > > caused #14496, i.e. performance degredation in transpose and imdecode.
> > > The full scope is still unclear.
> > > - A compilation flag was added in #14570 so that people can explicitly
> > > opt in for the support without impacting others using the default
> > setting.
> > >
> > > Given the context, since the goal is to support large tensor by
> > > default without performance impact, I hope more investigation could
> > > accompany this proposal that covers:
> > > - The problem: list the parts (e.g. operators) whose performance is
> > > impacted by changing the index type, and the amount of slow-down.
> > > - The solution for addressing the slow-down.
> > >
> > > Thanks.
> > >
> > > -sz
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992
> > > fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E
> > >
> > > On 2019/05/19 02:43:39, "Srivastava, Rohit Kumar" <
> > > srivastava.141@buckeyemail.osu.edu> wrote:
> > > > Hi Tao,
> > > >     Existing MXNet implementation doesn't support large tensors.
> > > > MXNet
> > > NDArray creation for tensors of sizes larger than 2^32 is only
> > > supported by enabling a build flag for now. The purpose of this thread
> > > is to have the community provide feedback on the design cwiki for
> > > *Large Tensor Support* in MXNet. The intension is to make large tensor
> > > support as default feature in MXNet (in future) w/o any performance
> > > impact so consumers do not have to build it from source.
> > > >
> > > > -Rohit
> > > >
> > > > On 5/18/19, 5:59 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > >
> > > >     Hi Rohit,
> > > >
> > > >     The existing MKL-DNN and its integration in MXNet should already
> > > support *large tensor* which means the total number of elements
> > > (Prod(shape)) can exceed INT_MAX. Feel free to me know if you find any
> > > issue when using MKL-DNN operators with large tensors.
> > > >
> > > >     For large dimension size (shape[x]), MKL-DNN is going to support
> > > > in
> > > its 1.0 release and will be released at the middle of year. But I'm
> > > not sure if MXNet has plan to support that.
> > > >
> > > >     Thanks,
> > > >     -tao
> > > >
> > > >     -----Original Message-----
> > > >     From: Srivastava, Rohit Kumar [mailto:
> > > srivastava.141@buckeyemail.osu.edu]
> > > >     Sent: Sunday, May 19, 2019 7:23 AM
> > > >     To: dev@mxnet.incubator.apache.org
> > > >     Subject: Re: [RFC] Support for creation of Large Tensors in
> > > > MXNet
> > > >
> > > >     Hi Tao,
> > > >         There are already couple of operators implemented in MXNet
> > > > that
> > > are currently supporting Tensors with size over ~4.5 billion. In the
> > > meantime core MXNet can move ahead with providing initial support for
> > > such large tensors so MXNet customers can start using it.
> > > >
> > > >     Good to hear MKLDNN will provide support for such cases. Do you
> > > > have
> > > a timeline as to when this feature will be released ?
> > > >
> > > >     -Rohit
> > > >
> > > >     On 4/29/19, 7:18 PM, "Lv, Tao A" <ta...@intel.com> wrote:
> > > >
> > > >         Thank you Lin! I would expect the current MKL-DNN
> > > > implementation
> > > already supports the scenario you mentioned here. Can be verified by
> > > this
> > > issue: https://github.com/apache/incubator-mxnet/issues/13451
> > > >
> > > >         But as I said before, since we support flatten or reshape
> > > operators, so it's possible for users to convert a tensor with large
> > > element size to a tensor with large dimension size. It possibly will
> > > cause issue there.
> > > >
> > > >         To cover more cases, MKL-DNN is going to support INT64
> > > > dimension
> > > size in its coming 1.0 major release.
> > > >
> > > >         -tao
> > > >
> > > >         -----Original Message-----
> > > >         From: Lin Yuan [mailto:apeforest@gmail.com]
> > > >         Sent: Tuesday, April 30, 2019 12:56 AM
> > > >         To: dev@mxnet.incubator.apache.org
> > > >         Subject: Re: [RFC] Support for creation of Large Tensors in
> > > > MXNet
> > > >
> > > >         Tao,
> > > >
> > > >         - what's the max size of dimensionality? Which data type is
> > > > used
> > > to define dimensionality (ndims)?
> > > >         We assume the max size of dimensionality is relatively small.
> > > Hence `int` data type is used to define ndim
> > > >
> > > >         - what's the max size of each dimension? Which data type is
> > > > used
> > > to define dimension size (shape[x])?
> > > >         Currently, we assume the max size of each dimension is not
> > > > going
> > > to exceed
> > > >         2^31 in real applications. Hence the data type is `int32_t`
> > > >
> > > >         - what's the max size of total elements? Which data type is
> > > > used
> > > to define element size (Prod(shape))?
> > > >         We assume the total number of elements in a tensor can be
> > > > larger
> > > than 2^32 in some applications such as deep graph library. We use the
> > > data type `int64_t` to represent the total element size. Currently due
> > > to performance regression in some operators (such as transpose), we
> > > used a compiler flag to set this data type to `int32_t` by default.
> > > Once we have ways to mitigate the performance regression, we will set
> > > the default data type to `int64_t`, which is part of the effort in
> > > this project that Rohit proposed.
> > > >
> > > >         What is the plan in MKLDNN to support large tensors? We may
> > > > want
> > > to coordinate the progress since many operators are using MKLDNN
> > > implementation in CPU now.
> > > >
> > > >         Many Thanks,
> > > >
> > > >         Lin
> > > >
> > > >         On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A
> > > > <ta...@intel.com>
> > > wrote:
> > > >
> > > >         > Thank you for bringing this topic to dev, Rohit.
> > > >         >
> > > >         > Regarding large tensor, can you articulate:
> > > >         > - what's the max size of dimensionality? Which data type
> > > > is
> > > used to
> > > >         > define dimensionality (ndims)?
> > > >         > - what's the max size of each dimension? Which data type
> > > > is
> > > used to
> > > >         > define dimension size (shape[x])?
> > > >         > - what's the max size of total elements? Which data type
> > > > is
> > > used to
> > > >         > define element size (Prod(shape))?
> > > >         >
> > > >         > For me, any of these three can be *large*.
> > > >         >
> > > >         > -----Original Message-----
> > > >         > From: Srivastava, Rohit Kumar
> > > >         > [mailto:srivastava.141@buckeyemail.osu.edu]
> > > >         > Sent: Saturday, April 27, 2019 7:33 AM
> > > >         > To: dev@mxnet.incubator.apache.org
> > > >         > Subject: [RFC] Support for creation of Large Tensors in MXNet
> > > >         >
> > > >         > Dear Community,
> > > >         >
> > > >         > Currently MXNet supports creation of Tensors containing up
> > > > to
> > > 2^32
> > > >         > elements. However there are cases where tensors of size
> > > > over 5
> > > billion
> > > >         > is required
> > > >         >
> > > >         > We plan to support creation of large tensors on MXNet. A
> > > design
> > > >         > proposal is ready for review:
> > > >         >
> > > https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
> > > >         >
> > > >         > We will appreciate any help and feedbacks from the community.
> > > >         >
> > > >         > Thank you!
> > > >         >
> > > >         > Rohit
> > > >         >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >

Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Marco de Abreu <ma...@gmail.com>.
Hey everyone,

this topic was also subject in one of the recent PRs:
https://github.com/apache/incubator-mxnet/pull/15048#discussion_r288114883

My concern around this topic is the testability of large tensors. Generally
I'd like to debate whether we have to run full end-to-end tests with arrays
that are multiple Gigabyte in size or if there aren't any smarter and less
ressource-intensitve ways to approach this challenge. It's true that our CI
has quite a lot of resources (and if we're talking about CPU tests, it's
almost infinite due to virtual memory and swapping). Since I think that
generally it's good to not have too many constraints on the testing
environment, I'd like to propose a different approach. Here's a copy from
my comment in the PR:

When I'm talking about layers in this case, I'm talking about the physical
execution on a machine and not about machine learning operator layers.

Specifically in this case I'm talking about mocking memory accesses. Read
and writes would access a virtual array array that doesn't have the full
capacity on physical memory but instead consists of procedually generated
data.

In various languages there's the concept of Iterators
https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/
where a function can be invoked that will deliver a stream of entries. If
you would call ```.toList()``` that would then create a physically mapped
array (which we wouldn't want). Instead, you could call ```.get(1243952)```
which would iterate the array but not allocate for the full size but only
for that particular entry. So support writes, you could have custom lookups
that would allow setting certain values in a sparse fashion. The iterator
function could be something like
```
float64 getValue(int64 index)
{
    dictValue = _internalDict.get(index);
    if (dictValue != null)
        return dictValue;
    else
        return index; //Replace with generator function
}

setValue(int64 index, float64 value)
{
    _internalDict.set(index, value)
}
```

With this concept you'd be able to test infinitely big arrays as long as
you only use sparse data to test (the size of the dict would depend on the
number of specifically set entries). Since the purpose of these large array
tests is not to test dense data (it wouldn't make a difference between
sparse and dense besides performance I assume), this type of generator
model would allow us to execute these tests in an efficient way.

The only thing we'd have to mock at this point are the getters and setters
that try to read/write from/to physical memory. The calls would be
redirected to the pseudo-code above.


I'm aware that we do not support this type of behaviour out of the box, but
I think that we're getting to a point where we should take a step back and
reconsider the approaches we would like to do in such resource-heavy cases.
Of course, we wouldn't be testing the physical memory access with my
method, but I think that we can assume that memory accesses are properly
working or otherwise we'd notice in quite a few other cases.

Best regards,
Marco


RE: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Lv, Tao A" <ta...@intel.com>.
Hi Lin,

Yes, MKL supports that. Please refer to https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface for details.
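
As a rough illustration of what the ILP64 convention means on the caller
side, here is a minimal sketch (the compile/link flags are illustrative;
the exact ones come from MKL's link-line advisor):

```
// Build with -DMKL_ILP64 and link against the ILP64 MKL libraries
// (illustrative; the exact link line depends on the toolchain).
#include <mkl.h>
#include <cstdio>

int main() {
    // Under ILP64, MKL_INT is a 64-bit integer, so matrix dimensions
    // passed to CBLAS may exceed 2^31 - 1.
    MKL_INT m = 2, n = 2, k = 2;
    double a[4] = {1, 0, 0, 1};  // 2x2 identity
    double b[4] = {1, 2, 3, 4};
    double c[4] = {0, 0, 0, 0};
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, a, k, b, n, 0.0, c, n);
    std::printf("%.1f %.1f\n%.1f %.1f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```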

I also did some work in that direction. Please see the PRs below for MXNet and mshadow, respectively.
https://github.com/apache/incubator-mxnet/pull/13723
https://github.com/dmlc/mshadow/pull/365

Feel free to let me know if there is anything I can help with.

Thanks,
-tao



Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Lin Yuan <ap...@gmail.com>.
Hi Sheng,

Thanks for the nice suggestions. To summarize the current status and future
plan of this project:

There were some missing operators from #11742 that did not support large
tensors. Thanks to Rohit's help, those missing operators have been
completed and tests added to the nightly pipeline for the MXNet 1.5 release
(currently on GPU only; they will be added to CPU once issue
https://github.com/apache/incubator-mxnet/issues/14980 is resolved).

The next phases of this project are:
(1) Run operator profiling to identify the operators that have performance
regressions after turning on the int64 compiler flag
(2) Mitigate the performance regressions in the operators collected in (1)
(3) Turn on the int64 compilation flag by default (the target completion is
release 1.6)
(4) Support int64 for each dimension of the tensor. This can be carried out
in parallel with (1) to (3). The current limitation, AFAIK, is the
cblas_gemm routines, which use int32 for each dimension; a lot of matrix
operators in MXNet call cblas_gemm through mshadow.
@Lv, Tao A <ta...@intel.com> Does the Intel MKL CBLAS library support int64
for each dimension? Thanks!

Best,

Lin





Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Sheng Zha <zh...@apache.org>.
Thanks for clarifying. This seems like a duplicate of [1] (though there wasn't any feedback there). I think everyone already agrees on the goal. 

> Currently, we assume the max size of each dimension.

I agree with Tao that int64_t would be necessary given that it's common to flatten and reshape ndarrays.

To help avoid repeating discussion and to make this discussion more productive, here is some of the relevant context I'm aware of:
- The first part of the proposed change was merged in #11742, which caused #14496, i.e. performance degradation in transpose and imdecode. The full scope is still unclear.
- A compilation flag was added in #14570 so that people can explicitly opt in to the support without impacting others using the default setting.

Given the context, since the goal is to support large tensor by default without performance impact, I hope more investigation could accompany this proposal that covers:
- The problem: list the parts (e.g. operators) whose performance is impacted by changing the index type, and the amount of slow-down (see the measurement sketch below).
- The solution for addressing the slow-down.
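
As a starting point for such measurements, the comparison boils down to timing the same loop with 32-bit vs 64-bit index arithmetic. A minimal standalone sketch (illustrative only, not an MXNet operator):

```
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

// Time a strided sum with a given index type; the loop body is identical,
// only the width of the index arithmetic changes.
template <typename Index>
double TimeStridedSum(const std::vector<float>& buf, Index stride) {
    auto t0 = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (Index i = 0; i < static_cast<Index>(buf.size()); i += stride)
        acc += buf[i];
    volatile double sink = acc;  // keep the result from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<float> buf(1 << 24, 1.0f);
    std::cout << "int32_t index: " << TimeStridedSum<int32_t>(buf, 7) << " ms\n";
    std::cout << "int64_t index: " << TimeStridedSum<int64_t>(buf, 7) << " ms\n";
    return 0;
}
```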

Thanks.

-sz

[1] https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E


Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Srivastava, Rohit Kumar" <sr...@buckeyemail.osu.edu>.
Hi Tao,
    The existing MXNet implementation doesn't support large tensors; MXNet NDArray creation for tensors with more than 2^32 elements is only supported by enabling a build flag for now. The purpose of this thread is to have the community provide feedback on the design cwiki for *Large Tensor Support* in MXNet. The intention is to make large tensor support a default feature in MXNet (in the future) w/o any performance impact, so consumers do not have to build it from source.

-Rohit



RE: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Lv, Tao A" <ta...@intel.com>.
Hi Rohit,

The existing MKL-DNN and its integration in MXNet should already support *large tensors*, meaning the total number of elements (Prod(shape)) can exceed INT_MAX. Feel free to let me know if you find any issue when using MKL-DNN operators with large tensors.

For large dimension sizes (shape[x]), MKL-DNN is going to add support in its 1.0 release, which will be released around the middle of the year. But I'm not sure if MXNet has plans to support that.

Thanks,
-tao



Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Srivastava, Rohit Kumar" <sr...@buckeyemail.osu.edu>.
Hi Tao,
    There are already a couple of operators implemented in MXNet that currently support Tensors with sizes over ~4.5 billion elements. In the meantime, core MXNet can move ahead with providing initial support for such large tensors so MXNet customers can start using it.

Good to hear MKLDNN will provide support for such cases. Do you have a timeline for when this feature will be released?

-Rohit



RE: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Lv, Tao A" <ta...@intel.com>.
Thank you Lin! I would expect the current MKL-DNN implementation already supports the scenario you mentioned here. This can be verified by this issue: https://github.com/apache/incubator-mxnet/issues/13451

But as I said before, since we support flatten and reshape operators, it's possible for users to convert a tensor with a large element count into a tensor with a large dimension size, which could cause issues there.
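
A quick arithmetic illustration of that point (plain C++, independent of any framework):

```
#include <cstdint>
#include <iostream>

// Flattening a (2^16, 2^16) tensor yields a single dimension of 2^32
// elements, which no longer fits in an int32_t shape entry.
int main() {
    int64_t rows = int64_t{1} << 16;
    int64_t cols = int64_t{1} << 16;
    int64_t flattened = rows * cols;  // 4294967296 == 2^32
    std::cout << "flattened dim: " << flattened
              << ", INT32_MAX: " << INT32_MAX << "\n";
    return 0;
}
```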

To cover more cases, MKL-DNN is going to support INT64 dimension size in its coming 1.0 major release.

-tao


Re: [RFC] Support for creation of Large Tensors in MXNet

Posted by Lin Yuan <ap...@gmail.com>.
Tao,

- what's the max size of dimensionality? Which data type is used to define
dimensionality (ndims)?
We assume the max size of dimensionality is relatively small. Hence the
`int` data type is used to define ndim.

- what's the max size of each dimension? Which data type is used to define
dimension size (shape[x])?
Currently, we assume the max size of each dimension is not going to exceed
2^31 in real applications. Hence the data type is `int32_t`.

- what's the max size of total elements? Which data type is used to define
element size (Prod(shape))?
We assume the total number of elements in a tensor can be larger than 2^32
in some applications such as the deep graph library. We use the data type
`int64_t` to represent the total element size. Currently, due to performance
regressions in some operators (such as transpose), we use a compiler flag
to set this data type to `int32_t` by default. Once we have ways to
mitigate the performance regressions, we will set the default data type to
`int64_t`, which is part of the effort in this project that Rohit proposed.
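
A minimal sketch of what that compile-time switch looks like at the type
level (the macro and type names are modeled on mshadow's index type; treat
the exact spellings as assumptions to verify against the source):

```
#include <cstdint>

// ndim stays a plain int, as described above; the element-count index
// type is switched at compile time (names modeled on mshadow, not verbatim).
#if MSHADOW_INT64_TENSOR_SIZE == 1
typedef int64_t index_t;
#else
typedef int32_t index_t;
#endif
```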

What is the plan in MKLDNN to support large tensors? We may want to
coordinate the progress since many operators are using the MKLDNN
implementation on CPU now.

Many Thanks,

Lin


RE: [RFC] Support for creation of Large Tensors in MXNet

Posted by "Lv, Tao A" <ta...@intel.com>.
Thank you for bringing this topic to dev, Rohit.

Regarding large tensor, can you articulate:
- what's the max size of dimensionality? Which data type is used to define dimensionality (ndims)?
- what's the max size of each dimension? Which data type is used to define dimension size (shape[x])?
- what's the max size of total elements? Which data type is used to define element size (Prod(shape))?

For me, any of these three can be *large*.
