You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mxnet.apache.org by Pedro Larroy <pe...@gmail.com> on 2019/03/28 20:37:36 UTC

UB, type narrowing and integer overflows (in this case in mshadow)

Hi

While looking at the failures reporting in this issue:
https://github.com/apache/incubator-mxnet/issues/14522

I have noticed that in mshadow when calling the BLAS Engine we are
doing narrowing integer conversions from index_t (int64_t) to int, and
then operations on dimensions that can overflow integer arithmetic
such as i * m *k as seen in the second link below. Which when added to
the pointer holding the matrix data results in a) undefined behaviour,
and b) in x86_64 a subtraction instead of an addition due to platform
dependent integer overflow semantics in x86 platforms.

I think we should address this in a twofold manner: checking the
shapes for possible overflows in the implementation (which will have
some performance impact), and second we should widen the types of
BLASEngine to index_t.

https://github.com/dmlc/mshadow/blob/master/mshadow/tensor_cpu-inl.h#L613

Which in CPU ends up calling batched_gemm:

https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L339

Let me know if you have additional thoughts in this solution or see
any blockers or better ideas, otherwise I will proceed to work on PRs
fixing this in mshadow. Since it's an issue that seem to happen often
I think we should be really careful with integer overflows and
undefined behaviour related bugs and pay attention in CRs to this kind
of traps.

Thanks!

Pedro.