Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/16 07:42:18 UTC

[GitHub] [incubator-mxnet] kostayScr opened a new issue #15555: Bug or highly unintuitive and undocumented behaviour corrupting NDArray data(possibly race condition)[MKLDNN][C++]

kostayScr opened a new issue #15555: Bug or highly unintuitive and undocumented behaviour corrupting NDArray data(possibly race condition)[MKLDNN][C++]
URL: https://github.com/apache/incubator-mxnet/issues/15555
 
 
   Copying from NDArray a to NDArray b and then back from b to a (to save network weights for early stopping) corrupts the data. The corruption is not 100% consistent and is probably hard to reproduce. I had it reproducing semi-consistently, and it seems sensitive to execution conditions, which is why I mention the possibility of a race. Is it possibly because the execution engine provides only a weak guarantee, so that a->b and b->a execute out of order? The documentation (https://mxnet.incubator.apache.org/versions/master/architecture/overview.html#execution-engine) says: "The execution of any two functions that modify a common variable is serialized in their push order."
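   
   To make the expectation concrete, here is a toy model of the quoted guarantee using only the standard library. It is a sketch under loud assumptions: `ToyEngine` and `RoundTripPreservesData` are hypothetical names, the queue is a single-threaded stand-in and not MXNet's engine, and it says nothing about what actually causes the corruption. It only shows that, if pushed operations run strictly in push order, an a->b copy followed by a b->a copy should leave a unchanged.
   
   ```cpp
   #include <cassert>
   #include <functional>
   #include <queue>
   #include <utility>
   #include <vector>
   
   // Hypothetical toy engine (NOT MXNet's): operations are queued when pushed
   // and only executed, in push order, when WaitAll() is called.
   class ToyEngine {
    public:
     void Push(std::function<void()> op) { pending_.push(std::move(op)); }
   
     void WaitAll() {
       while (!pending_.empty()) {
         pending_.front()();
         pending_.pop();
       }
     }
   
    private:
     std::queue<std::function<void()>> pending_;
   };
   
   // Pushes a -> b and then b -> a, and reports whether both buffers hold the
   // original contents of a once all queued work has run.
   bool RoundTripPreservesData() {
     ToyEngine engine;
     std::vector<float> a = {1.0f, 2.0f, 3.0f};
     std::vector<float> b(a.size(), 0.0f);
     const std::vector<float> original = a;
   
     engine.Push([&a, &b] { b = a; });  // a -> b
     engine.Push([&a, &b] { a = b; });  // b -> a
   
     // Nothing has executed yet, so b still holds zeros at this point.
     assert(b != original);
   
     engine.WaitAll();
     return a == original && b == original;
   }
   ```
   
   Under this model the round trip is lossless, which is the behaviour I expected from the real engine.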
   
   Using MXNet with MKLDNN v0.20 beta from C++, built from source downloaded from GitHub (apache-mxnet-src-1.4.0.rc3-incubating). MXNet was built with CMake and MSVC (VS 2019).
   
   ```
   void Test()
   {
   	using namespace mxnet::cpp;
   	using ArgMap = map<string, NDArray>;
   	ArgMap args;
   	ArgMap auxiliaryState;
   	ArgMap bestModelArgs;
   	ArgMap bestModelAux;
   	auto AssignNDArrayMap = []( decltype( args ) & lhs, const decltype( args ) & rhs )
   	{
   		for ( auto &[k, arr] : rhs )
   			arr.CopyTo( &lhs.at(k) );
   	};
   	auto WaitMap = []( auto &map ) 
   	{
   		return; // Workaround: commenting out this return statement (so that the waits below run) fixes the problem
   		for ( auto &[k, v] : map )
   		{
   			v.WaitAll();
   			v.WaitToRead();
   			v.WaitToWrite();
   		}
   	};
   	auto SaveAsBestModel = [ &WaitMap, &bestModelArgs, &bestModelAux, &AssignNDArrayMap, &args, &auxiliaryState ]()
   	{
   		AssignNDArrayMap( bestModelArgs, args );
   		AssignNDArrayMap( bestModelAux, auxiliaryState );
   		WaitMap( bestModelArgs );
   		WaitMap( bestModelAux );
   	};
   	auto LoadBestModel = [ &WaitMap, &bestModelArgs, &bestModelAux, &AssignNDArrayMap, &args, &auxiliaryState ]()
   	{
   		AssignNDArrayMap( args, bestModelArgs );
   		AssignNDArrayMap( auxiliaryState, bestModelAux );
   		WaitMap( args );
   		WaitMap( auxiliaryState );
   	};
   	// Copy
   	SaveAsBestModel();  //a -> b
   	LoadBestModel();    //b -> a
   		
   	//Data in args/auxiliaryState is now corrupt!
   }
   ```
   If this is the intended behaviour, it needs several big, **bold** warnings, for example on the NDArray API docs page.
   Waiting with WaitAll() works around the problem (see code above).
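   
   The workaround could be packaged roughly as follows. This is a minimal sketch, not MXNet code: `CopyMapAndWait`, `FakeArray` and `RoundTrip` are hypothetical names introduced here, and the template only assumes that the array type offers `CopyTo(Array*) const` and a static `WaitAll()`, which is how `NDArray` is used in the code above. `FakeArray` is a stand-in so the sketch compiles without MXNet.
   
   ```cpp
   #include <cassert>
   #include <map>
   #include <string>
   
   // Hypothetical helper: copies every entry of src into the matching entry of
   // dst, then blocks until all pending work has finished.
   template <typename Array>
   void CopyMapAndWait(const std::map<std::string, Array>& src,
                       std::map<std::string, Array>& dst) {
     for (const auto& kv : src) {
       // at() throws std::out_of_range if dst has no entry for this key.
       kv.second.CopyTo(&dst.at(kv.first));
     }
     Array::WaitAll();
   }
   
   // Stand-in array type (an assumption for illustration only). Its copy is
   // synchronous, and WaitAll() merely counts how often it was called.
   struct FakeArray {
     float value;
     explicit FakeArray(float v = 0.0f) : value(v) {}
     void CopyTo(FakeArray* other) const { other->value = value; }
     static int& WaitCalls() {
       static int calls = 0;
       return calls;
     }
     static void WaitAll() { ++WaitCalls(); }
   };
   
   // Save a -> b, then restore b -> a, waiting after each batch of copies.
   float RoundTrip(float initial) {
     std::map<std::string, FakeArray> args{{"w", FakeArray(initial)}};
     std::map<std::string, FakeArray> best{{"w", FakeArray()}};
     CopyMapAndWait(args, best);
     assert(best.at("w").value == initial);
     CopyMapAndWait(best, args);
     return args.at("w").value;
   }
   ```
   
   With `mxnet::cpp::NDArray` substituted for `FakeArray`, this would wait once per batch of copies instead of once per array.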
