Posted to dev@horn.apache.org by "Edward J. Yoon" <ed...@samsung.com> on 2016/04/18 07:10:58 UTC

Bug in async parameter merger

Today I tried to test the async parameter merger on my cluster, but it throws
the exceptions below. The reason is probably that the objects are not Writable.

// exchange parameter update with master
SmallLayeredNeuralNetworkMessage inMessage =
    proxy.merge(avgTrainingError, weightUpdates,
        this.inMemoryModel.getWeightMatrices());

16/04/18 13:44:37 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy5.merge(Unknown Source)
	at org.apache.horn.bsp.SmallLayeredNeuralNetworkTrainer.calculateUpdates(SmallLayeredNeuralNetworkTrainer.java:203)
	at org.apache.horn.bsp.SmallLayeredNeuralNetworkTrainer.bsp(SmallLayeredNeuralNetworkTrainer.java:157)
	at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:171)
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1255)
Caused by: java.io.IOException: Call to tserver/127.0.1.1:40052 failed on local exception: java.io.IOException: Can't write:
[-9.15038855924827E-4, -6.59978486972921E-5, -3.2776380633814714E-4, -4.908153521598804E-4, -1.6535398987379825E-4, -2.481217176171659E-5, -3.097885126985827E-4, -4.5677556953153244E-5, -2.4586775269751806E-5]
[0.007494058256877595, 5.350221684534322E-4, 0.002670209123842312, 0.004006823375780923, 0.0013388943782793772, 1.9042423341881118E-4, 0.002527834024659683, 3.6902914343781094E-4, 2.1355016559231787E-4]
[0.003978872861757267, 2.894127056140916E-4, 0.0014220632229977248, 0.0021264410749675996, 7.121792101231689E-4, 9.615850486945849E-5, 0.001346404836613541, 1.983544516681021E-4, 1.1456125145256654E-4]
[-0.00695218464799201, -5.06352811206525E-4, -0.0024558201932702453, -0.0037067439613160842, -0.001223792213555592, -1.3795325608127702E-4, -0.002327019982822582, -3.333356053453033E-4, -2.2891282141905694E-4]
[0.0055693438721696985, 3.880990234702207E-4, 0.0019653686596401037, 0.002976654095812432, 9.959956034508214E-4, 1.3512799657517912E-4, 0.0018675147363488187, 2.6864207467334106E-4, 1.466807720784203E-4]
[-0.004406974432832138, -3.253702689795474E-4, -0.0015839399468065797, -0.002365407518184666, -7.935665906444789E-4, -1.2452109168041786E-4, -0.0014916074949176499, -2.241789663094838E-4, -1.2922820566750723E-4]
[-0.007248671217487145, -5.286530805287898E-4, -0.0025899737055120175, -0.003882832442858956, -0.001298906448842776, -1.8706905596266497E-4, -0.002449470417876311, -3.6278584202388065E-4, -2.0959879825765203E-4]
[0.004184985972170336, 3.028041744683768E-4, 0.0015042967415655265, 0.0022373908691310524, 7.448922331540512E-4, 1.0410505256316024E-4, 0.0014161086335274698, 2.189835200884028E-4, 1.1446038058615025E-4]
as interface org.apache.hama.commons.math.DoubleMatrix
	at org.apache.hama.ipc.Client.wrapException(Client.java:945)
	at org.apache.hama.ipc.Client.call(Client.java:913)
	at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)

Also, with the code below, the master task finishes as soon as it starts,
because only the non-master branch loops until convergence.

@Override
public void bsp(
    BSPPeer<LongWritable, VectorWritable, NullWritable, NullWritable,
        SmallLayeredNeuralNetworkMessage> peer)
    throws IOException, SyncException, InterruptedException {
  if (!isMaster(peer)) {
    while (!this.isConverge.get()) {
      // each slave worker calculates the matrix updates according to
      // its local data and merges them with the master
      calculateUpdates(peer);
    }
  }
}
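
One possible shape of the fix, just a sketch under the assumption that the
master only needs to stay alive so its merge() RPC handler remains reachable
until training converges:

@Override
public void bsp(
    BSPPeer<LongWritable, VectorWritable, NullWritable, NullWritable,
        SmallLayeredNeuralNetworkMessage> peer)
    throws IOException, SyncException, InterruptedException {
  if (isMaster(peer)) {
    // keep the master task alive until convergence; the actual merging
    // happens asynchronously in the RPC merge() handler
    while (!this.isConverge.get()) {
      Thread.sleep(1000);
    }
  } else {
    while (!this.isConverge.get()) {
      // each slave worker calculates matrix updates from its local data
      // and merges them with the master
      calculateUpdates(peer);
    }
  }
}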

I'll fix these and commit directly to master today.

--
Best Regards, Edward J. Yoon