Posted to issues@spark.apache.org by "zhengruifeng (Jira)" <ji...@apache.org> on 2020/06/23 01:48:00 UTC

[jira] [Created] (SPARK-32061) potential regression if using memoryUsage instead of numRows

zhengruifeng created SPARK-32061:
------------------------------------

             Summary: potential regression if using memoryUsage instead of numRows
                 Key: SPARK-32061
                 URL: https://issues.apache.org/jira/browse/SPARK-32061
             Project: Spark
          Issue Type: Sub-task
          Components: ML, PySpark
    Affects Versions: 3.1.0
            Reporter: zhengruifeng


1. `memoryUsage` may be set improperly, for example too small to store even a single instance (a sketch follows);
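For illustration, here is a hedged sketch of how a memoryUsage budget could translate into a per-block row count. The helper, its parameters, and the per-instance size estimate are all hypothetical, not the actual Spark logic:
{code:java}
// Hypothetical sketch: derive a per-block row count from a memory budget.
// Assumes dense Double features; bytesPerInstance is an assumption, not
// the estimate Spark actually uses.
def rowsPerBlock(memoryUsage: Long, numFeatures: Int): Int = {
  val bytesPerInstance = 8L * numFeatures
  // If memoryUsage cannot hold even one instance, this degenerates to
  // 1-row blocks, i.e. blockification is effectively disabled.
  math.max(1L, memoryUsage / bytesPerInstance).toInt
}
{code}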

2. The blockify+GMM implementation reuses two matrices whose shapes are tied to the current blockSize:
{code:java}
// Pre-allocated once and reused across blocks; shapes are tied to blockSize.
@transient private lazy val auxiliaryProbMat = DenseMatrix.zeros(blockSize, k)
@transient private lazy val auxiliaryPDFMat = DenseMatrix.zeros(blockSize, numFeatures)
{code}
When implementing blockify+GMM, I found that without pre-allocating those matrices there was a serious regression (maybe 3~4x slower; I forgot the exact numbers).
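To make the concern concrete, here is a hypothetical sketch (not the actual implementation) of why memoryUsage-based blocking interacts badly with this reuse: with a fixed blockSize the buffer is allocated once, whereas variable-sized blocks force re-allocation whenever a block exceeds the cached shape:
{code:java}
import org.apache.spark.ml.linalg.DenseMatrix

// Hypothetical sketch: with memoryUsage-based blocking the per-block row
// count varies, so the buffer must grow whenever a larger block arrives,
// defeating the allocate-once pattern quoted above.
class ProbMatCache(k: Int) {
  private var probMat: DenseMatrix = _
  def forRows(numRows: Int): DenseMatrix = {
    if (probMat == null || probMat.numRows < numRows) {
      probMat = DenseMatrix.zeros(numRows, k)  // re-allocation on each growth
    }
    probMat
  }
}
{code}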

3. In MLP, three pre-allocated objects are also tied to numRows:
{code:java}
if (ones == null || ones.length != delta.cols) ones = BDV.ones[Double](delta.cols)

// TODO: allocate outputs as one big array and then create BDMs from it
if (outputs == null || outputs(0).cols != currentBatchSize) {
...

// TODO: allocate deltas as one big array and then create BDMs from it
if (deltas == null || deltas(0).cols != currentBatchSize) {
  deltas = new Array[BDM[Double]](layerModels.length)
...
{code}
I am not very familiar with the MLP implementation and could not find any documentation about this pre-allocation, but I guess there may be a regression if we disable it, since those objects look relatively large.
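For what it's worth, the quoted code caches buffers keyed on the current batch size, so allocation is only paid when the size changes. A hedged sketch of that pattern follows; the class and names are simplified placeholders, not the actual MLP code:
{code:java}
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

// Hypothetical sketch of the MLP caching pattern: buffers are rebuilt only
// when the batch size changes. With a fixed numRows the rebuild branch is
// taken once; with variable, memoryUsage-sized blocks it could fire on
// every block.
class MlpBuffers(layerSizes: Array[Int]) {
  private var ones: BDV[Double] = _
  private var outputs: Array[BDM[Double]] = _

  def ensure(currentBatchSize: Int): Unit = {
    if (ones == null || ones.length != currentBatchSize) {
      ones = BDV.ones[Double](currentBatchSize)
    }
    if (outputs == null || outputs(0).cols != currentBatchSize) {
      outputs = layerSizes.map(n => BDM.zeros[Double](n, currentBatchSize))
    }
  }
}
{code}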
