Posted to issues@spark.apache.org by "Chen Lin (JIRA)" <ji...@apache.org> on 2018/11/30 04:18:00 UTC

[jira] [Updated] (SPARK-26228) OOM issue encountered when computing Gramian matrix

     [ https://issues.apache.org/jira/browse/SPARK-26228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Lin updated SPARK-26228:
-----------------------------
    Description: 
/**
 * Computes the Gramian matrix `A^T A`.
 *
 * @note This cannot be computed on matrices with more than 65535 columns.
 */

As the above Scaladoc of computeGramianMatrix in RowMatrix.scala states, it supports computing on matrices with up to 65535 columns.

 

However, we find that it throws an OOM error (java.lang.OutOfMemoryError: Requested array size exceeds VM limit) when computing on a matrix with 16000 columns.

 

The root cause seems to be that treeAggregate writes a very large buffer array (on the order of 16000 * 16000 * 8 bytes) which exceeds the JVM array size limit (2^31 - 1).
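To make the sizes concrete, a rough back-of-the-envelope calculation (a sketch only; the exact buffer layout that treeAggregate serializes is our assumption, not verified from the Spark source):

  // Plain arithmetic for n = 16000 columns; not Spark API calls.
  val n = 16000L
  val denseBufferBytes   = n * n * 8L      // 2,048,000,000 bytes (~1.9 GiB) for a dense n x n array of doubles
  val packedUpperDoubles = n * (n + 1) / 2 // 128,008,000 doubles (~0.95 GiB) for the packed upper triangle
  val maxJvmArrayLength  = Int.MaxValue    // 2,147,483,647 elements per array on the JVM

Even the packed representation is close to a gigabyte per aggregation buffer, so executors and the driver may need to hold more than one copy of it at a time during treeAggregate.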

 

Does RowMatrix really support computing on matrices with up to 65535 columns, as documented?

I suspect that computeGramianMatrix has a serious performance issue.

Has anyone done performance experiments on this before?
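For reference, a minimal reproduction sketch of the call that hits this (untested; the row count, partitioning, and random data below are assumptions for illustration, not taken from our actual job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.linalg.distributed.RowMatrix

  import scala.util.Random

  object GramianRepro {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("GramianRepro"))
      val numCols = 16000
      // Hypothetical input: dense random rows with 16000 columns.
      val rows = sc.parallelize(0 until 100000, 200)
        .map(_ => Vectors.dense(Array.fill(numCols)(Random.nextDouble())))
      val mat = new RowMatrix(rows)
      // This is the call where the OOM described above is observed.
      val gram = mat.computeGramianMatrix()
      println(s"Gramian: ${gram.numRows} x ${gram.numCols}")
      sc.stop()
    }
  }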



> OOM issue encountered when computing Gramian matrix 
> ----------------------------------------------------
>
>                 Key: SPARK-26228
>                 URL: https://issues.apache.org/jira/browse/SPARK-26228
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.3.0
>            Reporter: Chen Lin
>            Priority: Major
>


