Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/21 15:46:35 UTC

[GitHub] [spark] srowen commented on issue #24648: [SPARK-27777][ML] Eliminate unnecessary sliding job in AreaUnderCurve

srowen commented on issue #24648: [SPARK-27777][ML] Eliminate unnecessary sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-494447837
 
 
   The existing implementation also makes one pass; what do you mean there?
   This looks like it's just optimizing the implementation, though it makes it significantly more complex. The curve RDD is typically not very large, maybe on the order of 1000 points. Is it worth it?
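   For reference on the one-pass point: the current code already computes the sum in a single pass over the sliding windows. A sketch of that shape (reconstructed here, not quoted from this PR's diff):
   ```scala
   // Requires `import org.apache.spark.mllib.rdd.RDDFunctions._` for sliding() on RDDs.
   def of(curve: RDD[(Double, Double)]): Double = {
     // Single distributed pass: each window of 2 points contributes one trapezoid.
     curve.sliding(2).aggregate(0.0)(
       seqOp = (auc: Double, points: Array[(Double, Double)]) => auc + trapezoid(points),
       combOp = _ + _
     )
   }
   ```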
   
   I wonder if a simpler implementation also gets a performance gain: 
   ```scala
     def of(curve: RDD[(Double, Double)]): Double = {
       // Map each adjacent pair of points to its trapezoid area, then sum.
       curve.sliding(2).map { pair: Array[(Double, Double)] => trapezoid(pair) }.sum()
     }

     def of(curve: Iterable[(Double, Double)]): Double = {
       // withPartial(false) drops the trailing one-element window.
       curve.toIterator.sliding(2).withPartial(false).map(trapezoid).sum
     }
   ```
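   Both overloads assume the `trapezoid` helper already defined in AreaUnderCurve; roughly, it computes the area under the line segment between two points:
   ```scala
   // Sketch of the assumed helper (AreaUnderCurve has an equivalent private method).
   private def trapezoid(points: Seq[(Double, Double)]): Double = {
     require(points.length == 2)
     val (x, y) = (points.head, points.last)
     // Trapezoid rule: width times average height.
     (y._1 - x._1) * (y._2 + x._2) / 2.0
   }
   ```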
