You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by heary-cao <gi...@git.apache.org> on 2017/11/08 09:39:15 UTC

[GitHub] spark pull request #19693: [CORE]improved statistical shuffle write time

GitHub user heary-cao opened a pull request:

    https://github.com/apache/spark/pull/19693

    [CORE]improved statistical shuffle write time

    ## What changes were proposed in this pull request?
    
    Creating the file to write to and creating a disk writer both involve interacting with the disk, and can take a long time when we open or close many files, so should be included in the shuffle write time.
    
    so we call mergeSpillsWithTransferTo, only contains the write file the time, but did not included in the shuffle write time when open and close many merges spill files .
    
    ## How was this patch tested?
    
    existed test cases.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/heary-cao/spark task_statistics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19693
    
----
commit e1d6df4cecc757a7f66feefa2e3bd6816e7abd3f
Author: caoxuewen <ca...@zte.com.cn>
Date:   2017-11-08T07:57:28Z

    improved statistical shuffle write time

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Whether shuffle write time should include the file open/close time is debatable, also we don't know  whether the actual open action is lazy or not (depends on OS). But one downside of this change is that it makes this metric incomparable to the old one, because we changed the semantics now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    @cloud-fan @jerryshao @jiangxb1987 thank you for review it.
    `Whether shuffle write time should include the file open/close time is debatable` I agree with you @jerryshao . Because review spark code, see the realization of this controversy, But this PR hope to unify. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Personally I don't think we should take the file open/close time into account, cc @cloud-fan .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19693: [MINOR][CORE] Improved statistical shuffle write ...

Posted by heary-cao <gi...@git.apache.org>.
Github user heary-cao closed the pull request at:

    https://github.com/apache/spark/pull/19693


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Shall we leave this closed for now @heary-cao?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    This can be a debate when we introduced this metric, now I think it's too late to change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Seems it's better to keep the previous  behavior.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19693: [MINOR][CORE] Improved statistical shuffle write time

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19693
  
    I don't have a strong preference, cc @srowen 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org