You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2014/07/08 11:17:30 UTC

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/1327

    [SPARK-2402] Update the initial position when reuse DiskBlockObjectWriter

    Minor fix, `initialPosition` can not be updated after `close()` and re`open()`, which will lead to error when reusing this object. Details can be seen in [SPARK-2402](https://issues.apache.org/jira/browse/SPARK-2402).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-2402

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1327
    
----
commit ba8098f5c284b6803a98e5a3b3b8115488bb34cc
Author: jerryshao <sa...@intel.com>
Date:   2014-07-08T08:33:16Z

    Update the initial position when reuse DiskBlockObjectWriter to open the file

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48367468
  
    This class has historically been "almost-reusable", but since there was never any need to actually reuse it within Spark itself, the effort wasn't completed. I'm not sure if this change is sufficient, but doing this initialization in open() rather than the constructor seems marginally cleaner anyway. Unless, @mridulm, you have a case in mind where this will cause an issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao closed the pull request at:

    https://github.com/apache/spark/pull/1327


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48560796
  
    Honestly, I like this change, because it's generally good practice to minimize the amount of work done inside constructors. We should consider making it as part of the shuffle consolidation fixes patch @mridulm mentioned, even if the Writer is not intended to be reused. (Can't wait to see that patch, by the way!)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48329176
  
    It is alright if this class is not reopen supported, but seems there is not obvious fence to defend user to reuse this object, so at least this modification will not lead to error.
    
    If we really want to defend reopen of this object, `open()` should not be exposed to the user.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48565390
  
    IMHO, if we leave reuse of this object aside, I don't think this change will bring specific effect to the current code path. If we really want to defend the reuse of this object, there's no conflict with adding closed flag. Also this change align with other field which are all initialized in the initialize(), not constructor....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48571146
  
    The reason to initialize size upfront and keep it immutable is to enforce
    constraints on the writer since it is lazily initialized.
    On Jul 10, 2014 10:29 AM, "Saisai Shao" <no...@github.com> wrote:
    
    > IMHO, if we leave reuse of this object aside, I don't think this change
    > will bring specific effect to the current code path. If we really want to
    > defend the reuse of this object, there's no conflict with adding closed
    > flag. Also this change align with other field which are all initialized in
    > the initialize(), not constructor....
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1327#issuecomment-48565390>.
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48295927
  
    This is incorrect. It is used to find the length of the file when created : not when initialized.
    
    There is not reopen supported in this class btw.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48289657
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48553246
  
    Ok, sorry for my unthoughtful PR. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48289644
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48504833
  
    As @aarondav mentions, this class has never been 're-used'.
    As part of our shuffle consolidation fixes, we actually depend on this behavior (and enforce it via a closed flag).
    
    I did not want this PR to change the expectation within spark !
    Hopefully that PR is going to be submitted later this week or early next week (currently pending testcases)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48301683
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48301689
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16403/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2402] Update the initial position when ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1327#issuecomment-48569260
  
    Agreed, I'd be fine merging this one as-is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---