Posted to reviews@spark.apache.org by sarutak <gi...@git.apache.org> on 2017/03/10 20:14:42 UTC

[GitHub] spark pull request #17248: [SPARK-19909][SS] Batches will fail in case that ...

GitHub user sarutak opened a pull request:

    https://github.com/apache/spark/pull/17248

    [SPARK-19909][SS] Batches will fail in case that temporary checkpoint dir is on local file system while metadata dir is on HDFS

    ## What changes were proposed in this pull request?
    
    When we run Structured Streaming in local mode but use HDFS for storage, batches fail with an error like the following.
    
    ```
    val handle = stream.writeStream.format("console").start()
    17/03/09 16:54:45 ERROR StreamMetadata: Error writing stream metadata StreamMetadata(fc07a0b1-5423-483e-a59d-b2206a49491e) to /private/var/folders/4y/tmspvv353y59p3w4lknrf7cc0000gn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata
    org.apache.hadoop.security.AccessControlException: Permission denied: user=kou, access=WRITE, inode="/private/var/folders/4y/tmspvv353y59p3w4lknrf7cc0000gn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata":hdfs:supergroup:drwxr-xr-x
    ```
    
    This happens because the temporary checkpoint directory is created on the local file system, but the metadata, whose path is based on the checkpoint directory, is created on HDFS.
    
    This PR fixes this issue.
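
    The mismatch described above can be sketched with plain `java.net.URI`. This is only a simplified model, not Spark's or Hadoop's actual code: `CheckpointPathSketch`, `qualify`, the `hdfs://namenode:8020` address and the `/tmp/temporary-1234` directory are all hypothetical names chosen for illustration. It assumes that a path without a scheme is resolved against the configured default file system, while a path that carries an explicit `file:` scheme stays on the local file system.

    ```scala
    import java.net.URI

    object CheckpointPathSketch {
      // Hypothetical stand-in for resolving a path against the default
      // file system. A simplified model, not Hadoop's implementation.
      def qualify(path: String, defaultFs: URI): URI = {
        val uri = new URI(path)
        if (uri.getScheme != null) {
          // The path already names its file system, so keep it unchanged.
          uri
        } else {
          // No scheme: borrow scheme and authority from the default file system.
          new URI(defaultFs.getScheme, defaultFs.getAuthority, uri.getPath, null, null)
        }
      }

      def main(args: Array[String]): Unit = {
        val defaultFs = new URI("hdfs://namenode:8020") // assumed default file system
        // Directory created locally, but referred to without a scheme.
        println(qualify("/tmp/temporary-1234/metadata", defaultFs))
        // Same directory with an explicit local scheme.
        println(qualify("file:///tmp/temporary-1234/metadata", defaultFs))
      }
    }
    ```

    Under these assumptions the first path ends up pointing at HDFS even though the directory only exists locally, which matches the `AccessControlException` shown above; the second path keeps pointing at the local file system.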
    
    ## How was this patch tested?
    
    I tested manually in local mode with HDFS.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-19909

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17248
    
----
commit cc7a3f8474975e49a5b4e87c9ae9d8ea0185fc9f
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2017-03-10T19:59:47Z

    Fix the logic about creating name of temporary checkpoint directory

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    O.K. I'll close this PR. Thanks!




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    **[Test build #74334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74334/testReport)** for PR 17248 at commit [`cc7a3f8`](https://github.com/apache/spark/commit/cc7a3f8474975e49a5b4e87c9ae9d8ea0185fc9f).




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    We are closing it due to inactivity. Please do reopen if you want to push it forward. Thanks!




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74334/
    Test PASSed.




[GitHub] spark pull request #17248: [SPARK-19909][SS] Batches will fail in case that ...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak closed the pull request at:

    https://github.com/apache/spark/pull/17248




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    I don't think this PR properly resolves the issue. Indeed, it somewhat forces the metadata to be written to a local dir instead of the configured default filesystem.
    Of course, this fixes the exception, but we lose all the benefits of a distributed file system, such as fault tolerance.
    Thus, in my opinion, it would be better to let the metadata be written to the default file system, but change the default location.




[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17248
  
    **[Test build #74334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74334/testReport)** for PR 17248 at commit [`cc7a3f8`](https://github.com/apache/spark/commit/cc7a3f8474975e49a5b4e87c9ae9d8ea0185fc9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

