Posted to reviews@spark.apache.org by sujith71955 <gi...@git.apache.org> on 2018/09/27 17:58:35 UTC

[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

GitHub user sujith71955 opened a pull request:

    https://github.com/apache/spark/pull/22572

    [SPARK-25521][SQL]Job id showing null in the logs when insert into command Job is finished.

    ## What changes were proposed in this pull request?
    ``As part of the insert command in FileFormatWriter, a job context is created to handle the write operation. While initializing the job context, the setupJob() API
    in HadoopMapReduceCommitProtocol sets the job ID in the JobContext configuration. Because we read the job ID directly from the MapReduce JobContext, it shows up as
    null in the logs. As a solution, we get the job ID from the configuration of the MapReduce JobContext instead.``
    
    ## How was this patch tested?
    Manually verified the logs after the changes.
    
    ![spark-25521 1](https://user-images.githubusercontent.com/12999161/46164933-e95ab700-c2ac-11e8-88e9-49fa5100b872.PNG)
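
The behavior described above can be reproduced outside Spark. The following is a minimal sketch, assuming `hadoop-mapreduce-client-core` is on the classpath; the object name `JobIdLookup` and the sample identifier are illustrative and are not part of the patch. A `Job` obtained from `Job.getInstance` is created without a `JobID`, so `getJobID` returns null, while a value stored under `mapreduce.job.id` in the job's configuration can still be read back.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{Job, JobID}

object JobIdLookup {
  // Configuration key named in the PR description as the place where the ID is stored.
  val JobIdKey = "mapreduce.job.id"

  /** Creates a Job and records an ID only in its configuration, not on the Job itself. */
  def newJobWithConfiguredId(): Job = {
    val job = Job.getInstance(new Configuration())
    // Illustrative ID (assumption); in Spark the value is generated by the commit protocol.
    val id = new JobID("20180927153351", 0)
    job.getConfiguration.set(JobIdKey, id.toString)
    job
  }

  def main(args: Array[String]): Unit = {
    val job = newJobWithConfiguredId()
    // The Job object was never given a JobID, so this interpolates as "null".
    println(s"Job ${job.getJobID} committed.")
    // The configuration carries the ID that was set above.
    println(s"Job ${job.getConfiguration.get(JobIdKey)} committed.")
  }
}
```

The two `println` calls mirror the removed and added log lines in the diff: the first reads from the `Job`, the second from its configuration.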
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujith71955/spark master_log_issue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22572
    
----
commit 23a1b063e8317b2422acdc05a4635fce14b2bc49
Author: s71955 <su...@...>
Date:   2018-09-27T15:33:51Z

    [SPARK-25521][SQL]Job id showing null in the logs when insert into command Job is finished.
    
    ## What changes were proposed in this pull request?
    As part of the insert command in FileFormatWriter, a job context is created to handle the write operation. While initializing the job context, the setupJob() API
    in HadoopMapReduceCommitProtocol sets the job ID in the JobContext configuration. Because we read the job ID directly from the MapReduce JobContext, it shows up as
    null in the logs. As a solution, we get the job ID from the configuration of the MapReduce JobContext instead.
    
    ## How was this patch tested?
    Manually verified the logs after the changes.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    thanks, merging to master/2.4!


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96938/
    Test PASSed.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    **[Test build #96925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96925/testReport)** for PR 22572 at commit [`56c5ff5`](https://github.com/apache/spark/commit/56c5ff5b5a11a0ef7d5f4055cbf67bbbc310e111).


---



[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22572#discussion_r221157274
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -183,15 +183,16 @@ object FileFormatWriter extends Logging {
           val commitMsgs = ret.map(_.commitMsg)
     
           committer.commitJob(job, commitMsgs)
    -      logInfo(s"Job ${job.getJobID} committed.")
    +      logInfo(s"Job ${job.getConfiguration.get("mapreduce.job.id")} committed.")
    --- End diff --
    
    We should log something here, but `mapreduce.job.id` is not useful. How about `description.uuid`?
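
A minimal sketch of the alternative suggested here: logging a per-write UUID instead of a Hadoop job ID. `WriteDescription`, `commitMessage`, and the message wording are hypothetical stand-ins for illustration; they are not Spark's `WriteJobDescription` or the text of the final patch. Only the idea of reading a `uuid` from the job description comes from the thread.

```scala
import java.util.UUID

object UuidLogging {
  // Hypothetical stand-in for a write job description that carries its own UUID.
  final case class WriteDescription(uuid: String)

  /** Formats the commit log line from the description rather than from the Hadoop job. */
  def commitMessage(description: WriteDescription): String =
    s"Write job ${description.uuid} committed."

  def main(args: Array[String]): Unit = {
    // The UUID is generated when the description is built, so it is defined at log time.
    val description = WriteDescription(UUID.randomUUID().toString)
    println(commitMessage(description))
  }
}
```

Because the identifier is created together with the description, the log line does not depend on whether the underlying Hadoop job was ever assigned an ID.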


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    **[Test build #96938 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96938/testReport)** for PR 22572 at commit [`56c5ff5`](https://github.com/apache/spark/commit/56c5ff5b5a11a0ef7d5f4055cbf67bbbc310e111).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96925/
    Test FAILed.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    retest this please


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Can we update the PR to use `description.uuid` first?


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    > Is the value logged here always null?
    > I am not sure if it's meaningful to log mapreduce.job.id, especially given its name. If there's no meaningful job ID here, do we care about it at all? How about deleting the log?
    > `SparkHadoopWriter` does something similar.
    
    Initially I thought the same. I am not sure whether mapreduce.job.id makes sense here, but I think we should not display null. Deleting the log would be the easiest option, but I am curious to know why we were trying to log a MapReduce job ID.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22572


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    ok to test


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    cc @cloud-fan @srowen 


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22572#discussion_r221371957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -183,15 +183,16 @@ object FileFormatWriter extends Logging {
           val commitMsgs = ret.map(_.commitMsg)
     
           committer.commitJob(job, commitMsgs)
    -      logInfo(s"Job ${job.getJobID} committed.")
    +      logInfo(s"Job ${job.getConfiguration.get("mapreduce.job.id")} committed.")
    --- End diff --
    
    Thanks for the suggestions!! I will update this PR.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    > Can we update the PR to use `description.uuid` first?
    
    Updated FileFormatWriter to use description.uuid; attaching the verification snapshot.
    ![image](https://user-images.githubusercontent.com/12999161/46455292-f48b7680-c7c7-11e8-8e10-e76e45723542.png)



---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    > > Can we update the PR to use `description.uuid` first?
    > 
    > Updated FileFormatWriter to use description.uuid; attaching the verification snapshot.
    > ![image](https://user-images.githubusercontent.com/12999161/46455292-f48b7680-c7c7-11e8-8e10-e76e45723542.png)
    
    Let me know whether we should also update the SparkHadoopWriter.scala flow. In that flow the job ID is currently displayed properly. To display the job description UUID there I would need to explore further, because that flow does not hold any WriteJobDescription instance.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    @srowen @cloud-fan 
    I was testing the SparkHadoopWriter flow with the steps below, and I could see the job ID printed properly in the log. Is it fine to update this flow with description.uuid as well? Attaching a snapshot of the logs from the SparkHadoopWriter flow.
    val rdd = spark.sparkContext.newAPIHadoopFile(
      "D:/data/x.csv",
      classOf[org.apache.hadoop.mapreduce.lib.input.NLineInputFormat],
      classOf[org.apache.hadoop.io.LongWritable],
      classOf[org.apache.hadoop.io.Text])
    
    val hconf = spark.sparkContext.hadoopConfiguration
    
    hconf.set("mapreduce.output.fileoutputformat.outputdir", "D:/data/test")
    
    rdd.saveAsNewAPIHadoopDataset(hconf)
    
    ![sparkhadoopwriter](https://user-images.githubusercontent.com/12999161/46429141-59f94c00-c763-11e8-8991-fd154b8dba07.png)
    
    



---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    **[Test build #96938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96938/testReport)** for PR 22572 at commit [`56c5ff5`](https://github.com/apache/spark/commit/56c5ff5b5a11a0ef7d5f4055cbf67bbbc310e111).


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    cc @gatorsmile 


---



[GitHub] spark pull request #22572: [SPARK-25521][SQL]Job id showing null in the logs...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22572#discussion_r221305617
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -183,15 +183,16 @@ object FileFormatWriter extends Logging {
           val commitMsgs = ret.map(_.commitMsg)
     
           committer.commitJob(job, commitMsgs)
    -      logInfo(s"Job ${job.getJobID} committed.")
    +      logInfo(s"Job ${job.getConfiguration.get("mapreduce.job.id")} committed.")
    --- End diff --
    
    SparkHadoopWriter needs a similar change, then, BTW


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by sujith71955 <gi...@git.apache.org>.
Github user sujith71955 commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    When I dug into the code, I could see that in SparkHadoopWriter the job ID is initialized when the job context itself is created.
    
    ![image](https://user-images.githubusercontent.com/12999161/46430185-ec025400-c765-11e8-90c3-cdb022c94183.png)
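
For contrast, a job context that is constructed with an ID reports it through `getJobID`. The sketch below assumes `hadoop-mapreduce-client-core` is on the classpath and builds a `TaskAttemptContextImpl` from a `TaskAttemptID`. Whether this matches the exact construction in `SparkHadoopWriter` is an assumption based on the comment above; the object name `ContextWithJobId` and the tracker string are made up for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptID, TaskType}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

object ContextWithJobId {
  /** Creates a context whose JobID is derived from the attempt ID passed at construction. */
  def createContext(jobTrackerId: String, jobId: Int): JobContext = {
    val attemptId = new TaskAttemptID(jobTrackerId, jobId, TaskType.MAP, 0, 0)
    new TaskAttemptContextImpl(new Configuration(), attemptId)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative tracker identifier (assumption).
    val context = createContext("20180927153351", 0)
    // Non-null here, because the ID was supplied when the context was created.
    println(s"Job ${context.getJobID} committed.")
  }
}
```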



---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    **[Test build #96925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96925/testReport)** for PR 22572 at commit [`56c5ff5`](https://github.com/apache/spark/commit/56c5ff5b5a11a0ef7d5f4055cbf67bbbc310e111).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22572: [SPARK-25521][SQL]Job id showing null in the logs when i...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22572
  
    Is the value logged here always null?
    I am not sure if it's meaningful to log mapreduce.job.id, especially given its name. If there's no meaningful job ID here, do we care about it at all? How about deleting the log?
    `SparkHadoopWriter` does something similar.


---
