Posted to dev@pig.apache.org by "hsj (JIRA)" <ji...@apache.org> on 2017/05/31 07:52:04 UTC

[jira] [Issue Comment Deleted] (PIG-5157) Upgrade to Spark 2.0

     [ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hsj updated PIG-5157:
---------------------
    Comment: was deleted

(was: [~nkollar]:
bq. in JobMetricsListener.java there's a huge code section commented out (uncomment and remove the code onTaskEnd until we fix PIG-5157). Should we enable that?
The reason for the change is that [~rohini] pointed out that a lot of memory is used if we update metric info in onTaskEnd() (suppose there are thousands of tasks).
In org.apache.pig.backend.hadoop.executionengine.spark.JobMetricsListener for spark21, we should use code like the following.
Note: this is not fully tested, so I cannot guarantee it is correct.
{code}
    public void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
        // If we update taskMetrics in onTaskEnd(), it consumes a lot of memory.
        int stageId = stageCompleted.stageInfo().stageId();
        int stageAttemptId = stageCompleted.stageInfo().attemptId();
        String stageIdentifier = stageId + "_" + stageAttemptId;
        Integer jobId = stageIdToJobId.get(stageId);
        if (jobId == null) {
            LOG.warn("Cannot find job id for stage[" + stageId + "].");
        } else {
            Map<String, List<TaskMetrics>> jobMetrics = allJobMetrics.get(jobId);
            if (jobMetrics == null) {
                jobMetrics = Maps.newHashMap();
                allJobMetrics.put(jobId, jobMetrics);
            }
            List<TaskMetrics> stageMetrics = jobMetrics.get(stageIdentifier);
            if (stageMetrics == null) {
                stageMetrics = Lists.newLinkedList();
                jobMetrics.put(stageIdentifier, stageMetrics);
            }

            stageMetrics.add(stageCompleted.stageInfo().taskMetrics());
        }
    }

    public synchronized void onTaskEnd(SparkListenerTaskEnd taskEnd) {
        // Intentionally empty: metrics are collected per stage in onStageCompleted().
    }
{code}
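For illustration only, here is a minimal sketch of the same per-stage bookkeeping in plain Java with no Spark dependency. StageSummary and StageMetricsStore are hypothetical names, not Pig or Spark classes, and the single executorRunTimeMs field is an assumed stand-in for whatever TaskMetrics carries. It only shows the jobId -> "stageId_attemptId" -> list layout being filled once per completed stage attempt instead of once per task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Spark's TaskMetrics; only one field is kept here.
final class StageSummary {
    final long executorRunTimeMs;

    StageSummary(long executorRunTimeMs) {
        this.executorRunTimeMs = executorRunTimeMs;
    }
}

// Hypothetical store mirroring the jobId -> "stageId_attemptId" -> metrics layout.
final class StageMetricsStore {
    private final Map<Integer, Integer> stageIdToJobId = new HashMap<>();
    private final Map<Integer, Map<String, List<StageSummary>>> allJobMetrics = new HashMap<>();

    // Remember which job a stage belongs to (done at job start in a real listener).
    synchronized void registerStage(int jobId, int stageId) {
        stageIdToJobId.put(stageId, jobId);
    }

    // Called once per completed stage attempt, so the number of stored entries
    // grows with the number of stage attempts rather than the number of tasks.
    synchronized boolean recordStage(int stageId, int attemptId, StageSummary summary) {
        Integer jobId = stageIdToJobId.get(stageId);
        if (jobId == null) {
            return false; // unknown stage: nothing is recorded
        }
        String stageIdentifier = stageId + "_" + attemptId;
        allJobMetrics
            .computeIfAbsent(jobId, k -> new HashMap<>())
            .computeIfAbsent(stageIdentifier, k -> new ArrayList<>())
            .add(summary);
        return true;
    }

    // Total number of stored entries for a job, across all its stage attempts.
    synchronized int entryCount(int jobId) {
        Map<String, List<StageSummary>> jobMetrics = allJobMetrics.get(jobId);
        if (jobMetrics == null) {
            return 0;
        }
        int count = 0;
        for (List<StageSummary> entries : jobMetrics.values()) {
            count += entries.size();
        }
        return count;
    }
}
```

In this sketch computeIfAbsent replaces the explicit null checks, and every method is synchronized on the assumption that listener callbacks and readers may run on different threads.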
bq. I removed JobLogger, do we need it? It seems that a property called 'spark.eventLog.enabled' is the proper replacement for this class, should we use it instead? It looks like JobLogger became deprecated and was removed from Spark 2.
It seems we can remove JobLogger and enable {{spark.eventLog.enabled}} in Spark 2 instead.
)

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>         Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)