Posted to dev@kylin.apache.org by "yoonsung.lee (Jira)" <ji...@apache.org> on 2020/12/23 08:02:00 UTC

[jira] [Created] (KYLIN-4847) Cuboid to HFile step fails in a multi-job-server environment because it tries to read the metrics jar from the inactive job server's location.

yoonsung.lee created KYLIN-4847:
-----------------------------------

             Summary: Cuboid to HFile step fails in a multi-job-server environment because it tries to read the metrics jar from the inactive job server's location.
                 Key: KYLIN-4847
                 URL: https://issues.apache.org/jira/browse/KYLIN-4847
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v3.1.0
            Reporter: yoonsung.lee


h1. My Cluster Setting
1. Version: 3.1.0
2. Two job servers (job & query mode) and two query-only servers, each running on a different host machine.
3. The Spark engine is used for cube build jobs.

h1. Problem Circumstances
h2. Root cause
The active job server submits a Spark job to execute the `Convert Cuboid Data to HFile` step, but it gets an error because one of the resources passed to spark-submit has a path the active job server cannot read.
 * Wrong resource: ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar
 * For this jar file only, ${KYLIN_HOME} resolves to the inactive job server's installation path.
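
Below is a minimal sketch of the mechanism I suspect (hypothetical names, with a Map standing in for Kylin's metadata store; this is not actual Kylin code): the server that creates the job resolves the jar path against its own filesystem and persists it, so the server that later executes the step replays a path that only exists on the creator's host.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class TwoServerSketch {
    // Stand-in for the metadata store shared by both job servers.
    static final Map<String, String> METASTORE = new HashMap<>();

    // Runs on whichever server creates the job (e.g. the one that
    // receives the rebuild request) -- the absolute path is baked in here.
    static void createJob(String jobId, String localKylinHome) {
        METASTORE.put(jobId + ":sparkJars",
                localKylinHome + "/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar");
    }

    // Runs later on the active job server, possibly a different host,
    // which only replays the persisted arguments.
    static void executeJob(String jobId) {
        System.out.println("spark-submit --jars " + METASTORE.get(jobId + ":sparkJars"));
        // Fails if the path does not exist on this host.
    }

    public static void main(String[] args) {
        createJob("job-1", "/opt/kylin-inactive"); // inactive server's KYLIN_HOME
        executeJob("job-1");                       // active server cannot read it
    }
}
{code}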

This situation occurs in the following two circumstances.

h2. On cube build
1. Call the build API on the inactive job server (see the request sketch below).
   * exactly: /kylin/api/cubes/{cube_name}/rebuild
2. The inactive job server stores the build task in the metadata store.
3. The active job server picks up the build task and executes it.
4. The active job server fails at the `Convert Cuboid Data to HFile` step.

**This doesn't occur when I call the build API on the active job server.**
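
For reference, step 1 above can be reproduced against a specific server with a plain HTTP call. This is a hedged sketch: the host, credentials, and time range are placeholders, and the request body follows the documented rebuild API as I understand it.

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RebuildCall {
    public static void main(String[] args) throws Exception {
        // Point this at the INACTIVE job server to reproduce the failure.
        URL url = new URL("http://inactive-job-server:7070/kylin/api/cubes/my_cube/rebuild");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString("ADMIN:KYLIN".getBytes(StandardCharsets.UTF_8)));
        conn.setDoOutput(true);
        // Placeholder time range; buildType BUILD triggers a normal build.
        String body = "{\"startTime\": 0, \"endTime\": 1608681600000, \"buildType\": \"BUILD\"}";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
{code}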

h2. On merge
1. A cube merge job is triggered periodically.
2. The active job server picks up the merge task and executes it.
3. The active job server fails at the `Convert Cuboid Data to HFile` step.

**This doesn't occur when there is only one job server in the cluster.**

h1. Progress toward a solution
I'm trying to find which code sets the wrong path for metrics-core-2.2.0.jar.
So far, my guess is that the following line sets metrics-core-2.2.0.jar for the `Convert Cuboid Data to HFile` Spark job:
 * https://github.com/apache/kylin/blob/kylin-3.1.0/storage-hbase/src/main/java/org/apache/kylin/storage/hbase/steps/HBaseSparkSteps.java#L69
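
If that guess is right, the path would be resolved by looking up the jar that contains a metrics class on the local classpath, roughly like the sketch below (my own approximation of a findContainingJar-style helper, not the actual Kylin code). Because the lookup walks the local classloader, it necessarily returns a path on the server that runs it.

{code:java}
import java.net.URL;

public class JarLookupSketch {
    // Approximation of a findContainingJar-style helper: resolve the jar
    // that provides the given class on the LOCAL filesystem.
    static String findContainingJar(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        URL url = clazz.getClassLoader().getResource(resource);
        if (url != null && "jar".equals(url.getProtocol())) {
            // e.g. jar:file:/usr/local/kylin/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar!/...
            String path = url.getPath();
            return path.substring("file:".length(), path.indexOf('!'));
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumes metrics-core 2.2.0 is on the classpath, as it is inside
        // Kylin's Tomcat webapp; the printed path belongs to THIS host.
        System.out.println(findContainingJar(Class.forName("com.yammer.metrics.core.MetricsRegistry")));
    }
}
{code}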


h1. Questions
1. I'm trying to remote-debug with an IDE to confirm my guess, but the breakpoint on that line is never hit at runtime. It seems the code is called during the boot phase. Is that right? (See the debug-options note after these questions.)

2. Is there any hint or guess for solving this issue, independent of my progress above?
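
One note on question 1, hedged since I haven't verified it: given the root cause above, that line may run during job/step creation on the server that receives the rebuild request, not on the server that executes the step, so the breakpoint might simply be set in the wrong JVM. To catch code that runs early (including at boot), the standard JDWP options can be added to the job server's JVM so it waits for the debugger before starting:

{code}
# Hypothetical placement: append to the job server's JVM options
# (e.g. via Tomcat's setenv.sh). suspend=y makes the JVM wait for
# the debugger to attach before executing anything.
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
{code}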




--
This message was sent by Atlassian Jira
(v8.3.4#803005)