Posted to dev@kylin.apache.org by "Richard Calaba (JIRA)" <ji...@apache.org> on 2016/03/18 18:15:33 UTC

[jira] [Created] (KYLIN-1508) NPE error ERROR - Build Cube - Step 3 if LOOKUP table is Hive View - : java.lang.NullPointerException at org.apache.kylin.source.hive.HiveTable.getSignature

Richard Calaba created KYLIN-1508:
-------------------------------------

             Summary: NPE error ERROR - Build Cube - Step 3 if LOOKUP table is Hive View - : java.lang.NullPointerException at org.apache.kylin.source.hive.HiveTable.getSignature
                 Key: KYLIN-1508
                 URL: https://issues.apache.org/jira/browse/KYLIN-1508
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v1.2, v1.3.0, v1.5.0, all
         Environment: CentOS 6.7, Hortonworks Hadoop 2.2.4.2 - sandbox
            Reporter: Richard Calaba
            Assignee: Dong Li


Affected releases: The error occurs in release 1.2 (downloaded as a binary release from the Kylin home page). The same error also occurs in the latest 2.1 snapshot (compiled today, 3/12/16, from the Kylin Git repository using branch 2.x-staging). Tests were done on a Hortonworks HDP 2.2.4 sandbox, so I assume that all current branches are affected.

Error Summary: Whereas it is possible to successfully build a cube whose FACT table refers to a Hive view, it does not seem to be possible to build a cube whose LOOKUP table refers to a Hive view. This appears to be a bug in metadata processing in Step 3 of the build process. The build fails with the following exception:

(release 1.2) 
java.io.IOException: java.lang.NullPointerException
                at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:72)
                at org.apache.kylin.dict.lookup.SnapshotTable.<init>(SnapshotTable.java:62)
                at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:85)
                at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:205)
                at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:60)
                at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
                at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:52)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
                at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
                at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
                at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
                at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
                at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
                at org.apache.kylin.common.util.HadoopUtil.fixWindowsPath(HadoopUtil.java:76)
                at org.apache.kylin.common.util.HadoopUtil.makeURI(HadoopUtil.java:68)
                at org.apache.kylin.common.util.HadoopUtil.getFileSystem(HadoopUtil.java:63)
                at org.apache.kylin.dict.lookup.FileTable.getSizeAndLastModified(FileTable.java:85)
                at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:57)
                ... 16 more
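
The "Caused by" frames show FileTable.getSizeAndLastModified handing a null value down to HadoopUtil.fixWindowsPath. A plausible reading (an assumption, not confirmed in this report) is that the Hive metastore records no data location for a view, so the path used to compute the lookup snapshot signature is null. The minimal Java sketch below reproduces that failure mode and shows a null guard that fails with a descriptive message instead of an NPE. fixWindowsPath and signatureFor here are hypothetical stand-ins, not Kylin's actual code, and the warehouse path is made up for illustration.

```java
class ViewLocationNpeSketch {

    // Hypothetical stand-in for a path helper such as the
    // HadoopUtil.fixWindowsPath frame in the trace: it calls a method on the
    // path, so a null path throws NullPointerException.
    static String fixWindowsPath(String path) {
        return path.replace("\\", "/");
    }

    // Hypothetical signature helper that checks for a missing location
    // (assumed here to be the case for a Hive view) before using the path.
    // Simplified: the real signature also involves file size and modification
    // time, per the FileTable.getSizeAndLastModified frame.
    static String signatureFor(String tableName, String location) {
        if (location == null) {
            throw new IllegalStateException("Table " + tableName
                    + " has no data location (is it a Hive view?); "
                    + "cannot build a lookup snapshot from its files");
        }
        return tableName + "@" + fixWindowsPath(location);
    }

    public static void main(String[] args) {
        // Made-up warehouse path for a regular table.
        System.out.println(signatureFor("table_dim_tab_self_join",
                "hdfs:///apps/hive/warehouse/table_dim_tab_self_join"));
        try {
            signatureFor("view_dim_tab_self_join", null);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The guard does not make a view usable as a lookup table; it only replaces an opaque NullPointerException with a message that points at the likely cause.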


Error in the latest branch (2.1-SNAPSHOT):

java.io.IOException: java.lang.NullPointerException
                at org.apache.kylin.source.hive.HiveTable.getSignature(HiveTable.java:71)
                at org.apache.kylin.dict.lookup.SnapshotTable.<init>(SnapshotTable.java:64)
                at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:89)
                at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:208)
                at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
                at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
                at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
                at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
                at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
                at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
                at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
                at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:745)


Importance of resolving this bug: I believe the ability to use Hive views is essential and should work for both FACT tables/views and LOOKUP/DIMENSION tables/views. First, it allows more efficient OLAP staging, because temporary views used only for OLAP modelling do not have to be materialized. Second, and most importantly, views with left outer and inner joins can help work around the current Kylin limitation of not being able to provide snowflake-like models.

Further info - reproduce scenario: I created 2 Hive tables (FACT and LOOKUP) using CREATE TABLE AS SELECT … statements in Hive. Then I created 2 Hive views using CREATE VIEW AS SELECT …, supplying exactly the same SQL statements (using left outer and inner joins). All 4 tables/views correctly produce data in Hive. If I build a simple cube in Kylin using the 2 tables, everything works fine. If I use the Hive view as the FACT table and a regular Hive table as the LOOKUP table, it also works. The error occurs when I use a Hive view as the LOOKUP table and define a dimension from it.
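
The scenario suggests the failure depends on the role a view plays. The stack traces show the Step 3 snapshot computing its signature from the LOOKUP table's file location, which a view presumably does not have (an assumption). Until that is fixed, a pre-build check could flag LOOKUP tables that are views. The sketch below is hypothetical: the TableType names mirror those used by the Hive metastore, but LookupViewCheck and findLookupViews are illustrative names, and in practice the table types would have to be read from the metastore.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LookupViewCheck {

    // Subset of the table-type names used by the Hive metastore.
    enum TableType { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW }

    // Returns one message per LOOKUP table that is a view. FACT tables are not
    // passed in, because the report says a view works as the FACT table.
    static List<String> findLookupViews(Map<String, TableType> lookupTables) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, TableType> e : lookupTables.entrySet()) {
            if (e.getValue() == TableType.VIRTUAL_VIEW) {
                problems.add("LOOKUP table " + e.getKey()
                        + " is a Hive view; materialize it before building");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical types for the two LOOKUP candidates from this report.
        Map<String, TableType> lookups = new LinkedHashMap<>();
        lookups.put("table_dim_tab_self_join", TableType.MANAGED_TABLE);
        lookups.put("view_dim_tab_self_join", TableType.VIRTUAL_VIEW);
        findLookupViews(lookups).forEach(System.out::println);
    }
}
```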

Just for completeness, below is the full HiveQL for creating the tables/views. It operates on tables created by the internal tests while building Kylin 1.2, but I believe the exact view definition should not matter: I did a couple of additional tests with very simple views doing only "SELECT * FROM one_table", and the error was still the same.

Hive Query Editor : Create Fact as View 

CREATE VIEW view_fact_tab_enhanced 
AS SELECT * 
FROM DEFAULT.TEST_KYLIN_FACT AS TEST_KYLIN_FACT LEFT OUTER JOIN EDW.TEST_SELLER_TYPE_DIM AS TEST_SELLER_TYPE_DIM 
ON TEST_KYLIN_FACT.SLR_SEGMENT_CD = TEST_SELLER_TYPE_DIM.SELLER_TYPE_CD


Hive Query Editor : Create Fact as Table

CREATE TABLE table_fact_tab_enhanced 
AS SELECT * 
FROM DEFAULT.TEST_KYLIN_FACT AS TEST_KYLIN_FACT LEFT OUTER JOIN EDW.TEST_SELLER_TYPE_DIM AS TEST_SELLER_TYPE_DIM 
ON TEST_KYLIN_FACT.SLR_SEGMENT_CD = TEST_SELLER_TYPE_DIM.SELLER_TYPE_CD


Hive Query Editor : Create LOOKUP as View

CREATE VIEW view_dim_tab_self_join AS
SELECT distinct A.leaf_categ_id, B.leaf_categ_name, A.site_id 
FROM DEFAULT.TEST_CATEGORY_GROUPINGS AS A LEFT OUTER JOIN DEFAULT.TEST_CATEGORY_GROUPINGS AS B 
ON A.leaf_categ_id = B.leaf_categ_id AND A.site_id = B.site_id ORDER BY A.leaf_categ_id, B.leaf_categ_name, A.site_id  


Hive Query Editor : Create LOOKUP as Table

CREATE TABLE table_dim_tab_self_join AS
SELECT distinct A.leaf_categ_id, B.leaf_categ_name, A.site_id 
FROM DEFAULT.TEST_CATEGORY_GROUPINGS AS A LEFT OUTER JOIN DEFAULT.TEST_CATEGORY_GROUPINGS AS B 
ON A.leaf_categ_id = B.leaf_categ_id AND A.site_id = B.site_id ORDER BY A.leaf_categ_id, B.leaf_categ_name, A.site_id  
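
Because the same SELECT works once it is stored with CREATE TABLE AS SELECT, one workaround under these conditions is to materialize a LOOKUP view into a regular table before each cube build and point the cube at that table. The Java sketch below only generates the HiveQL statements; materializeViewSql and the target name mat_dim_tab_self_join are hypothetical, and the identifier check is a deliberately conservative assumption rather than Hive's full identifier grammar.

```java
import java.util.List;
import java.util.regex.Pattern;

class MaterializeLookupView {

    // Conservative identifier check (an assumption for this sketch): an
    // optional "db." prefix plus a name of letters, digits and underscores.
    private static final Pattern IDENT =
            Pattern.compile("([A-Za-z_][A-Za-z0-9_]*\\.)?[A-Za-z_][A-Za-z0-9_]*");

    // Builds the statements that replace `table` with a snapshot of `view`.
    static List<String> materializeViewSql(String view, String table) {
        for (String name : List.of(view, table)) {
            if (!IDENT.matcher(name).matches()) {
                throw new IllegalArgumentException("Unexpected identifier: " + name);
            }
        }
        return List.of(
                "DROP TABLE IF EXISTS " + table,
                "CREATE TABLE " + table + " AS SELECT * FROM " + view);
    }

    public static void main(String[] args) {
        for (String stmt : materializeViewSql("view_dim_tab_self_join",
                "mat_dim_tab_self_join")) {
            System.out.println(stmt + ";");
        }
    }
}
```

The materialized table is a copy, so it has to be refreshed (for example by re-running these statements) before every build; otherwise the lookup snapshot reflects stale data.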


Additional curiosity: release 1.2 seems to handle the final size of the generated cube better. Assuming I didn't make any mistake in the cube definitions and used the same ones in both 1.2-release and 2.1-SNAPSHOT, I get a smaller cube in release 1.2 than in the 2.1-SNAPSHOT codeline, although the number of rows is exactly the same in both cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Created] (KYLIN-1508) NPE error ERROR - Build Cube - Step 3 if LOOKUP table is Hive View - : java.lang.NullPointerException at org.apache.kylin.source.hive.HiveTable.getSignature

Posted by hongbin ma <ma...@apache.org>.
BTW, Richard, you should take a look at
http://apache-kylin.74782.x6.nabble.com/Back-to-one-dev-branch-td3851.html#a3863;
we have renamed our branches.

On Sun, Mar 20, 2016 at 1:54 PM, hongbin ma <ma...@apache.org> wrote:

> Hi Richard,
>
> you're right, it IS NOT possible to build a cube which has a LOOKUP table
> referring to a Hive view for now. As I remember, you're not the first one to
> come across this issue. I'm tracking this issue in
> https://issues.apache.org/jira/browse/KYLIN-1510, and hope to have the
> bandwidth to fix it ASAP.
>
> BTW, since it's a relatively independent task, I'm marking it as a "newbie"
> task. Volunteers are very welcome on this :)
>
>
> On Sat, Mar 19, 2016 at 1:15 AM, Richard Calaba (JIRA) <ji...@apache.org>
> wrote:
>
>> Whereas it is possible to successfully build cube which FACT table referring Hive View –
>> it seems not to be possible to build a cube which has LOOKUP table
>> referring Hive View. It seems to be ...
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



