Posted to user@kylin.apache.org by Joel Victor <jo...@gmail.com> on 2016/06/06 11:30:53 UTC

Fact tables with complex data types.

Hi,

I am using Kylin 1.5.2 with HDP 2.2.
Currently my fact table contains multiple columns with type array<string>.
Kylin won't allow me to sync this table since it has complex data types. I
don't need these complex data types in my cube builds, but I do require them
for other jobs.

The table is partitioned on date, has 2 buckets, and is stored in ORC format.

I tried creating a view over it but it seems Kylin doesn't support views as
a fact table.
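
To make the view approach concrete, here is a minimal sketch of how such a
view definition could be generated so that it selects only the primitive
columns. The table name (events), view name (events_flat) and column list are
hypothetical placeholders, not my real schema, and the helper functions are
illustrative only.

```python
# Hive complex types; any column whose type starts with one of these is skipped.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type):
    """Return True if the Hive type string denotes a complex type."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_view_ddl(view_name, table_name, columns):
    """Build a CREATE VIEW statement that keeps only primitive columns.

    columns is a list of (name, hive_type) pairs in table order.
    """
    kept = [name for name, hive_type in columns if not is_complex(hive_type)]
    if not kept:
        raise ValueError("no primitive columns left to select")
    select_list = ",\n  ".join("`%s`" % name for name in kept)
    return "CREATE VIEW IF NOT EXISTS %s AS\nSELECT\n  %s\nFROM %s" % (
        view_name,
        select_list,
        table_name,
    )


# Hypothetical schema: "tags" is the array<string> column to leave out.
columns = [
    ("event_id", "bigint"),
    ("user_id", "string"),
    ("tags", "array<string>"),
    ("amount", "double"),
    ("dt", "string"),
]

ddl = build_view_ddl("events_flat", "events", columns)
print(ddl)
```

The generated statement would then be run in Hive, and the view (rather than
the underlying table) synced into Kylin.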

Another approach that I came up with is moving all the columns with complex
data types from the original table to a separate table and using the original
table as my fact table for building cubes.

Is there any other way to go about this scenario?

I get the following error when I sync the view:
java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
        at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:86)
        at org.apache.kylin.source.hive.cardinality.HiveColumnCardinalityJob.run(HiveColumnCardinalityJob.java:89)
        at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:91)
        at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:121)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.NullPointerException
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
        at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:81)
        ... 10 more
Caused by: java.lang.NullPointerException
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.<init>(FosterStorageHandler.java:59)
        at org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:417)
        at org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:380)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(InitializeInput.java:158)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:137)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)


Thanks,
Joel

Re: Fact tables with complex data types.

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Joel,

Kylin does support using a view as the fact table; many users build cubes
from views today. There is a known issue with views, though: Kylin cannot
calculate column cardinality for a view, because Hive HCatalog (which Kylin
uses to read the source) does not support views so far. The error you got is
exactly this issue.

Kylin 1.5.2 has another issue with using a view as the fact table (see
KYLIN-1758). The hot-fix version 1.5.2.1 will be released soon; you can try
it.



-- 
Best regards,

Shaofeng Shi