Posted to user@kylin.apache.org by Joel Victor <jo...@gmail.com> on 2016/06/06 11:30:53 UTC
Fact tables with complex data types.
Hi,
I am using Kylin 1.5.2 with HDP 2.2.
Currently my fact table contains multiple columns of type array<string>.
Kylin won't allow me to sync this table because it has complex data types. I
don't need these columns in my cube builds, but I do require them for
other jobs.
The table is partitioned on date, has 2 buckets, and is stored in ORC format.
I tried creating a view over it, but it seems Kylin doesn't support views as
a fact table.
Another approach I came up with is to move all the columns with complex
data types from the original table to a separate table and use the original
table as my fact table for building cubes.
Is there any other way to handle this scenario?
I get the following error when I sync the view:
java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:86)
    at org.apache.kylin.source.hive.cardinality.HiveColumnCardinalityJob.run(HiveColumnCardinalityJob.java:89)
    at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:91)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:121)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
    at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
    at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:81)
    ... 10 more
Caused by: java.lang.NullPointerException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at org.apache.hive.hcatalog.mapreduce.FosterStorageHandler.<init>(FosterStorageHandler.java:59)
    at org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:417)
    at org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:380)
    at org.apache.hive.hcatalog.mapreduce.InitializeInput.extractPartInfo(InitializeInput.java:158)
    at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:137)
    at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
    at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
Thanks,
Joel
Re: Fact tables with complex data types.
Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Joel,
Kylin does support using a view as the fact table; many users build cubes
from views today. There is one known issue with views: Kylin cannot calculate
column cardinality for a view, because Hive HCatalog (which Kylin uses to
read the source) does not support views so far. The error you got is exactly
this issue.
Kylin 1.5.2 has another issue with using a view as the fact table (see
KYLIN-1758). The hot-fix version 1.5.2.1 will be released soon; you can try it.
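For anyone trying the view route described in this thread, here is a minimal sketch of how a view that leaves out the complex-typed columns might be generated. It is only an illustration: the helper names (is_complex, build_view_ddl), the view and table names, and the example schema are all hypothetical and do not come from Joel's actual table or from Kylin itself. The script only builds a HiveQL string; it does not connect to Hive.

```python
# Hypothetical helper: builds a HiveQL CREATE VIEW statement that keeps only
# primitive-typed columns. All table, view, and column names are illustrative.

COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type):
    """Return True for Hive complex types such as array<string>."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_view_ddl(view_name, table_name, columns):
    """Build the DDL from a list of (column_name, hive_type) pairs."""
    kept = [name for name, hive_type in columns if not is_complex(hive_type)]
    if not kept:
        raise ValueError("no primitive-typed columns left to select")
    select_list = ", ".join("`%s`" % name for name in kept)
    return "CREATE VIEW IF NOT EXISTS %s AS SELECT %s FROM %s" % (
        view_name, select_list, table_name)


if __name__ == "__main__":
    # Assumed schema for illustration only; not the real table from this thread.
    example_columns = [
        ("event_date", "string"),
        ("user_id", "bigint"),
        ("tags", "array<string>"),
        ("amount", "double"),
    ]
    print(build_view_ddl("fact_flat_v", "fact_events", example_columns))
```

The generated statement would be run in Hive and the resulting view synced in Kylin. As noted above, the cardinality calculation can still fail for a view because of the HCatalog limitation.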
--
Best regards,
Shaofeng Shi