Posted to dev@kylin.apache.org by ga...@gfmintegration.it on 2015/07/07 12:42:23 UTC
Kylin 0.7.1 - Failed to build a cube
Hi,
I am trying to create a cube from a star schema built on Hive external tables (example below) stored as TEXTFILE (CSV).
CREATE EXTERNAL TABLE IF NOT EXISTS USERS_TABLE (
uid INT,
name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073' LINES TERMINATED BY '\012'
STORED AS TEXTFILE
LOCATION '/data/users';
The CSV files are produced from Spark RDDs, so they are saved as part-xxxx files. Below is the HDFS listing:
hdfs dfs -ls /data/users
Found 12 items
-rw-r--r-- 3 hdfs hdfs 0 2015-07-07 12:05 /data/users/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 3699360 2015-07-07 12:05 /data/users/part-00000
-rw-r--r-- 3 hdfs hdfs 3694740 2015-07-07 12:05 /data/users/part-00001
-rw-r--r-- 3 hdfs hdfs 3685374 2015-07-07 12:05 /data/users/part-00002
-rw-r--r-- 3 hdfs hdfs 3719646 2015-07-07 12:05 /data/users/part-00003
-rw-r--r-- 3 hdfs hdfs 3682476 2015-07-07 12:05 /data/users/part-00004
-rw-r--r-- 3 hdfs hdfs 3679956 2015-07-07 12:05 /data/users/part-00005
-rw-r--r-- 3 hdfs hdfs 3700242 2015-07-07 12:05 /data/users/part-00006
-rw-r--r-- 3 hdfs hdfs 3672186 2015-07-07 12:05 /data/users/part-00007
-rw-r--r-- 3 hdfs hdfs 3682350 2015-07-07 12:05 /data/users/part-00008
-rw-r--r-- 3 hdfs hdfs 3680292 2015-07-07 12:05 /data/users/part-00009
-rw-r--r-- 3 hdfs hdfs 3697722 2015-07-07 12:05 /data/users/part-00010
The cube build job fails while building the dimension dictionary, with the following exception (it seems that the Hive table data directory must contain only one file):
java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://gas.gfmintegration.it:8020/data/cdr/bb/dimensions/users, but find 11
at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
result code:2
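The count in the exception can be reproduced from the listing: it has 12 entries, one of which is the zero-length _SUCCESS marker, which leaves the 11 non-zero files reported. A minimal sketch of that check follows. It only mirrors what the error message describes and is not Kylin's own code; the function name count_nonzero_files is made up for illustration.

```shell
# Count the non-empty files in an `hdfs dfs -ls` listing read from stdin.
# Regular-file lines start with "-" and carry the size in the fifth column.
count_nonzero_files() {
  awk '$1 ~ /^-/ && $5 > 0 { n++ } END { print n + 0 }'
}

# Usage against a real cluster (path taken from the table definition above):
#   hdfs dfs -ls /data/users | count_nonzero_files
```

Any result other than 1 means the dictionary step would be expected to fail in the same way for that directory.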
Do you have any advice on how to create a proper Hive star schema for Kylin?
I would like to use external tables (stored as CSV, Parquet files or HBase) because I also need to process the same data from Spark.
Thanks in advance.
BR,
-- gas
Re: Kylin 0.7.1 - Failed to build a cube
Posted by 周千昊 <z....@gmail.com>.
Hi Gaspare,
Kylin assumes that a dimension table is small enough to fit in
memory, so the corresponding directory should contain only one file.
As a workaround, you can merge these files into a single file so
that Kylin will be able to read it.
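One possible way to do the merge from the command line is sketched below, assuming the hdfs client is on the PATH and the merged data is small enough to pass through local disk. The function name merge_parts, the target directory and the target file name are illustrative and do not come from the thread.

```shell
# Merge all part files of a source HDFS directory into one file in a new
# HDFS directory. Arguments: <source dir> <target dir> <target file name>.
merge_parts() {
  src_dir=$1
  dst_dir=$2
  dst_name=$3
  tmp_dir=$(mktemp -d) || return 1

  # getmerge concatenates every file under src_dir into one local file;
  # the empty _SUCCESS marker contributes nothing to the result.
  hdfs dfs -getmerge "$src_dir" "$tmp_dir/$dst_name" &&
    hdfs dfs -mkdir -p "$dst_dir" &&
    hdfs dfs -put -f "$tmp_dir/$dst_name" "$dst_dir/$dst_name"
  status=$?

  rm -rf "$tmp_dir"
  return "$status"
}

# Example (target path and file name are hypothetical):
#   merge_parts /data/users /data/users_merged users.csv
```

The external table's LOCATION would then have to point at the merged directory. Since the files come from Spark, writing the RDD out as a single partition in the first place may be a simpler alternative.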