Posted to user@kylin.apache.org by Wei Li <lw...@qq.com> on 2017/11/24 03:34:16 UTC

Build cube and streaming cube problems

Hi all,

I installed the Kylin binary package with the HBase namespace patch (KYLIN-2846) and have been using it for nearly a month. My work always needs several dimensions (some with large cardinality, such as ID numbers) and a dozen RAW measures.

I have some questions about building cubes. There are many success stories of cubes built on tens of billions of rows with sub-second query speed, but in my actual use, cubing tens of millions of rows sometimes fails, and my queries are slow with a WHERE filter and slower still with LIKE (a query over 10 million rows takes 40 seconds).


And here is a strange phenomenon:
I have a cube with 200 million rows, which contains three dimensions and no lookup table. But when I add a lookup table with 1400 rows and 4 RAW measures (two of them Chinese strings), the build fails at step 3; the output is 'Job Counters \n failed reduce tasks=4'. I found that some key values in the fact table have no match in the inner-joined lookup table; does that cause the error? Are there any specific constraints when building a cube?
For example, I noticed that a dimension must pick up a unique row, or an error occurs.
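
A minimal sketch of how the two conditions mentioned above (fact keys with no match in the lookup table, and lookup keys that are not unique) could be checked before a build. It is plain Python over hypothetical in-memory key lists; `check_lookup_join` and the sample data are illustrative names, not part of Kylin, and whether either condition actually causes the failed reduce tasks is not established here.

```python
from collections import Counter


def check_lookup_join(fact_keys, lookup_keys):
    """Return (duplicate lookup keys, fact keys with no lookup match).

    Both results are sorted lists. An inner join silently drops fact rows
    whose key is in the second list; keys in the first list match more
    than one lookup row.
    """
    counts = Counter(lookup_keys)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    unmatched = sorted(set(fact_keys) - set(counts))
    return duplicates, unmatched


# Hypothetical sample data, standing in for the join key columns.
fact_keys = [1, 2, 2, 3, 5]
lookup_keys = [1, 2, 3, 3, 4]

duplicates, unmatched = check_lookup_join(fact_keys, lookup_keys)
print("duplicate lookup keys:", duplicates)
print("fact keys missing from lookup:", unmatched)
```

On real tables the same two checks would more likely be run as Hive queries (a GROUP BY ... HAVING COUNT(*) > 1 on the lookup key, and a LEFT JOIN filtered on NULL), but the logic is the same.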


Turning to the streaming cube, I hit three problems.
First, when I add a streaming table, the Advanced Setting section in the web UI only shows Timeout; Buffer Size and Margin are missing.
Second, after I save my table and browse the table schema, the Streaming Cluster config that I set earlier is blank, and I can't edit it; clicking save throws the error message: Failed to deal with the request: SteamingConfig Illegal.
Third, after I successfully create a new model and cube and start a build, an 'Oops...Could not find Kafka dependency' error appears. My Kafka is clearly working, because I can consume from it with Java.


A long email; thanks for reading, and I hope for your reply!




Sincerely
Wei Li

Re: Build cube and streaming cube problems

Posted by ShaoFeng Shi <sh...@apache.org>.

A couple of comments:

1. Don't use the RAW measure; it is deprecated and has some known
limitations. If you do need to query raw data, try letting Kylin route the
query to Hive or Spark SQL with the new pushdown feature. Of course, your
Hive table should be optimized for such queries, for example by using the
Parquet format instead of text format.

2. When a cube has multiple UHC (ultra-high-cardinality) dimensions, you
need to design the combinations carefully; check the "how to optimize cube
design" doc.

3. For the streaming cube, Billy is correct; please check the tutorial,
where most of these issues are answered.


-- 
Best regards,

Shaofeng Shi 史少锋

Re: Build cube and streaming cube problems

Posted by Billy Liu <bi...@apache.org>.

I suggest separating the different questions into separate mail threads.
That keeps the discussion more focused.

Kylin is designed for OLAP queries, which are aggregated queries, not for
RAW data, although the RAW measure offers a workaround. I am not sure
whether it works for a dozen columns. You could first try one or two
RAW measures, and put different RAW measures into separate HBase
column mappings.

For further performance issues, the cause may lie in your cube design and
query pattern. Here is a tool that may help identify the query bottleneck:
https://kybot.io It analyzes the Kylin log and shows you how the query hits
the cube.

For the failed job, please describe the issue with detailed logs, including
kylin.log and the logs of the YARN job.

For the Kafka issues, most of them may be caused by small front-end changes.
You could file a JIRA for that. To use Kafka as a data source, please export
KAFKA_HOME before you start Kylin, as the tutorial says.
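
A small pre-flight check along these lines might help confirm the environment before starting Kylin. It is only a sketch: the helper name `find_kafka_client_jars` is made up for illustration, and the idea that a kafka-clients jar somewhere under KAFKA_HOME is what needs to be found is an assumption, not something confirmed in this thread.

```python
import os
from pathlib import Path


def find_kafka_client_jars(kafka_home):
    """Return sorted paths of kafka-clients jars found under kafka_home.

    Returns an empty list when kafka_home is unset, empty, or not a
    directory.
    """
    if not kafka_home:
        return []
    root = Path(kafka_home)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob("kafka-clients-*.jar"))


if __name__ == "__main__":
    # Reads the variable from the same environment Kylin would be started in.
    home = os.environ.get("KAFKA_HOME")
    jars = find_kafka_client_jars(home)
    if not home:
        print("KAFKA_HOME is not set in this environment")
    elif not jars:
        print("KAFKA_HOME is set, but no kafka-clients jar was found under", home)
    else:
        print("kafka-clients jar(s) found:", jars)
```

Run it from the same shell and user account that start Kylin, since a variable exported elsewhere is not visible to the Kylin process.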
