Posted to user@kylin.apache.org by Li Yang <li...@apache.org> on 2017/11/10 05:13:20 UTC

Re: Separate ZooKeeper nodes when deploying a standalone HBase cluster

Sorry for the late reply. Was very occupied recently.

> Is it OK to deploy a standalone HBase cluster with a ZooKeeper separate
from the main cluster?
> Does this imply that the main cluster & the HBase cluster should share the
same ZK nodes?
I looked again, and my previous answer confused you; sorry for that. I
thought you were asking about using two HBase clusters, but the question was
actually about a read/write separation deployment.

Yes, Kylin can work with two clusters: a read cluster, which hosts HBase and
provides query horsepower, and a write cluster (the main cluster in your
question), which is responsible for cube building. By default, Kylin uses the
ZooKeeper of the HBase cluster for its job coordination.
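
If the job coordination ever needs to point at a specific quorum instead of
the one derived from the HBase config, later Kylin versions expose a property
for it. A minimal sketch, assuming Kylin 2.x property names and placeholder
hosts:

    # placeholder quorum; by default Kylin derives this from the HBase config
    kylin.env.zookeeper-connect-string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181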

When building a cube, the write cluster (or main cluster) writes to the HBase
cluster to create the HBase table and bulk-load data. The
kylin.env.hdfs-working-dir should be on the write cluster by design.
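
For reference, here is a minimal sketch of the relevant kylin.properties
entries for such a read/write deployment. The working-dir nameservice is
taken from the log in your mail; hdfs://hbasecluster is a placeholder for the
read cluster's nameservice, and the second property was named
kylin.hbase.cluster.fs before Kylin 2.0, so check the reference for your
version:

    # working dir lives on the write (main) cluster
    kylin.env.hdfs-working-dir=hdfs://maincluster/kylinworkingdir
    # cube HFiles/HTables go to the read (HBase) cluster
    kylin.storage.hbase.cluster-fs=hdfs://hbasecluster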

In the step "Create HTable", Kylin writes a partition file, based on which
the new HTable is created. That must be the write operation you observed.
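
You can verify where that write lands by listing the partition file's
directory; the path below is copied from the log in your mail:

    hdfs dfs -ls hdfs://maincluster/kylinworkingdir/kylin_metadata/kylin-e82dca5a-93c6-47ca-a707-674372708b5f/123/rowkey_stats/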

Cheers
Yang


On Mon, Oct 30, 2017 at 8:23 PM, Yuxiang Mai <yu...@gmail.com> wrote:

> Hi, Li Yang
>
> Thanks for your reply.
>
> > Is it OK to deploy a standalone HBase cluster with a ZooKeeper separate
> > from the main cluster?
> No. Kylin only works with one HBase cluster and its associated ZooKeeper.
>
> Does this imply that the main cluster & the HBase cluster should share the
> same ZK nodes?
>
> And I have one more question about kylin.env.hdfs-working-dir: should the
> HDFS working dir be placed on the main cluster or on the HBase cluster?
>
> Because during a cube build, after "Extract Fact Table Distinct Columns"
> & "Save Cuboid Statistics", the step "Create HTable" gets stuck with no
> response for a long time.
> In kylin.log, it seems stuck in this job:
>
> 2017-10-30 20:16:46,730 INFO  [Job e82dca5a-93c6-47ca-a707-674372708b5f-193]
> common.HadoopShellExecutable:59 :  -cubename 123 -segmentid
> 6223ddc9-ac80-4a10-b3c8-33165fe8be4c -partitions
> hdfs://maincluster/kylinworkingdir/kylin_metadata/kylin-e82dca5a-93c6-47ca-a707-674372708b5f/123/rowkey_stats/part-r-00000
> -statisticsenabled true
>
>  In this step, it seems to be generating the HBase table in the HDFS working
> dir. Does that mean the HDFS working dir is on the HBase cluster, not the
> main cluster?
>
> Thanks a lot
>
> Yuxiang MAI
>
>
>
> On Sun, Oct 29, 2017 at 6:41 PM, Li Yang <li...@apache.org> wrote:
>
>> > Is it OK to deploy a standalone HBase cluster with a ZooKeeper separate
>> > from the main cluster?
>> No. Kylin only works with one HBase cluster and its associated ZooKeeper.
>>
>> > How does Kylin get the YARN config when submitting jobs?
>> Kylin takes its Hadoop config from the classpath, and most of that
>> classpath comes from the HBase shell.
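>>
>> A quick way to see which config directories end up on that classpath (a
>> sketch; exact paths vary by distribution):
>>
>>   hbase classpath | tr ':' '\n' | grep -i conf
>>
>> The directories printed should contain core-site.xml, hdfs-site.xml and
>> yarn-site.xml; if no yarn-site.xml is visible there, MR jobs cannot be
>> submitted to YARN.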
>>
>> On Wed, Oct 25, 2017 at 4:33 PM, Yuxiang Mai <yu...@gmail.com>
>> wrote:
>>
>>> Hi, experts
>>>
>>> We are now deploying a standalone HBase cluster outside the Hadoop cluster
>>> to improve query performance.
>>> http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>>>
>>> The new HBase cluster uses ZooKeeper nodes separate from the main cluster.
>>> The Kylin server can access the HBase, Hadoop & Hive resources. But in this
>>> configuration, the cube build fails in the first step:
>>>
>>> There are 3 Hive commands in the first step:
>>>
>>> DROP TABLE IF EXISTS kylin_intermediate_test1_ba3c5910_ff7d_4669_b28a_4ec2736d60dc;
>>>
>>> CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_test1_ba3c5910_ff7d_4669_b28a_4ec2736d60dc
>>> ...
>>> INSERT OVERWRITE TABLE kylin_intermediate_test1_ba3c5910_ff7d_4669_b28a_4ec2736d60dc SELECT
>>> ......
>>>
>>>
>>> The DROP and CREATE TABLE commands are OK, but it fails on INSERT
>>> OVERWRITE with the following exception:
>>>
>>>
>>> FAILED: IllegalArgumentException java.net.UnknownHostException: maincluster
>>>
>>> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:92)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.createFlatHiveTable(CreateFlatHiveTableStep.java:52)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(CreateFlatHiveTableStep.java:70)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> It seems the MR job fails to be submitted to YARN; in our debugging, the
>>> job does not appear to be submitted to the main cluster.
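>>>
>>> The UnknownHostException on "maincluster" suggests the HDFS HA nameservice
>>> cannot be resolved, i.e. the main cluster's HA entries from hdfs-site.xml
>>> may not be on the classpath that Kylin/Hive uses. A sketch of the entries
>>> that would need to be visible (hostnames and ports are placeholders, not
>>> our real values):
>>>
>>>   <property><name>dfs.nameservices</name><value>maincluster</value></property>
>>>   <property><name>dfs.ha.namenodes.maincluster</name><value>nn1,nn2</value></property>
>>>   <property><name>dfs.namenode.rpc-address.maincluster.nn1</name><value>nn1-host:8020</value></property>
>>>   <property><name>dfs.namenode.rpc-address.maincluster.nn2</name><value>nn2-host:8020</value></property>
>>>   <property><name>dfs.client.failover.proxy.provider.maincluster</name>
>>>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>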
>>> So my questions are:
>>> 1. Is it OK to deploy a standalone HBase cluster with a ZooKeeper separate
>>> from the main cluster?
>>> 2. How does Kylin get the YARN config when submitting jobs? I can only find
>>> Hive & HBase config, but no YARN-related config.
>>>
>>>
>>> Thanks a lot.
>>>
>>> --
>>> Yuxiang Mai
>>>
>>>
>>
>
>
> --
> Yuxiang Mai
> Sun Yat-Sen University
> State Key Lab of Optoelectronic
> Materials and Technologies
>