Posted to user@kylin.apache.org by 朱真龙 <18...@qq.com> on 2017/12/25 02:46:30 UTC

Re: mismatch hdfs addr error when using kylin2.1 on two hadoop clusters

Thank you for your response. I have found a way to resolve my problem. I use MySQL for the Hive metastore, with Kylin deployed across two Hadoop clusters, like this:

[screenshot of the two-cluster deployment not included in the plain-text archive]

It works well when the model is not too big, but some models with many rowkey columns do not. When I configured fifteen columns for the rowkey and set dict encoding like this:

[screenshot of the rowkey/dict-encoding configuration not included in the plain-text archive]

when running the "Build Dimension Dictionary" step, I got the following error (it means that after the snapshot was written to HDFS, when those HDFS files were submitted to the HBase table, the files and the HBase table were on different HDFS clusters, one on cluster 1 and one on cluster 2):

[screenshot of the error not included in the plain-text archive]

Looking into the source code that the error refers to, I found that when writing the dimension dictionary to HBase, Kylin checks whether your KeyValue is larger than the configured maximum KeyValue size (hbase.client.keyvalue.maxsize, set in hbase-site.xml; if it is not set, Kylin uses 10485760 bytes, i.e. 10 MB, as the default). If your KeyValue size is smaller than hbase.client.keyvalue.maxsize, Kylin puts the key and value into a Put object and submits it directly. If your KeyValue is bigger than hbase.client.keyvalue.maxsize, Kylin writes the snapshot to HDFS first and then submits it through the HBase table. But I think there is a bug in Kylin 2.1 when Kylin runs across two Hadoop clusters and builds a dimension whose KeyValue is bigger than hbase.client.keyvalue.maxsize.

When the KeyValue size is bigger than hbase.client.keyvalue.maxsize, the snapshot is written to HDFS, but the HDFS path is taken from the Hive table rather than from HBase, and in my setup those live on different HDFS clusters. So if you are using one Hadoop cluster (Hive and HBase on the same cluster), it works well. If you are using two Hadoop clusters and your KeyValue size is smaller than hbase.client.keyvalue.maxsize, it also works well (the Put object is used). But if you are using two Hadoop clusters and your KeyValue is bigger than hbase.client.keyvalue.maxsize, you will get the same error as mine, because your Hive and HBase are on different HDFS clusters.
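For reference, here is a minimal sketch of that branching, assuming standard HBase client APIs. The class name, column family/qualifier, and the writeLargeCellToHdfs() helper are my own placeholders, not the actual Kylin code (which lives in the HBaseResourceStore.java file referenced further below):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// A minimal sketch of the size-based branching described above, NOT the exact Kylin code.
// FAMILY, QUALIFIER and writeLargeCellToHdfs() are illustrative placeholders.
public class ResourceWriteSketch {

    private static final byte[] FAMILY = Bytes.toBytes("f");
    private static final byte[] QUALIFIER = Bytes.toBytes("c");

    static void writeResource(Configuration hbaseConf, Table table,
                              byte[] rowKey, byte[] content) throws IOException {
        // Same default Kylin assumes when the property is absent: 10485760 bytes (10 MB).
        int kvSizeLimit = hbaseConf.getInt("hbase.client.keyvalue.maxsize", 10485760);

        Put put = new Put(rowKey);
        if (content.length <= kvSizeLimit) {
            // Small cell: store the bytes directly in the HBase row.
            put.addColumn(FAMILY, QUALIFIER, content);
        } else {
            // Large cell: spill the payload to HDFS and keep only a path reference in HBase.
            // The problem described above: the spill path is resolved against the Hive-side
            // HDFS, which on a two-cluster deployment is not the HDFS that HBase expects.
            Path bigCellPath = writeLargeCellToHdfs(rowKey, content); // hypothetical helper
            put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(bigCellPath.toString()));
        }
        table.put(put);
    }

    private static Path writeLargeCellToHdfs(byte[] rowKey, byte[] content) {
        // Spill-to-HDFS step omitted; only the size-based branching matters here.
        throw new UnsupportedOperationException("sketch only");
    }
}

With a single cluster both branches end up on the same HDFS, which is why the problem only shows up in the large-KeyValue, two-cluster case.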
           
I checked hbase.client.keyvalue.maxsize in my hbase-site.xml and found that Ambari sets 1 MB as the default, while Kylin assumes 10 MB as the default. So I changed this value to 10 MB, and now my model works well. If you find a better way to resolve this problem, please tell me. Thank you.
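To double-check which limit the client actually picks up, a small snippet like the following (my own sketch, assuming hbase-site.xml is on the classpath) prints the effective value; Ambari's 1 MB default shows up as 1048576, and after the change it should read 10485760:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Prints the effective client-side KeyValue size limit seen by HBase clients.
public class CheckKeyValueMaxSize {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create(); // loads hbase-site.xml from the classpath
        int maxSize = conf.getInt("hbase.client.keyvalue.maxsize", 10485760);
        System.out.println("effective hbase.client.keyvalue.maxsize = " + maxSize + " bytes");
    }
}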





apache-kylin-2.1.0-src\apache-kylin-2.1.0\storage-hbase\src\main\java\org\apache\kylin\storage\hbase\HBaseResourceStore.java










------------------ Original Message ------------------
From: "jxs" <jx...@126.com>
Sent: Friday, December 22, 2017, 4:10 PM
To: "Kylin Users" <us...@kylin.apache.org>

Subject: Re: mismatch hdfs addr error when using kylin2.1 on two hadoop clusters



I also deploy Kylin on two small EMR clusters for resource isolation. I used MySQL for the Hive metastore and shared the same Hive configuration between the two clusters.
You may try this.
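One way to see what location the two clusters would resolve for a given table is to read it back from the shared metastore, e.g. with the standard Hive metastore client. This is only a sketch; the database and table names below are placeholders:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

// With both clusters pointing at the same MySQL-backed metastore, a table's
// storage location (and hence its HDFS cluster) is whatever the metastore recorded.
public class ShowTableLocation {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf(); // reads hive-site.xml from the classpath
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            Table t = client.getTable("default", "kylin_sales"); // example database/table
            System.out.println("table location: " + t.getSd().getLocation());
        } finally {
            client.close();
        }
    }
}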








At 13:08 on December 22, 2017, "朱真龙" <18...@qq.com> wrote:

Thank you for your attention first. I am a Chinese Kylin user who often reads English documents but rarely writes in English, and I know most Kylin developers are Chinese. So, if you don't understand what I mean, I will describe it again in Chinese.


I am using Kylin 2.1.0 across two Hadoop clusters (both configured with HA), and both run the same Hadoop version (2.7.1). It runs well with the sample model so far, but not with my own model, which has many columns encoded with dict, like this:

[screenshot of the model configuration not included in the plain-text archive]

When building the cube, I got an error like this:

[screenshot of the build error not included in the plain-text archive]

After looking into the source code that the error mentions, I found that the dictionary gets its HDFS path from the Hive table descriptor, and during the check performed before loading these HFiles into HBase, the HFiles and the HBase table are found to be on different HDFS clusters.


apache-kylin-2.1.0-src\apache-kylin-2.1.0\core-cube\src\main\java\org\apache\kylin\cube\CubeManager
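To illustrate the kind of mismatch that pre-load check trips over, here is a rough sketch that resolves two paths to their FileSystem URIs and compares them. Both paths below are made-up placeholders for the Hive-side files and the HBase root directory:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Resolves each path to its FileSystem and compares the cluster URIs. On a
// two-cluster setup the dictionary/HFile path and the HBase root resolve to
// different clusters, which is what the pre-load check complains about.
public class FsMismatchCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path hfilePath = new Path("hdfs://cluster1/kylin/example_hfile"); // hypothetical, Hive-side cluster
        Path hbaseRoot = new Path("hdfs://cluster2/apps/hbase/data");     // hypothetical, HBase-side cluster

        URI hfileFs = FileSystem.get(hfilePath.toUri(), conf).getUri();
        URI hbaseFs = FileSystem.get(hbaseRoot.toUri(), conf).getUri();
        System.out.println("hfile cluster : " + hfileFs);
        System.out.println("hbase cluster : " + hbaseFs);
        System.out.println("same cluster?   " + hfileFs.equals(hbaseFs));
    }
}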





So, could you tell me what I should do in this case?