Posted to user@kylin.apache.org by lk_hadoop <lk...@163.com> on 2019/10/21 05:32:24 UTC

can I read or write "Extract Fact Table Distinct Columns" result to somewhere

Hi all,
    Some dimensions, such as product name, may have many distinct values. I need to list all of them so that users can select exactly the value they want. Since step 3 of the cube build, "Extract Fact Table Distinct Columns", has already calculated each dimension's distinct values, can I read that result directly, or write it somewhere such as Elasticsearch? Is there an easy way to do this?

2019-10-21


lk_hadoop 

Re: Re: can I read or write "Extract Fact Table Distinct Columns" result to somewhere

Posted by Xiaoxiang Yu <xi...@kyligence.io>.
Hi Sir,
  Another way I know of is the Hive-MR global dictionary. If your cube has a COUNT_DISTINCT(BITMAP) measure, you can fetch all distinct values from the related Hive table. Note that this feature is first available in 3.0.0-alpha2; see this document for details: http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html .

----------------
Best wishes,
Xiaoxiang Yu


From: lk_hadoop <lk...@163.com>
Reply-To: "user@kylin.apache.org" <us...@kylin.apache.org>
Date: Monday, October 21, 2019 16:11
To: "user@kylin.apache.org" <us...@kylin.apache.org>
Subject: Re: Re: can I read or write "Extract Fact Table Distinct Columns" result to somewhere

Thank you very much, @xiaoxiang.yu. I will try it.

2019-10-21
________________________________
lk_hadoop
________________________________
From: Xiaoxiang Yu <xi...@kyligence.io>
Sent: 2019-10-21 15:01
Subject: Re: can I read or write "Extract Fact Table Distinct Columns" result to somewhere
To: "user@kylin.apache.org"<us...@kylin.apache.org>
Cc:

Hi,
Here is my suggestion; please check whether it meets your requirement.

First, find the dictionary of the column you want and get its path in HDFS.
Second, fetch the dictionary file to local disk.
Third, use DumpDictionaryCLI to dump the dictionary's content.
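The three steps above can also be scripted. The following is a minimal POSIX shell sketch, not a Kylin tool. It assumes that KYLIN_HOME points at the Kylin installation and that dictionaries follow the layout of the example path below (<working dir>/<metadata prefix>/resources/dict/<TABLE>/<COLUMN>/<uuid>.dict, inferred from this single example). The helper names find_kylin_dicts and dump_kylin_dict are hypothetical. A column may have more than one .dict file (for example from different segments), so step 1 lists them all.

```shell
#!/bin/sh
# Minimal sketch, not an official Kylin tool.
# Assumptions: KYLIN_HOME points at the Kylin install (the directory holding
# bin/kylin.sh); dictionaries live under a path shaped like the example
# /kylin/kylin_4117/resources/dict/<TABLE>/<COLUMN>/<uuid>.dict.
# find_kylin_dicts and dump_kylin_dict are hypothetical helper names.

# Step 1: list the .dict files stored under one column's dictionary directory.
# The last field of each "hadoop fs -ls -R" line is the file path.
find_kylin_dicts() {
    hadoop fs -ls -R "$1" | awk '{ print $NF }' | grep '\.dict$'
}

# Steps 2 and 3: copy one dictionary into the current directory, then dump it
# with DumpDictionaryCLI, passing the absolute path of the local copy.
dump_kylin_dict() {
    dict_file=$(basename "$1")
    hadoop fs -get "$1" "./$dict_file" || return 1
    sh "$KYLIN_HOME/bin/kylin.sh" org.apache.kylin.cube.cli.DumpDictionaryCLI "$PWD/$dict_file"
}

# Example usage (paths taken from the reply below):
#   find_kylin_dicts /kylin/kylin_4117/resources/dict/LACUS.USERACTIONLOG/CITY
#   dump_kylin_dict /kylin/kylin_4117/resources/dict/LACUS.USERACTIONLOG/CITY/139315b2-44ba-b5ff-dea5-431c308cd399.dict
```

Passing an absolute path keeps the CLI independent of the directory kylin.sh ends up running in; in the original session both were the Kylin home directory.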


Following is my output:

hadoop fs -get /kylin/kylin_4117/resources/dict/LACUS.USERACTIONLOG/CITY/139315b2-44ba-b5ff-dea5-431c308cd399.dict
sh bin/kylin.sh org.apache.kylin.cube.cli.DumpDictionaryCLI 139315b2-44ba-b5ff-dea5-431c308cd399.dict


[root@cdh-client apache-kylin-3.0.0-SNAPSHOT-bin]# sh bin/kylin.sh org.apache.kylin.cube.cli.DumpDictionaryCLI 139315b2-44ba-b5ff-dea5-431c308cd399.dict
Using cached dependency...
KYLIN_JVM_SETTINGS is -Xms1024M -Xmx4096M -Dcalcite.debug -Xss1024K -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/root/xiaoxiang.yu/apache-kylin-3.0.0-SNAPSHOT-bin/logs/kylin.gc.23118 -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/xiaoxiang.yu/apache-kylin-3.0.0-SNAPSHOT-bin/tool/kylin-tool-3.0.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
============================================================================
File: /root/xiaoxiang.yu/apache-kylin-3.0.0-SNAPSHOT-bin/139315b2-44ba-b5ff-dea5-431c308cd399.dict
Thu Aug 08 20:47:26 CST 2019
{
  "uuid" : "139315b2-44ba-b5ff-dea5-431c308cd399",
  "last_modified" : 1565268446094,
  "version" : "2.6.0.20500",
  "source_table" : "LACUS.USERACTIONLOG",
  "source_column" : "CITY",
  "source_column_index" : 10,
  "data_type" : "varchar(30)",
  "input" : {
    "path" : "hdfs://cdh-master:8020/kylin/kylin_4117/kylin-86514b4e-ae55-ca6f-935a-b38bf55cf190/IntersectCountCube/fact_distinct_columns/USERACTIONLOG.CITY",
    "size" : 439,
    "last_modified_time" : 1565268427282
  },
  "dictionary_class" : "org.apache.kylin.dict.TrieDictionaryForest",
  "cardinality" : 9
}
TrieDictionaryForest
baseId:0
value divide:beijing
offset divide:0
----tree 0--------
Total 9 values
0 (0): beijing
1 (1): chongqin
2 (2): guangzhou
3 (3): hangzhou
4 (4): nanjing
5 (5): shanghai
6 (6): shenzhen
7 (7): taibei
8 (8): xianggang
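To get from this dump toward the original goal of loading the values into Elasticsearch, the value lines can be filtered out and turned into a request body for the Elasticsearch _bulk API. This is a minimal sketch, assuming the "<id> (<id>): <value>" line format shown in the output above, values without control characters, and Elasticsearch 7.x or later (no mapping types). The function names extract_dict_values and to_es_bulk, the index name city_values and the field name city are hypothetical.

```shell
#!/bin/sh
# Minimal sketch. Assumptions: value lines look like "<id> (<id>): <value>",
# as in the DumpDictionaryCLI output above, and values contain no control
# characters. Function, index and field names are hypothetical.

# Print one dictionary value per line. JVM warnings, the JSON metadata and the
# "Total N values" line do not match the pattern and are dropped.
extract_dict_values() {
    sed -n 's/^[0-9][0-9]* ([0-9][0-9]*): //p' "$@"
}

# Read values on stdin and write NDJSON for the _bulk API: one action line and
# one document line per value. Backslashes and double quotes are escaped first.
to_es_bulk() {
    index="$1"
    field="$2"
    sed 's/\\/\\\\/g; s/"/\\"/g' | while IFS= read -r value || [ -n "$value" ]; do
        printf '{"index":{"_index":"%s"}}\n' "$index"
        printf '{"%s":"%s"}\n' "$field" "$value"
    done
}

# Example usage:
#   sh bin/kylin.sh org.apache.kylin.cube.cli.DumpDictionaryCLI 139315b2-44ba-b5ff-dea5-431c308cd399.dict > city_dict.txt
#   extract_dict_values city_dict.txt | to_es_bulk city_values city > bulk.ndjson
#   curl -H 'Content-Type: application/x-ndjson' -X POST 'http://localhost:9200/_bulk' --data-binary @bulk.ndjson
```

Because no document _id is set, running the load twice indexes every value twice; using the value itself as the _id would make reloads idempotent.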



----------------
Best wishes,
Xiaoxiang Yu


From: lk_hadoop <lk...@163.com>
Reply-To: "user@kylin.apache.org" <us...@kylin.apache.org>
Date: Monday, October 21, 2019 13:32
To: user <us...@kylin.apache.org>, dev <de...@kylin.apache.org>
Subject: can I read or write "Extract Fact Table Distinct Columns" result to somewhere


