You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by 侯宗田 <zo...@icloud.com> on 2018/04/22 13:38:09 UTC

how to extract metadata of hive tables in speed

Hi,

I am writing a application which needs the metastore about hive tables. I have used webhcat to get the information about tables and process them. But a simple request takes over eight seconds to respond on localhost. Why is this so slow, and how can I fix it or is there other way I can extract the metadata in C?

$ time curl -s 'http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean <http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean>'
{"columns": 
  [{"name":"id","type":"int"}],
  "database":"default",
  "table":"haha"}

real    0m8.400s
user    0m0.053s
sys     0m0.019s
it seems to run a hcat.py, and it create a bunch of things then clear them, it takes very long time, does anyone have some ideas about it?? Any suggestions will be very appreciated!

$hcat.py -e "use default; desc haha; "
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
18/04/21 16:38:16 INFO ql.Driver: Compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize called
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/04/21 16:38:18 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in metastore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_all_functions
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
18/04/21 16:38:20 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); Time taken: 3.936 seconds
18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
18/04/21 16:38:20 INFO ql.Driver: Executing command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread local RawStore...
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=Cleaning up thread local RawStore...
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=Done cleaning up thread local RawStore
18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize called
18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
18/04/21 16:38:20 INFO ql.Driver: Completed executing command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); Time taken: 0.202 seconds
OK

Re: how to extract metadata of hive tables in speed

Posted by 侯宗田 <zo...@icloud.com>.
Hi,

This really helps me a lot, and I will try it soon, 
Thank you very much!

Hou
> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
> 
> Hi,
> 
> Disclaimer: I am not too familiar with the webhcat yet.
> From the logs, I see, that:
> - the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
> - the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.
> 
> Hope this helps,
> Peter
> 
>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
>> 
>> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>> 
>>> Hi,
>>> 
>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>> that this is not production ready - more like a proof of concept):
>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>> 
>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>> java.
>>> 
>>> I am not sure about the performance gains compared to HCat, but currently
>>> there are no faster interfaces for HMS that I know of.
>>> 
>>> Regards,
>>> Peter
>>> 
>>> 
>>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>> 
>>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>>> several days. Need help!!
>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am writing a application which needs the metastore about hive tables.
>>>> I have used webhcat to get the information about tables and process them.
>>>> But a simple request takes over eight seconds to respond on localhost. Why
>>>> is this so slow, and how can I fix it or is there other way I can extract
>>>> the metadata in C?
>>>>> 
>>>>> $ time curl -s '
>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>> <
>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>> '
>>>>> {"columns":
>>>>> [{"name":"id","type":"int"}],
>>>>> "database":"default",
>>>>> "table":"haha"}
>>>>> 
>>>>> real    0m8.400s
>>>>> user    0m0.053s
>>>>> sys     0m0.019s
>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>>> suggestions will be very appreciated!
>>>>> 
>>>>> $hcat.py -e "use default; desc haha; "
>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>> native-hadoop library for your platform... using builtin-java classes where
>>>> applicable
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>> use default
>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>> called
>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>> pin classes with
>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>> underlying DB is MYSQL
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>> metastore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>> metastore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>> admin role, since config is empty
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>> Schema(fieldSchemas:null, properties:null)
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>> Time taken: 3.936 seconds
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>> creating a lock manager
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>> use default
>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>> be ignored, since hive.security.authorization.manager is set to instance of
>>>> HiveAuthorizerFactory.
>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>> hive.metastore.filter.hook changed from
>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>> local RawStore...
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>> thread local RawStore
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>> mode
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>> called
>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>> underlying DB is MYSQL
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>> Time taken: 0.202 seconds
>>>>> OK
>>>> 
>>>> 
>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by Peter Vary <pv...@cloudera.com>.
Thanks for the info too! :)

> On Apr 25, 2018, at 11:12 AM, 侯宗田 <zo...@icloud.com> wrote:
> 
> Hi,
> 
> Thank you, I have looked up the source code of Hcatalog, it seems every time when I run hcat -e “query”, it called hcatCli, then it make configuration, create and start a session, then dump it after being used. It can’t keep a session or connection and don’t have a Cli. The initialization take all the time. Therefore, I only can use the thrift API to do my job. Thank you for your precious suggestions!
> 
> Best regards,
> Hou
>> 在 2018年4月24日,下午7:45,Peter Vary <pv...@cloudera.com> 写道:
>> 
>> Hi Hou,
>> 
>> Kudu uses the Thrift HMS interface, and written in C. An example could be found here:
>> https://github.com/apache/kudu/tree/master/src/kudu/hms <https://github.com/apache/kudu/tree/master/src/kudu/hms>
>> 
>> As for parametrizing Hcatalog I have only found this:
>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties <https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties>
>> But have not find anything there which might help you there.
>> 
>> Peter
>> 
>>> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zo...@icloud.com> wrote:
>>> 
>>> Hi, Peter:
>>> I have started a standalone metastore server and it indeed short that part of time, it does connection instead of initialization. But I still have some questions,
>>> First, I believe the Hcatalog must be quick because it is a mature product and I have not seen others complaining about this problem, is there some configuration which controls starting new session or how to keep a session connected to the HMS, in the log below it started a new session and connected twice. 
>>> Second, I am very interested in using the HMS thrift API, but I could not found an example of how to use it in C/C++ to access hive table info. Do you know some link about it?
>>> Really thank you for your time!!
>>> 
>>> Best regards,
>>> Hou
>>> 
>>> $time ./hcat.py -e "use default; show table extended like haha;"
>>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
>>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>>> 18/04/24 15:47:10 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 1.591 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
>>> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
>>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
>>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 0.119 seconds
>>> OK
>>> 18/04/24 15:47:12 INFO ql.Driver: OK
>>> Time taken: 1.728 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
>>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
>>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[0]
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.166 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
>>> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
>>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
>>> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.187 seconds
>>> OK
>>> 18/04/24 15:47:12 INFO ql.Driver: OK
>>> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process : 1
>>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
>>> tableName:haha
>>> owner:kousouda
>>> location:hdfs://localhost:8020/user/hive/warehouse/haha
>>> inputformat:org.apache.hadoop.mapred.TextInputFormat
>>> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>> columns:struct columns { i32 id}
>>> partitioned:false
>>> partitionColumns:
>>> totalNumberFiles:2
>>> totalFileSize:4
>>> maxFileSize:2
>>> minFileSize:2
>>> lastAccessTime:1524535110334
>>> lastUpdateTime:1524535113101
>>> 
>>> Time taken: 0.394 seconds
>>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme hdfs
>>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme file
>>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 
>>> 
>>> real	0m5.593s
>>> user	0m8.645s
>>> sys	0m0.523s
>>>> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
>>>> 
>>>> Hi,
>>>> 
>>>> Disclaimer: I am not too familiar with the webhcat yet.
>>>> From the logs, I see, that:
>>>> - the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
>>>> - the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.
>>>> 
>>>> Hope this helps,
>>>> Peter
>>>> 
>>>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
>>>>> 
>>>>> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>>>>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>>>>> that this is not production ready - more like a proof of concept):
>>>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>>>>> 
>>>>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>>>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>>>>> java.
>>>>>> 
>>>>>> I am not sure about the performance gains compared to HCat, but currently
>>>>>> there are no faster interfaces for HMS that I know of.
>>>>>> 
>>>>>> Regards,
>>>>>> Peter
>>>>>> 
>>>>>> 
>>>>>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>>>>> 
>>>>>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>>>>>> several days. Need help!!
>>>>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I am writing a application which needs the metastore about hive tables.
>>>>>>> I have used webhcat to get the information about tables and process them.
>>>>>>> But a simple request takes over eight seconds to respond on localhost. Why
>>>>>>> is this so slow, and how can I fix it or is there other way I can extract
>>>>>>> the metadata in C?
>>>>>>>> 
>>>>>>>> $ time curl -s '
>>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>>> <
>>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>>>> '
>>>>>>>> {"columns":
>>>>>>>> [{"name":"id","type":"int"}],
>>>>>>>> "database":"default",
>>>>>>>> "table":"haha"}
>>>>>>>> 
>>>>>>>> real    0m8.400s
>>>>>>>> user    0m0.053s
>>>>>>>> sys     0m0.019s
>>>>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>>>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>>>>>> suggestions will be very appreciated!
>>>>>>>> 
>>>>>>>> $hcat.py -e "use default; desc haha; "
>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>> SLF4J: Found binding in
>>>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: Found binding in
>>>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>>>> applicable
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>>> use default
>>>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>>> called
>>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>>>>> pin classes with
>>>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>>> underlying DB is MYSQL
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>>>>> metastore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>>>>> metastore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>>>>> admin role, since config is empty
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>>>>> Schema(fieldSchemas:null, properties:null)
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>>> Time taken: 3.936 seconds
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>>>>> creating a lock manager
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>>> use default
>>>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>>>>> be ignored, since hive.security.authorization.manager is set to instance of
>>>>>>> HiveAuthorizerFactory.
>>>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>>>>> hive.metastore.filter.hook changed from
>>>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>>>>> local RawStore...
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>>>>> thread local RawStore
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>>>>> mode
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>>> called
>>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>>> underlying DB is MYSQL
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>>> Time taken: 0.202 seconds
>>>>>>>> OK
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by 侯宗田 <zo...@icloud.com>.
Hi,

Thank you, I have looked up the source code of Hcatalog, it seems every time when I run hcat -e “query”, it called hcatCli, then it make configuration, create and start a session, then dump it after being used. It can’t keep a session or connection and don’t have a Cli. The initialization take all the time. Therefore, I only can use the thrift API to do my job. Thank you for your precious suggestions!

Best regards,
Hou
> 在 2018年4月24日,下午7:45,Peter Vary <pv...@cloudera.com> 写道:
> 
> Hi Hou,
> 
> Kudu uses the Thrift HMS interface, and written in C. An example could be found here:
> https://github.com/apache/kudu/tree/master/src/kudu/hms <https://github.com/apache/kudu/tree/master/src/kudu/hms>
> 
> As for parametrizing Hcatalog I have only found this:
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties <https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties>
> But have not find anything there which might help you there.
> 
> Peter
> 
>> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zo...@icloud.com> wrote:
>> 
>> Hi, Peter:
>> I have started a standalone metastore server and it indeed short that part of time, it does connection instead of initialization. But I still have some questions,
>> First, I believe the Hcatalog must be quick because it is a mature product and I have not seen others complaining about this problem, is there some configuration which controls starting new session or how to keep a session connected to the HMS, in the log below it started a new session and connected twice. 
>> Second, I am very interested in using the HMS thrift API, but I could not found an example of how to use it in C/C++ to access hive table info. Do you know some link about it?
>> Really thank you for your time!!
>> 
>> Best regards,
>> Hou
>> 
>> $time ./hcat.py -e "use default; show table extended like haha;"
>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>> 18/04/24 15:47:10 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 1.591 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
>> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 0.119 seconds
>> OK
>> 18/04/24 15:47:12 INFO ql.Driver: OK
>> Time taken: 1.728 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[0]
>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.166 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
>> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
>> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1
>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.187 seconds
>> OK
>> 18/04/24 15:47:12 INFO ql.Driver: OK
>> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
>> tableName:haha
>> owner:kousouda
>> location:hdfs://localhost:8020/user/hive/warehouse/haha
>> inputformat:org.apache.hadoop.mapred.TextInputFormat
>> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>> columns:struct columns { i32 id}
>> partitioned:false
>> partitionColumns:
>> totalNumberFiles:2
>> totalFileSize:4
>> maxFileSize:2
>> minFileSize:2
>> lastAccessTime:1524535110334
>> lastUpdateTime:1524535113101
>> 
>> Time taken: 0.394 seconds
>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme hdfs
>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme file
>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 
>> 
>> real	0m5.593s
>> user	0m8.645s
>> sys	0m0.523s
>>> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
>>> 
>>> Hi,
>>> 
>>> Disclaimer: I am not too familiar with the webhcat yet.
>>> From the logs, I see, that:
>>> - the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
>>> - the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.
>>> 
>>> Hope this helps,
>>> Peter
>>> 
>>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
>>>> 
>>>> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>>>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>>>> that this is not production ready - more like a proof of concept):
>>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>>>> 
>>>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>>>> java.
>>>>> 
>>>>> I am not sure about the performance gains compared to HCat, but currently
>>>>> there are no faster interfaces for HMS that I know of.
>>>>> 
>>>>> Regards,
>>>>> Peter
>>>>> 
>>>>> 
>>>>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>>>> 
>>>>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>>>>> several days. Need help!!
>>>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am writing a application which needs the metastore about hive tables.
>>>>>> I have used webhcat to get the information about tables and process them.
>>>>>> But a simple request takes over eight seconds to respond on localhost. Why
>>>>>> is this so slow, and how can I fix it or is there other way I can extract
>>>>>> the metadata in C?
>>>>>>> 
>>>>>>> $ time curl -s '
>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>> <
>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>>> '
>>>>>>> {"columns":
>>>>>>> [{"name":"id","type":"int"}],
>>>>>>> "database":"default",
>>>>>>> "table":"haha"}
>>>>>>> 
>>>>>>> real    0m8.400s
>>>>>>> user    0m0.053s
>>>>>>> sys     0m0.019s
>>>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>>>>> suggestions will be very appreciated!
>>>>>>> 
>>>>>>> $hcat.py -e "use default; desc haha; "
>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>> SLF4J: Found binding in
>>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: Found binding in
>>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>>> applicable
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>> use default
>>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>> called
>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>>>> pin classes with
>>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>> underlying DB is MYSQL
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>>>> metastore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>>>> metastore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>>>> admin role, since config is empty
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>>>> Schema(fieldSchemas:null, properties:null)
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>> Time taken: 3.936 seconds
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>>>> creating a lock manager
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>> use default
>>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>>>> be ignored, since hive.security.authorization.manager is set to instance of
>>>>>> HiveAuthorizerFactory.
>>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>>>> hive.metastore.filter.hook changed from
>>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>>>> local RawStore...
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>>>> thread local RawStore
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>>>> mode
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>> called
>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>> underlying DB is MYSQL
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>> Time taken: 0.202 seconds
>>>>>>> OK
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by Peter Vary <pv...@cloudera.com>.
Hi Hou,

Kudu uses the Thrift HMS interface, and written in C. An example could be found here:
https://github.com/apache/kudu/tree/master/src/kudu/hms <https://github.com/apache/kudu/tree/master/src/kudu/hms>

As for parametrizing Hcatalog I have only found this:
https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties <https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties>
But have not find anything there which might help you there.

Peter

> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zo...@icloud.com> wrote:
> 
> Hi, Peter:
> I have started a standalone metastore server and it indeed short that part of time, it does connection instead of initialization. But I still have some questions,
> First, I believe the Hcatalog must be quick because it is a mature product and I have not seen others complaining about this problem, is there some configuration which controls starting new session or how to keep a session connected to the HMS, in the log below it started a new session and connected twice. 
> Second, I am very interested in using the HMS thrift API, but I could not found an example of how to use it in C/C++ to access hive table info. Do you know some link about it?
> Really thank you for your time!!
> 
> Best regards,
> Hou
> 
> $time ./hcat.py -e "use default; show table extended like haha;"
> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
> 18/04/24 15:47:10 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 1.591 seconds
> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0
> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 0.119 seconds
> OK
> 18/04/24 15:47:12 INFO ql.Driver: OK
> Time taken: 1.728 seconds
> 18/04/24 15:47:12 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[0]
> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.166 seconds
> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
> 18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1
> 18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.187 seconds
> OK
> 18/04/24 15:47:12 INFO ql.Driver: OK
> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process : 1
> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
> tableName:haha
> owner:kousouda
> location:hdfs://localhost:8020/user/hive/warehouse/haha
> inputformat:org.apache.hadoop.mapred.TextInputFormat
> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> columns:struct columns { i32 id}
> partitioned:false
> partitionColumns:
> totalNumberFiles:2
> totalFileSize:4
> maxFileSize:2
> minFileSize:2
> lastAccessTime:1524535110334
> lastUpdateTime:1524535113101
> 
> Time taken: 0.394 seconds
> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme hdfs
> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme file
> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 
> 
> real	0m5.593s
> user	0m8.645s
> sys	0m0.523s
>> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
>> 
>> Hi,
>> 
>> Disclaimer: I am not too familiar with the webhcat yet.
>> From the logs, I see, that:
>> - the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
>> - the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.
>> 
>> Hope this helps,
>> Peter
>> 
>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
>>> 
>>> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>>> 
>>>> Hi,
>>>> 
>>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>>> that this is not production ready - more like a proof of concept):
>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>>> 
>>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>>> java.
>>>> 
>>>> I am not sure about the performance gains compared to HCat, but currently
>>>> there are no faster interfaces for HMS that I know of.
>>>> 
>>>> Regards,
>>>> Peter
>>>> 
>>>> 
>>>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>>> 
>>>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>>>> several days. Need help!!
>>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>>>> 
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am writing a application which needs the metastore about hive tables.
>>>>> I have used webhcat to get the information about tables and process them.
>>>>> But a simple request takes over eight seconds to respond on localhost. Why
>>>>> is this so slow, and how can I fix it or is there other way I can extract
>>>>> the metadata in C?
>>>>>> 
>>>>>> $ time curl -s '
>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>> <
>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>> '
>>>>>> {"columns":
>>>>>> [{"name":"id","type":"int"}],
>>>>>> "database":"default",
>>>>>> "table":"haha"}
>>>>>> 
>>>>>> real    0m8.400s
>>>>>> user    0m0.053s
>>>>>> sys     0m0.019s
>>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>>>> suggestions will be very appreciated!
>>>>>> 
>>>>>> $hcat.py -e "use default; desc haha; "
>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>> SLF4J: Found binding in
>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>> applicable
>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>> use default
>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>> called
>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>>> pin classes with
>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>> underlying DB is MYSQL
>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>>> metastore
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>>> metastore
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>>> admin role, since config is empty
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>>> Schema(fieldSchemas:null, properties:null)
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>> Time taken: 3.936 seconds
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>>> creating a lock manager
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>> use default
>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>>> be ignored, since hive.security.authorization.manager is set to instance of
>>>>> HiveAuthorizerFactory.
>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>>> hive.metastore.filter.hook changed from
>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>>> local RawStore...
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>>> thread local RawStore
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>>> mode
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>> called
>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>> underlying DB is MYSQL
>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>> Time taken: 0.202 seconds
>>>>>> OK
>>>>> 
>>>>> 
>>> 
>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by 侯宗田 <zo...@icloud.com>.
Hi, Peter:
I have started a standalone metastore server and it indeed short that part of time, it does connection instead of initialization. But I still have some questions,
First, I believe the Hcatalog must be quick because it is a mature product and I have not seen others complaining about this problem, is there some configuration which controls starting new session or how to keep a session connected to the HMS, in the log below it started a new session and connected twice. 
Second, I am very interested in using the HMS thrift API, but I could not found an example of how to use it in C/C++ to access hive table info. Do you know some link about it?
Really thank you for your time!!

Best regards,
Hou

$time ./hcat.py -e "use default; show table extended like haha;"
18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
18/04/24 15:47:10 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
18/04/24 15:47:10 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 1.591 seconds
18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): use default
18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0
18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); Time taken: 0.119 seconds
OK
18/04/24 15:47:12 INFO ql.Driver: OK
Time taken: 1.728 seconds
18/04/24 15:47:12 INFO ql.Driver: Compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[0]
18/04/24 15:47:12 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.166 seconds
18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
18/04/24 15:47:12 INFO ql.Driver: Executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): show table extended like haha
18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
18/04/24 15:47:12 INFO exec.DDLTask: results : 1
18/04/24 15:47:12 INFO ql.Driver: Completed executing command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); Time taken: 0.187 seconds
OK
18/04/24 15:47:12 INFO ql.Driver: OK
18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
tableName:haha
owner:kousouda
location:hdfs://localhost:8020/user/hive/warehouse/haha
inputformat:org.apache.hadoop.mapred.TextInputFormat
outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 id}
partitioned:false
partitionColumns:
totalNumberFiles:2
totalFileSize:4
maxFileSize:2
minFileSize:2
lastAccessTime:1524535110334
lastUpdateTime:1524535113101

Time taken: 0.394 seconds
18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme hdfs
18/04/24 15:47:12 INFO session.SessionState: Deleted directory: /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme file
18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, current connections: 0 

real	0m5.593s
user	0m8.645s
sys	0m0.523s
> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
> 
> Hi,
> 
> Disclaimer: I am not too familiar with the webhcat yet.
> From the logs, I see, that:
> - the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
> - the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.
> 
> Hope this helps,
> Peter
> 
>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
>> 
>> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>> 
>>> Hi,
>>> 
>>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>>> that this is not production ready - more like a proof of concept):
>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>> 
>>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>>> can/want write java code, you can use HiveMetastoreClient class to do it in
>>> java.
>>> 
>>> I am not sure about the performance gains compared to HCat, but currently
>>> there are no faster interfaces for HMS that I know of.
>>> 
>>> Regards,
>>> Peter
>>> 
>>> 
>>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>>> 
>>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>>> several days. Need help!!
>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am writing a application which needs the metastore about hive tables.
>>>> I have used webhcat to get the information about tables and process them.
>>>> But a simple request takes over eight seconds to respond on localhost. Why
>>>> is this so slow, and how can I fix it or is there other way I can extract
>>>> the metadata in C?
>>>>> 
>>>>> $ time curl -s '
>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>> <
>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>> '
>>>>> {"columns":
>>>>> [{"name":"id","type":"int"}],
>>>>> "database":"default",
>>>>> "table":"haha"}
>>>>> 
>>>>> real    0m8.400s
>>>>> user    0m0.053s
>>>>> sys     0m0.019s
>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>>> suggestions will be very appreciated!
>>>>> 
>>>>> $hcat.py -e "use default; desc haha; "
>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: Found binding in
>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>> native-hadoop library for your platform... using builtin-java classes where
>>>> applicable
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>> use default
>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>> called
>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>> pin classes with
>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>> underlying DB is MYSQL
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>> metastore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>> metastore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>> admin role, since config is empty
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>> Schema(fieldSchemas:null, properties:null)
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>> Time taken: 3.936 seconds
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>> creating a lock manager
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>> use default
>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>> be ignored, since hive.security.authorization.manager is set to instance of
>>>> HiveAuthorizerFactory.
>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>> hive.metastore.filter.hook changed from
>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>> local RawStore...
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>> thread local RawStore
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>> mode
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>> called
>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>> underlying DB is MYSQL
>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>> Time taken: 0.202 seconds
>>>>> OK
>>>> 
>>>> 
>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by Peter Vary <pv...@cloudera.com>.
Hi,

Disclaimer: I am not too familiar with the webhcat yet.
From the logs, I see, that:
- the first 3 seconds spent on starting a new session, and maybe a driver - this can be reduced, if the session is already there, and the HiveServer2 is started (but do not know if webhcat could use HS2, or reuse sessions) - this delay could be avoided if you use any of the 3 solutions suggested in my last mail.
- the next 3 seconds spent on initializing the metastore. This can be reduced if a standalone metastore is started, and the webhcat is configured to access this metastore.

Hope this helps,
Peter

> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zo...@icloud.com> wrote:
> 
> Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>> 
>> Hi,
>> 
>> Alexander Kolbasov has a project which might interest you (keeping in mind,
>> that this is not production ready - more like a proof of concept):
>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>> 
>> Also you can use HMS thrift API directly to access the MetaStore, or if you
>> can/want write java code, you can use HiveMetastoreClient class to do it in
>> java.
>> 
>> I am not sure about the performance gains compared to HCat, but currently
>> there are no faster interfaces for HMS that I know of.
>> 
>> Regards,
>> Peter
>> 
>> 
>> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
>> 
>>> Can anyone give me some suggestions? I have been stuck in this problem for
>>> several days. Need help!!
>>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> I am writing a application which needs the metastore about hive tables.
>>> I have used webhcat to get the information about tables and process them.
>>> But a simple request takes over eight seconds to respond on localhost. Why
>>> is this so slow, and how can I fix it or is there other way I can extract
>>> the metadata in C?
>>>> 
>>>> $ time curl -s '
>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>> <
>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>> '
>>>> {"columns":
>>>> [{"name":"id","type":"int"}],
>>>> "database":"default",
>>>> "table":"haha"}
>>>> 
>>>> real    0m8.400s
>>>> user    0m0.053s
>>>> sys     0m0.019s
>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>> them, it takes very long time, does anyone have some ideas about it?? Any
>>> suggestions will be very appreciated!
>>>> 
>>>> $hcat.py -e "use default; desc haha; "
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in
>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>> file:/usr/local/hive/conf/hive-site.xml
>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>> use default
>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>> called
>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>> datanucleus.cache.level2 unknown - will be ignored
>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>> pin classes with
>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>> underlying DB is MYSQL
>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>> metastore
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>> metastore
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>> admin role, since config is empty
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=get_database: default
>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>> Schema(fieldSchemas:null, properties:null)
>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>> Time taken: 3.936 seconds
>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>> creating a lock manager
>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>> use default
>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>> be ignored, since hive.security.authorization.manager is set to instance of
>>> HiveAuthorizerFactory.
>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>> hive.metastore.filter.hook changed from
>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>> local RawStore...
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>> thread local RawStore
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>> mode
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=get_database: default
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>> called
>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>> underlying DB is MYSQL
>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>> ip=unknown-ip-addr      cmd=get_database: default
>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>> Time taken: 0.202 seconds
>>>> OK
>>> 
>>> 
> 


Re: how to extract metadata of hive tables in speed

Posted by 侯宗田 <zo...@icloud.com>.
Thank you very much for your reply, I am wondering whether I use the webhcat rightly, I don’t think it is normal to create all the directories and objects to get a table describ and take 8 seconds. The webhcat should not be so slow, Or it is because I forget to start some server which can respond immediately?   
> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
> 
> Hi,
> 
> Alexander Kolbasov has a project which might interest you (keeping in mind,
> that this is not production ready - more like a proof of concept):
> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
> 
> Also you can use HMS thrift API directly to access the MetaStore, or if you
> can/want write java code, you can use HiveMetastoreClient class to do it in
> java.
> 
> I am not sure about the performance gains compared to HCat, but currently
> there are no faster interfaces for HMS that I know of.
> 
> Regards,
> Peter
> 
> 
> 侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):
> 
>> Can anyone give me some suggestions? I have been stuck in this problem for
>> several days. Need help!!
>>> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
>>> 
>>> 
>>> Hi,
>>> 
>>> I am writing a application which needs the metastore about hive tables.
>> I have used webhcat to get the information about tables and process them.
>> But a simple request takes over eight seconds to respond on localhost. Why
>> is this so slow, and how can I fix it or is there other way I can extract
>> the metadata in C?
>>> 
>>> $ time curl -s '
>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>> <
>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>> '
>>> {"columns":
>>> [{"name":"id","type":"int"}],
>>> "database":"default",
>>> "table":"haha"}
>>> 
>>> real    0m8.400s
>>> user    0m0.053s
>>> sys     0m0.019s
>>> it seems to run a hcat.py, and it create a bunch of things then clear
>> them, it takes very long time, does anyone have some ideas about it?? Any
>> suggestions will be very appreciated!
>>> 
>>> $hcat.py -e "use default; desc haha; "
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in
>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in
>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>> file:/usr/local/hive/conf/hive-site.xml
>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>> use default
>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>> called
>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>> datanucleus.cache.level2 unknown - will be ignored
>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>> pin classes with
>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>> underlying DB is MYSQL
>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>> metastore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>> metastore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>> admin role, since config is empty
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=get_all_functions
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=get_database: default
>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>> Schema(fieldSchemas:null, properties:null)
>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>> Time taken: 3.936 seconds
>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>> creating a lock manager
>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>> use default
>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>> be ignored, since hive.security.authorization.manager is set to instance of
>> HiveAuthorizerFactory.
>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>> hive.metastore.filter.hook changed from
>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>> local RawStore...
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>> thread local RawStore
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>> mode
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=get_database: default
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>> called
>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>> underlying DB is MYSQL
>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>> ip=unknown-ip-addr      cmd=get_database: default
>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>> Time taken: 0.202 seconds
>>> OK
>> 
>> 


Re: how to extract metadata of hive tables in speed

Posted by Peter Vary <pv...@cloudera.com>.
Hi,

Alexander Kolbasov has a project which might interest you (keeping in mind,
that this is not production ready - more like a proof of concept):
https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md

Also you can use HMS thrift API directly to access the MetaStore, or if you
can/want write java code, you can use HiveMetastoreClient class to do it in
java.

I am not sure about the performance gains compared to HCat, but currently
there are no faster interfaces for HMS that I know of.

Regards,
Peter


侯宗田 <zo...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40):

> Can anyone give me some suggestions? I have been stuck in this problem for
> several days. Need help!!
> > 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
> >
> >
> > Hi,
> >
> > I am writing a application which needs the metastore about hive tables.
> I have used webhcat to get the information about tables and process them.
> But a simple request takes over eight seconds to respond on localhost. Why
> is this so slow, and how can I fix it or is there other way I can extract
> the metadata in C?
> >
> > $ time curl -s '
> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
> <
> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
> >'
> > {"columns":
> >  [{"name":"id","type":"int"}],
> >  "database":"default",
> >  "table":"haha"}
> >
> > real    0m8.400s
> > user    0m0.053s
> > sys     0m0.019s
> > it seems to run a hcat.py, and it create a bunch of things then clear
> them, it takes very long time, does anyone have some ideas about it?? Any
> suggestions will be very appreciated!
> >
> > $hcat.py -e "use default; desc haha; "
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
> file:/usr/local/hive/conf/hive-site.xml
> > 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> > 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> > 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> > 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
> > 18/04/21 16:38:16 INFO ql.Driver: Compiling
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
> use default
> > 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> > 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
> called
> > 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> > 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
> datanucleus.cache.level2 unknown - will be ignored
> > 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
> pin classes with
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> > 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
> underlying DB is MYSQL
> > 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
> metastore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
> metastore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
> admin role, since config is empty
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=get_all_functions
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=get_database: default
> > 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
> > 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
> Schema(fieldSchemas:null, properties:null)
> > 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
> Time taken: 3.936 seconds
> > 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
> creating a lock manager
> > 18/04/21 16:38:20 INFO ql.Driver: Executing
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
> use default
> > 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
> > 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
> be ignored, since hive.security.authorization.manager is set to instance of
> HiveAuthorizerFactory.
> > 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
> hive.metastore.filter.hook changed from
> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
> local RawStore...
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
> thread local RawStore
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
> > 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
> mode
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=get_database: default
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> > 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
> called
> > 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
> underlying DB is MYSQL
> > 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
> > 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> > 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
> ip=unknown-ip-addr      cmd=get_database: default
> > 18/04/21 16:38:20 INFO ql.Driver: Completed executing
> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
> Time taken: 0.202 seconds
> > OK
>
>

Re: how to extract metadata of hive tables in speed

Posted by 侯宗田 <zo...@icloud.com>.
Can anyone give me some suggestions? I have been stuck in this problem for several days. Need help!!
> 在 2018年4月22日,下午9:38,侯宗田 <zo...@icloud.com> 写道:
> 
> 
> Hi,
> 
> I am writing a application which needs the metastore about hive tables. I have used webhcat to get the information about tables and process them. But a simple request takes over eight seconds to respond on localhost. Why is this so slow, and how can I fix it or is there other way I can extract the metadata in C?
> 
> $ time curl -s 'http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean <http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean>'
> {"columns": 
>  [{"name":"id","type":"int"}],
>  "database":"default",
>  "table":"haha"}
> 
> real    0m8.400s
> user    0m0.053s
> sys     0m0.019s
> it seems to run a hcat.py, and it create a bunch of things then clear them, it takes very long time, does anyone have some ideas about it?? Any suggestions will be very appreciated!
> 
> $hcat.py -e "use default; desc haha; "
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file file:/usr/local/hive/conf/hive-site.xml
> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> 18/04/21 16:38:16 INFO session.SessionState: Created local directory: /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
> 18/04/21 16:38:16 INFO ql.Driver: Compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize called
> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in metastore
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in metastore
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_all_functions
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); Time taken: 3.936 seconds
> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
> 18/04/21 16:38:20 INFO ql.Driver: Executing command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): use default
> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread local RawStore...
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=Cleaning up thread local RawStore...
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=Done cleaning up thread local RawStore
> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize called
> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda	ip=unknown-ip-addr	cmd=get_database: default
> 18/04/21 16:38:20 INFO ql.Driver: Completed executing command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); Time taken: 0.202 seconds
> OK