Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/12/16 18:28:24 UTC

[GitHub] [flink] bowenli86 commented on a change in pull request #10581: [FLINK-15263][hive][doc] add dedicated page for HiveCatalog

bowenli86 commented on a change in pull request #10581: [FLINK-15263][hive][doc] add dedicated page for HiveCatalog
URL: https://github.com/apache/flink/pull/10581#discussion_r358393109
 
 

 ##########
 File path: docs/dev/table/hive/hive_catalog.md
 ##########
 @@ -0,0 +1,245 @@
+---
+title: "HiveCatalog"
+nav-parent_id: hive_tableapi
+nav-pos: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Hive Metastore has evolved over the years into the de facto metadata hub of the Hadoop ecosystem. Many companies run a single
+Hive Metastore service instance in production to manage all of their metadata, Hive and non-Hive alike,
+as the single source of truth.
+ 
+`HiveCatalog` enables users who have both Hive and Flink deployments to use Hive Metastore to manage Flink's metadata.
+
+For users who have just a Flink deployment, `HiveCatalog` is the only persistent catalog provided out-of-the-box by Flink.
+Without a persistent catalog, users of [Flink SQL DDL]({{ site.baseurl }}/dev/table/sql.html#specifying-a-ddl) have to repeatedly
+recreate meta-objects like a Kafka table in every session, which wastes a lot of time. `HiveCatalog` fills this gap by letting
+users create tables and other meta-objects only once, then reference and manage them conveniently across sessions.
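+
+As a rough sketch of how this looks from the Table API (the catalog name `myhive`, the conf directory, and the Hive version below are illustrative placeholders, not prescribed values):
+
+{% highlight java %}
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.TableEnvironment;
+import org.apache.flink.table.catalog.hive.HiveCatalog;
+
+TableEnvironment tableEnv = TableEnvironment.create(
+        EnvironmentSettings.newInstance().useBlinkPlanner().build());
+
+// point the catalog at an existing Hive Metastore via its hive-site.xml
+HiveCatalog catalog = new HiveCatalog(
+        "myhive",          // catalog name
+        "default",         // default database
+        "/opt/hive-conf",  // directory containing hive-site.xml
+        "2.3.4");          // Hive version
+
+tableEnv.registerCatalog("myhive", catalog);
+tableEnv.useCatalog("myhive");
+{% endhighlight %}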
+
+
+## Set up HiveCatalog
+
+### Dependencies
+
+Setting up a `HiveCatalog` in Flink requires the same [dependencies]({{ site.baseurl }}/dev/table/hive/#dependencies) 
+as those of an overall Flink-Hive integration.
+
+### Configuration
+
+Setting up a `HiveCatalog` in Flink requires the same [configuration]({{ site.baseurl }}/dev/table/hive/#connecting-to-hive) 
+as those of an overall Flink-Hive integration.
+
+
+## How to use HiveCatalog
+
+Once configured properly, `HiveCatalog` should just work out of the box. Users can create Flink meta-objects with DDL, and should
+see them immediately afterwards.
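+
+For instance, with a `HiveCatalog` registered under the assumed name `myhive`, a database created in one SQL CLI session is still visible after the CLI is restarted (an illustrative transcript, not verbatim output):
+
+{% highlight bash %}
+Flink SQL> USE CATALOG myhive;
+
+Flink SQL> CREATE DATABASE mydb;
+
+Flink SQL> SHOW DATABASES;
+default
+mydb
+{% endhighlight %}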
+
+### Example
+
+We will walk through a simple example here.
+
+#### step 1: set up a Hive Metastore
+
+Set up a local Hive Metastore, and place a `hive-site.xml` file with all the necessary configurations at the local path `/opt/hive-conf/hive-site.xml`:
+
+{% highlight xml %}
+
+<configuration>
+   <property>
+      <name>javax.jdo.option.ConnectionURL</name>
+      <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
+      <description>metadata is stored in a MySQL server</description>
+   </property>
+
+   <property>
+      <name>javax.jdo.option.ConnectionDriverName</name>
+      <value>com.mysql.jdbc.Driver</value>
+      <description>MySQL JDBC driver class</description>
+   </property>
+
+   <property>
+      <name>javax.jdo.option.ConnectionUserName</name>
+      <value>...</value>
+      <description>user name for connecting to mysql server</description>
+   </property>
+
+   <property>
+      <name>javax.jdo.option.ConnectionPassword</name>
+      <value>...</value>
+      <description>password for connecting to mysql server</description>
+   </property>
+
+   <property>
+       <name>hive.metastore.uris</name>
+       <value>thrift://localhost:9083</value>
+       <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
+   </property>
+
+   <property>
+       <name>hive.metastore.schema.verification</name>
+       <value>true</value>
+   </property>
+
+</configuration>
+{% endhighlight %}
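+
+With the configuration in place, the metastore schema can be initialized and the standalone service started. The following is only a sketch; it assumes a Hive 2.3.x installation whose `bin` is on the `PATH`, with the MySQL JDBC driver jar already in Hive's `lib` directory:
+
+{% highlight bash %}
+# one-time initialization of the metastore schema in MySQL
+localhost$ schematool -dbType mysql -initSchema
+
+# start the standalone Hive Metastore service, listening on port 9083
+localhost$ hive --service metastore
+{% endhighlight %}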
+
+
+Test the connection to the HMS with the Hive CLI. Running a few commands, we can see that there is a database named `default` and no tables in it.
+
+
+{% highlight bash %}
+
+hive> show databases;
+OK
+default
+Time taken: 0.032 seconds, Fetched: 1 row(s)
+
+hive> show tables;
+OK
+Time taken: 0.028 seconds, Fetched: 0 row(s)
+{% endhighlight %}
+
+
+#### step 2: configure Flink cluster and SQL CLI
+
+Add all Hive dependencies to the `/lib` dir in the Flink distribution, and modify the SQL CLI's yaml config file `sql-client-defaults.yaml` as follows:
+
+{% highlight yaml %}
+
+execution:
+    planner: blink
+    type: streaming
+    ...
+    current-catalog: myhive  # set the HiveCatalog as the current catalog of the session
+    current-database: mydatabase
+    
+catalogs:
+   - name: myhive
+     type: hive
+     hive-conf-dir: /opt/hive-conf  # contains hive-site.xml
+     hive-version: 2.3.4
+{% endhighlight %}
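+
+For the dependency half of this step, the jars from the dependencies page need to end up in Flink's `lib/`. A sketch, with assumed artifact names and versions that must be matched to your own setup:
+
+{% highlight bash %}
+# assumed jar names/versions -- adjust to your Flink and Hive versions
+localhost$ cp flink-connector-hive_2.11-1.10.0.jar $FLINK_HOME/lib/
+localhost$ cp hive-exec-2.3.4.jar $FLINK_HOME/lib/
+{% endhighlight %}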
+
+
+#### step 3: set up a Kafka cluster
+
+Bootstrap a local Kafka 2.3.0 cluster with a topic named "test", and produce some simple data to the topic as tuples of name and age.
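+
+The topic can be created with Kafka's own tooling first (a sketch, assuming the commands are run from the Kafka 2.3.0 installation directory):
+
+{% highlight bash %}
+localhost$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
+    --topic test --partitions 1 --replication-factor 1
+{% endhighlight %}
+
+Then produce a couple of records with the console producer: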
+
+{% highlight bash %}
+
+localhost$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
+>tom,15
+>john,21
+
+{% endhighlight %}
+
+
+These messages can be seen by starting a Kafka console consumer.
+
+{% highlight bash %}
+localhost$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
+
+tom,15
+john,21
+
+{% endhighlight %}
+
+
+#### step 4: start SQL CLI, and create a Kafka table with Flink SQL DDL
+
+Start the Flink SQL CLI, create a simple Kafka 2.3.0 table via DDL, and verify its schema.
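+
+The CLI can be launched from the Flink distribution directory; in embedded mode it picks up the `conf/sql-client-defaults.yaml` configured in step 2 (assuming the Flink cluster is already running):
+
+{% highlight bash %}
+localhost$ bin/sql-client.sh embedded
+{% endhighlight %}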
+
+{% highlight bash %}
+
+Flink SQL> CREATE TABLE mykafka (name String, age Int) WITH (
+   'connector.type' = 'kafka',
+   'connector.version' = 'universal',
+   'connector.topic' = 'test',
+   'connector.properties.zookeeper.connect' = 'localhost:2181',
+   'connector.properties.bootstrap.servers' = 'localhost:9092',
+   'format.type' = 'csv',
+   'update-mode' = 'append'
+);
+[INFO] Table has been created.
+
+Flink SQL> DESCRIBE mykafka;
 
 Review comment:
   good catch!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services