Posted to commits@kylin.apache.org by li...@apache.org on 2022/03/18 14:13:31 UTC
svn commit: r1899035 [1/3] - in /kylin/site: ./ blog/ blog/2022/03/ blog/2022/03/17/ blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/ cn/blog/ cn_blog/2022/03/ cn_blog/2022/03/17/ cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/ images...
Author: lidong
Date: Fri Mar 18 14:13:30 2022
New Revision: 1899035
URL: http://svn.apache.org/viewvc?rev=1899035&view=rev
Log:
# add blog: kylin4 now is supporting aws glue
Added:
kylin/site/blog/2022/03/
kylin/site/blog/2022/03/17/
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/
kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
kylin/site/cn_blog/2022/03/
kylin/site/cn_blog/2022/03/17/
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/
kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
kylin/site/images/blog/kylin4_support_aws_glue/
kylin/site/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/17_verify_query_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png (with props)
kylin/site/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png (with props)
Modified:
kylin/site/blog/index.html
kylin/site/cn/blog/index.html
kylin/site/feed.xml
Added: kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
URL: http://svn.apache.org/viewvc/kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html?rev=1899035&view=auto
==============================================================================
--- kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html (added)
+++ kylin/site/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html Fri Mar 18 14:13:30 2022
@@ -0,0 +1,638 @@
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements. See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership. The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+<!doctype html>
+<html>
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+
+ <title>Apache Kylin | Kylin 4 now is supporting AWS Glue Catalog</title>
+ <meta name="description" content="Why does installing Kylin on EMR need to support AWS Glue?">
+ <meta name="author" content="Apache Kylin">
+ <link rel="shortcut icon" href="fav.png" type="image/png">
+
+
+
+<link rel="stylesheet" href="/assets/css/animate.css">
+<!-- Bootstrap -->
+<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
+
+<!-- Fonts -->
+<!-- <link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Alice|Open+Sans:400,300,700"> -->
+
+<!-- Icons -->
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+
+ <!-- Custom styles -->
+ <link rel="stylesheet" href="/assets/css/styles.css">
+ <link rel="stylesheet" href="/assets/css/docs.css">
+ <link rel="stylesheet" href="/assets/css/pygments.css">
+
+ <link rel="canonical" href="http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <link rel="alternate" type="application/rss+xml" title="Apache Kylin" href="http://kylin.apache.org/feed.xml" />
+
+<!--[if lt IE 9]> <script src="assets/js/html5shiv.js"></script> <![endif]-->
+<!-- Global site tag (gtag.js) - Google Analytics -->
+<script async src="https://www.googletagmanager.com/gtag/js?id=UA-120788561-1"></script>
+<script>
+ window.dataLayer = window.dataLayer || [];
+ function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'UA-120788561-1');
+</script>
+<script type="text/javascript" src="/assets/js/jquery-1.9.1.min.js"></script>
+<script type="text/javascript" src="/assets/js/nside.js"></script>
+<script type="text/javascript" src="/assets/js/nnav.js"></script>
+<script>
+var _hmt = _hmt || [];
+(function() {
+ var hm = document.createElement("script");
+ hm.src = "https://hm.baidu.com/hm.js?bdc5e03add430c0b72cc0eb91eabfa99";
+ var s = document.getElementsByTagName("script")[0];
+ s.parentNode.insertBefore(hm, s);
+})();
+</script>
+
+</head>
+
+ <body>
+
+<header id="header" >
+
+ <!-- Main Menu -->
+ <nav class="navbar navbar-default" role="navigation" id="nav-wrapper">
+ <div class="container-fluid" id="nav">
+ <!--
+ <img class="img-circle" width="40px" height="40px" id="circlelogo" src="/assets/images/kylin_logo.jpg">
+ -->
+ <!-- Brand and toggle get grouped for better mobile display -->
+ <div class="navbar-header">
+ <img class="navbar-logo" width="46" src="/assets/images/kylin_logo.png" ></img>
+ <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <ul class="nav icon-navbar">
+ <li><a href="https://twitter.com/apachekylin" target="_blank" class="fa fa-twitter fa-lg" title="Twitter: @ApacheKylin" ></a></li>
+ <li><a href="https://github.com/apache/kylin" target="_blank" class="fa fa-github-alt fa-lg" title="Github: apache/kylin" ></a></li>
+ <li><a href="https://www.facebook.com/kylinio" target="_blank" class="fa fa-facebook fa-lg" title="Facebook: kylin.io" ></a></li>
+ </ul>
+ </div>
+
+ <!-- Collect the nav links, forms, and other content for toggling -->
+ <div class="navbar-collapse collapse" id="bs-example-navbar-collapse-1">
+
+ <ul class="nav navbar-nav">
+
+ <li><a href="/">Home</a></li>
+ <li>
+ <a href="/docs" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Docs<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/docs/">Latest Release(Kylin 4.0.1)</a></li>
+ <li><a href="/docs31/">Kylin 3.1.3</a></li>
+ <li><a href="/docs24/">Kylin 2.4.0</a></li>
+ <li><a href="/archive/">Archive</a></li>
+ </ul>
+ </li>
+ <li><a href="/download">Download</a></li>
+ <li><a href="/community" >Community</a></li>
+ <li>
+ <a href="/development" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Development<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/development40/">Kylin 4.x</a></li>
+ <li><a href="/development/">Kylin 3.x And Older Versions</a></li>
+ </ul>
+ </li>
+ <li><a href="/blog">Blog</a></li>
+ <li><a href="/cn" >中文版</a></li>
+ </ul>
+ </div><!-- /.navbar-collapse -->
+ </div><!-- /.container-fluid -->
+ </nav>
+
+ <div id="head" class="parallax normal-header" >
+ <div class="text-center header-apache">
+ <a href="http://apache.org/foundation/contributing.html" title="Support Apache" style="margin-left: 150px;">
+ <div>
+ <img src="https://www.apache.org/images/SupportApache-small.png" >
+ </div>
+ </a>
+ </div>
+ </div>
+
+ </header>
+
+ <div class="page-content main">
+ <header style=" padding:2em 0 0 ">
+ <div class="container" >
+ <div style=" padding:0 4em">
+ <div class="blog-icon">
+ <img width="30" src="/assets/images/icon_blog_w.png">
+ </div>
+ <h4 class="index-title" style=" float:left;"><span>Apache Kylin™ Technical Blog</span></h4>
+ </div>
+ </div>
+ </div>
+
+ <div class="container blog">
+ <div>
+ <article class="post-content" >
+
+<div class="post" style=" padding:2em 4em 4em 4em">
+
+ <header class="post-header">
+ <h1 class="post-title">Kylin 4 now is supporting AWS Glue Catalog</h1>
+ <p class="post-meta" >Mar 17, 2022 • Xiaoxiang Yu</p>
+ </header>
+
+ <article class="post-content" >
+ <h2 id="why-does-installing-kylin-on-emr-need-to-support-aws-glue">Why does installing Kylin on EMR need to support AWS Glue?</h2>
+
+<h3 id="what-is-aws-glue">What is AWS Glue?</h3>
+
+<p>AWS Glue is a fully managed ETL (Extract, Transform, and Load) service that lets AWS users classify, cleanse, and enrich data, and move it between data stores, easily and cost-effectively. AWS Glue consists of a central metadata repository called the AWS Glue Data Catalog, an ETL engine that can automatically generate code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p>
+
+<h3 id="why-does-kylin-need-aws-glue-catalog">Why does Kylin need AWS Glue Catalog?</h3>
+
+<p>At present, many users in the Kylin community use AWS EMR to run large-scale distributed data processing jobs on Hadoop, Spark, Hive, Presto, etc. Without the AWS Glue Data Catalog, a table created in one of these components (such as Hive, Spark, or Presto) cannot be used by the others. Because the data warehouse has to serve requirements from many business departments, these users choose the AWS Glue Data Catalog for metadata storage when they create their AWS EMR clusters, so that data sources can be shared among components and departments. In other words, they can build one data cube from the data of every business department and respond quickly to different business requirements.<br />
+In modern companies, data is saved on cloud object storage, and big data teams use AWS EMR for data processing, data analysis, and model training. But as data volumes explode, extracting data becomes difficult and response times grow too long. In other words, EMR with Spark or Hive alone cannot meet the query-speed requirements of data analysts, O&M personnel, and sales staff, so some users turn to Apache Kylin as their open-source OLAP solution.<br />
+Recently, users asked us whether Kylin 4 could read table metadata directly from AWS Glue. After some collaboration, Kylin 4 now supports the AWS Glue Catalog, which makes it possible to share tables and data among Hive, Presto, Spark, and Kylin. This helps break down the metadata barrier, so that different subject areas can be combined into one big data analysis platform.</p>
+
+<h3 id="does-kylin-support-aws-glue">Does Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th>&nbsp;</th>
+ <th>Kylin versions that support Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br />3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br />https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="prerequisites-for-deployment">Prerequisites for deployment</h2>
+
+<h3 id="software-version">Software Version</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td><a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td><a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="prepare-aws-glue-database-and-tables">Prepare AWS Glue database and tables</h3>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<ul>
+ <li>Create an EMR cluster.</li>
+</ul>
+
+<p>Note: the parameter <code class="highlighter-rouge">hive.metastore.client.factory.class</code> must be configured to enable AWS Glue, as in the command below.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications <span class="nv">Name</span><span class="o">=</span>Hadoop <span class="nv">Name</span><span class="o">=</span>Hive <span class="nv">Name</span><span class="o">=</span>Spark <span class="nv">Name</span><span class="o">=</span>ZooKeeper <span class="nv">Name</span><span class="o">=</span>Tez <span class="nv">Name</span><span class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span> <span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
+
+<ul>
+ <li>Log in to the Master node. Check the Hadoop version and whether the Hadoop cluster is successfully started.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png" alt="" /></p>
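+
+<p>Once logged in, it can also be worth confirming that the Hive configuration on the cluster really points at the Glue Data Catalog. The following is a minimal sketch, assuming the configuration lives at <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code> (the path referenced elsewhere in this article); <code class="highlighter-rouge">glue_catalog_enabled</code> is a hypothetical helper name, not part of EMR or Kylin.</p>
+
```shell
# Sketch: succeed if the given Hive configuration file mentions the Glue
# client factory class that was passed to "aws emr create-cluster".
# glue_catalog_enabled is a hypothetical helper name.
glue_catalog_enabled() {
  grep -q 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory' "$1"
}

# Assumed location of the Hive configuration on the master node.
conf=/etc/hive/conf/hive-site.xml
if [ -f "$conf" ]; then
  if glue_catalog_enabled "$conf"; then
    echo "Glue Data Catalog client factory is configured"
  else
    echo "Glue Data Catalog client factory is NOT configured"
  fi
fi
```
+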
+
+<h3 id="optionalget-environmental-information">(Optional) Get environment information</h3>
+
+<blockquote>
+ <p>If you are using RDS or other metadata storage, you may skip this step.</p>
+</blockquote>
+
+<p>An RDBMS is recommended as the metastore for Kylin 4. For testing purposes, this article uses the MariaDB instance that comes with the Master node as the metastore; its hostname, account, and password can be found in <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>kylin.metadata.url<span class="o">=</span>kylin4_on_cloud@jdbc,url<span class="o">=</span>jdbc:mysql://<span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span class="o">=</span>hive,password<span class="o">=</span><span class="k">${</span><span class="nv">PASSWORD</span><span class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span class="o">=</span>10,driverClassName<span class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>Fill in the variables with your actual values (for example, replace <code class="highlighter-rouge">${PASSWORD}</code> with the real password) and save the result locally; it will be needed later to start Kylin.</p>
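+
+<p>As an illustration, the two properties can be generated from the hostname and password with a small shell function. This is only a sketch: <code class="highlighter-rouge">render_kylin_props</code> is a hypothetical helper name, and the hostname and password in the example call are made-up values.</p>
+
```shell
# Sketch: print the two kylin.properties lines shown above for a given
# metastore host and password. render_kylin_props is a hypothetical helper name.
render_kylin_props() {
  host="$1"
  password="$2"
  printf '%s\n' "kylin.metadata.url=kylin4_on_cloud@jdbc,url=jdbc:mysql://${host}:3306/hue,username=hive,password=${password},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver"
  printf '%s\n' "kylin.env.zookeeper-connect-string=${host}"
}

# Example call with made-up values; the real ones come from
# /etc/hive/conf/hive-site.xml on the master node.
render_kylin_props "ip-10-0-0-1.example.internal" "example-password"
```
+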
+
+<h3 id="test-the-connectivity-between-spark-sql-and-aws-glue">Test the connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use the Spark SQL CLI to test whether AWS Spark SQL can access database and table metadata through AWS Glue. On the first attempt, the startup fails with an error.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png" alt="" /></p>
+
+<p>Replace the <code class="highlighter-rouge">hive-site.xml</code> used by Spark with the one used by Hive, using the following commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Then change the value of <code class="highlighter-rouge">hive.execution.engine</code> in <code class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code class="highlighter-rouge">mr</code>, restart the Spark SQL CLI, and verify that a query on the AWS Glue table data succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png" alt="" /></p>
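+
+<p>To double-check the edit to <code class="highlighter-rouge">hive-site.xml</code>, the configured engine can be printed from the shell. This is a rough sketch, assuming the value element sits on the line directly after the name element; <code class="highlighter-rouge">show_engine</code> is a hypothetical helper name.</p>
+
```shell
# Sketch: print the value configured for hive.execution.engine.
# Assumes the value element is on the line directly after the name element.
# show_engine is a hypothetical helper name.
show_engine() {
  grep -A1 'hive.execution.engine' "$1" \
    | tail -n 1 \
    | tr -d ' \t' \
    | sed -e 's/^.value.//' -e 's/..value.$//'   # strip the surrounding value tags
}

conf=/etc/spark/conf/hive-site.xml
if [ -f "$conf" ]; then
  echo "hive.execution.engine = $(show_engine "$conf")"
fi
```
+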
+
+<h3 id="optional-prepare-kylin-spark-enginejar">(Optional) Prepare kylin-spark-engine.jar</h3>
+
+<blockquote>
+ <p>This issue will be fixed in Apache Kylin 4.0.2, so you can skip this step after upgrading to Apache Kylin 4.0.2. If you are on Kylin 4.0.1, follow the steps below to build a replacement kylin-spark-engine.jar:</p>
+</blockquote>
+
+<p>Clone the Kylin git repository and run <code class="highlighter-rouge">mvn clean package -DskipTests</code> to build a new <code class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="deploy-kylin-and-connect-to-aws-glue">Deploy Kylin and connect to AWS Glue</h2>
+
+<h3 id="download-kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+ <p>Download and decompress Kylin. Choose the Kylin package that matches your EMR version: the Spark 2 package for EMR 5.x, or the Spark 3 package for EMR 6.x.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+# wget apache-kylin-4.0.1-bin-spark3.tar.gz
+tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+cd apache-kylin-4.0.1-bin-spark3
+export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></pre>
+</div>
+ </li>
+ <li>
+ <p>(Optional) Get the MariaDB driver jar.</p>
+
+ <blockquote>
+ <p>If you are using another database for the metastore, skip this step.</p>
+ </blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></pre>
+</div>
+ </li>
+</ol>
+
+<h3 id="prepare-spark">Prepare Spark</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so AWS Spark is used for loading table metadata and for build jobs. Kylin 4.0.1, however, officially supports Apache Spark, and Apache Spark and AWS Spark are not fully compatible with each other, so Apache Spark is used for cube queries. In short, you need to switch between AWS Spark and Apache Spark depending on the task (query or build).</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar <span class="nv">$KYLIN_HOME</span>/ext
+</code></pre>
+</div>
+
+<ul>
+ <li>Download Apache Spark
+ <ul>
+ <li>Download the Spark installation package that matches your EMR version: Spark 2.4.7 for EMR 5.x, or Spark 3.1.2 for EMR 6.x.
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # or download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark-apache
+</code></pre>
+</div>
+</li>
+ </ul>
+ </li>
+ <li>First, you need to load the AWS Glue table, so point <code class="highlighter-rouge">$KYLIN_HOME/spark</code> to AWS Spark with a soft link. Note: you do not need to set <code class="highlighter-rouge">SPARK_HOME</code>, because if <code class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code class="highlighter-rouge">SPARK_HOME</code> is not set, Kylin uses <code class="highlighter-rouge">$KYLIN_HOME/spark</code> as <code class="highlighter-rouge">SPARK_HOME</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark-aws spark
+</code></pre>
+</div>
+
+<h3 id="modify-kylin-startup-script">Modify Kylin startup script</h3>
+
+<ol>
+ <li>Start the Spark SQL CLI and keep it running.</li>
+ <li>
+ <p>Find the PID of the Spark SQL CLI process (<code class="highlighter-rouge">SparkSQLCLIDriver</code>) with <code class="highlighter-rouge">jps -ml</code>, then read the <code class="highlighter-rouge">spark.driver.extraClassPath</code> of the <strong>Driver</strong> with <code class="highlighter-rouge">jinfo</code>. Alternatively, copy the value from <code class="highlighter-rouge">/etc/spark/conf/spark-defaults.conf</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>jps -ml | grep SparkSubmit
+jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code></pre>
+</div>
+
+ <p><img src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png" alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code>: modify <code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> and add <code class="highlighter-rouge">kylin_driver_classpath</code>. Save <code class="highlighter-rouge">bin/kylin.sh</code>, then exit the Spark SQL CLI.</li>
+</ol>
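+
+<p>If you take the value from the configuration file rather than from <code class="highlighter-rouge">jinfo</code>, a one-line <code class="highlighter-rouge">awk</code> filter is enough. The sketch below assumes whitespace-separated <code class="highlighter-rouge">key value</code> lines in the file; <code class="highlighter-rouge">driver_classpath</code> is a hypothetical helper name.</p>
+
```shell
# Sketch: read spark.driver.extraClassPath from a spark-defaults.conf-style file.
# Assumes "key value" lines separated by whitespace.
# driver_classpath is a hypothetical helper name.
driver_classpath() {
  awk '$1 == "spark.driver.extraClassPath" { print $2 }' "$1"
}

conf=/etc/spark/conf/spark-defaults.conf
if [ -f "$conf" ]; then
  driver_classpath "$conf"
fi
```
+
+<p>The printed value is what goes into <code class="highlighter-rouge">kylin_driver_classpath</code> in <code class="highlighter-rouge">bin/kylin.sh</code>.</p>
+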
+
+<ul>
+ <li>kylin.sh before modification</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0, in the modified <code class="highlighter-rouge">kylin.sh</code>, <code class="highlighter-rouge">kylin_driver_classpath</code> is at the end of the code.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1, in the modified <code class="highlighter-rouge">kylin.sh</code>, <code class="highlighter-rouge">kylin_driver_classpath</code> is placed before <code class="highlighter-rouge">$SPARK_HOME/jars</code>.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png" alt="" /></p>
+
+<h3 id="configure-kylin">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value (Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+ <td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.env.zookeeper-connect-string</td>
+ <td>${HOSTNAME}</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td>
+ <td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td>
+ <td>Copied from spark.driver.extraClassPath in /etc/spark/conf/spark-defaults.conf</td>
+ </tr>
+ </tbody>
+</table>
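+
+<p>Before starting Kylin, a quick check that all three properties are present may help catch a typo early. The following is a minimal sketch; <code class="highlighter-rouge">missing_kylin_keys</code> is a hypothetical helper name, and the check only looks for uncommented <code class="highlighter-rouge">key=value</code> lines without validating the values.</p>
+
```shell
# Sketch: print each of the three properties from the table above that is
# not set in the given kylin.properties file.
# missing_kylin_keys is a hypothetical helper name.
missing_kylin_keys() {
  for key in kylin.metadata.url \
             kylin.env.zookeeper-connect-string \
             kylin.engine.spark-conf.spark.driver.extraClassPath; do
    # A key counts as set only on an uncommented "key=value" line.
    if ! grep -q "^${key}=" "$1"; then
      echo "$key"
    fi
  done
}

props="${KYLIN_HOME:-.}/conf/kylin.properties"
if [ -f "$props" ]; then
  missing_kylin_keys "$props"
fi
```
+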
+
+<h3 id="start-kylin-and-verify-the-building-job">Start Kylin and verify the building job</h3>
+
+<h4 id="start-kylin">Start Kylin</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+ln -s spark-aws spark <span class="c"># skip this step if the soft link 'spark' already exists</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png" alt="" /></p>
+
+<h4 id="optional-replace-kylin-spark-enginejar">(Optional) Replace kylin-spark-engine.jar</h4>
+
+<blockquote>
+ <p>This step is only required for Kylin 4.0.1 users.</p>
+</blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/
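+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span class="c"># back up the old jar</span>
+cp <span class="k">${</span><span class="nv">KYLIN_SRC</span><span class="k">}</span>/kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar . <span class="c"># KYLIN_SRC is a placeholder for the directory where the Kylin repository was cloned</span>
+
+<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so that the new jar is loaded</span>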
+</code></pre>
+</div>
+
+<h4 id="load-aws-glue-table-and-build">Load AWS Glue table and build</h4>
+
+<ul>
+ <li>Load AWS Glue table metadata</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png" alt="" /></p>
+
+<ul>
+ <li>Create Model and Cube, then trigger a building job.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png" alt="" /></p>
+
+<h3 id="verify-the-query">Verify the query</h3>
+
+<p>Switch Kylin from AWS Spark to Apache Spark, then restart Kylin.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+rm spark <span class="c"># 'spark' is a soft link that points to AWS Spark</span>
+ln -s spark-apache spark <span class="c"># switch from AWS Spark to Apache Spark</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
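+
+<p>Because this switch is repeated whenever you alternate between building and querying, it may be convenient to wrap it in a function. This is a minimal sketch, assuming the two Spark directories are named <code class="highlighter-rouge">spark-aws</code> and <code class="highlighter-rouge">spark-apache</code> under <code class="highlighter-rouge">$KYLIN_HOME</code>; <code class="highlighter-rouge">use_spark</code> is a hypothetical helper name, and Kylin still has to be restarted afterwards.</p>
+
```shell
# Sketch: re-point $KYLIN_HOME/spark at one of the two Spark directories.
# Assumes they are named spark-aws and spark-apache under $KYLIN_HOME.
# use_spark is a hypothetical helper name.
use_spark() {
  flavor="$1"    # "aws" for loading tables and building, "apache" for querying
  target="$KYLIN_HOME/spark-$flavor"
  if [ ! -d "$target" ]; then
    echo "no such Spark directory: $target"
    return 1
  fi
  # -n replaces the link itself instead of following it into the old target
  ln -sfn "spark-$flavor" "$KYLIN_HOME/spark"
}

# Usage (restart Kylin afterwards with bin/kylin.sh restart):
#   use_spark apache
```
+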
+
+<p>Run a test query and confirm that it succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png" alt="" /></p>
+
+<h2 id="discussion-and-qa">Discussion and Q&A</h2>
+
+<h3 id="why-we-must-use-both-aws-spark-and-apache-spark">Why must we use both AWS Spark and Apache Spark?</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so AWS Spark is used for loading table metadata and for build jobs. Kylin 4.0.1, however, supports Apache Spark, and Apache Spark and AWS Spark are not fully compatible with each other, so Apache Spark is used for cube queries. In short, you need to switch between AWS Spark and Apache Spark depending on the task (query or build).</p>
+
+<h3 id="why-do-users-need-to-modify-kylinsh">Why do users need to modify kylin.sh?</h3>
+
+<p>Kylin acts as the Spark Driver and needs to load table metadata through <code class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code>, so you need to modify kylin.sh to add the relevant jars to the classpath of the Kylin process.</p>
+
+<h3 id="if-i-faced-more-questions-where-should-i-asked">If I have more questions, where should I ask?</h3>
+
+<p>If you have any questions about using Kylin on AWS, please contact us via the mailing list (<a href="mailto:user@kylin.apache.org">user@kylin.apache.org</a>). For details, see <a href="https://kylin.apache.org/community/">https://kylin.apache.org/community/</a>.</p>
+
+ </article>
+
+</div>
+
+
+
+
+
+ </article>
+ </div>
+ </div>
+
+<footer id="underfooter">
+ <div>
+ <div class="row">
+ <div class="col-md-12 widget">
+ <div class="widget-body">
+ <div class="footer-img">
+ <a href="http://www.apache.org">
+ <img id="asf-logo" height="78px" alt="Apache Software Foundation" src="/assets/images/apache_footer.png">
+ </a>
+ </div>
+ <p style="padding-top: 11px;">
+ The contents of this website are © 2015 Apache Software Foundation under the terms of the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0"> Apache License v2 </a>.
+ </p>
+ <p style="margin-bottom: 11px;">
+ Apache Kylin and its logo are trademarks of the Apache Software Foundation.
+ </div>
+
+ </div>
+ </div>
+ </div>
+ <!-- /row of widgets -->
+
+ </div>
+ <div></div>
+
+</footer>
+
+ <script src="/assets/js/jquery-1.9.1.min.js"></script>
+ <script src="/assets/js/bootstrap.min.js"></script>
+ <script src="/assets/js/main.js"></script>
+ </body>
+</html>
+
+
+
+
Modified: kylin/site/blog/index.html
URL: http://svn.apache.org/viewvc/kylin/site/blog/index.html?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/blog/index.html (original)
+++ kylin/site/blog/index.html Fri Mar 18 14:13:30 2022
@@ -197,6 +197,16 @@ var _hmt = _hmt || [];
<div class="col-md-6 col-lg-6 col-xs-12">
+ <a class="blog-card" href="/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <div class="blog-pic">
+ <img width="20" src="../assets/images/icon_blog_w.png" />
+ </div>
+ <p class="blog-title">Kylin 4 now is supporting AWS Glue Catalog</p>
+ <p align="left" class="post-meta">posted: Mar 17, 2022</p>
+ </a>
+ </div>
+
+ <div class="col-md-6 col-lg-6 col-xs-12">
<a class="blog-card" href="/blog/2022/01/12/The-Future-Of-Kylin/">
<div class="blog-pic">
<img width="20" src="../assets/images/icon_blog_w.png" />
Modified: kylin/site/cn/blog/index.html
URL: http://svn.apache.org/viewvc/kylin/site/cn/blog/index.html?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/cn/blog/index.html (original)
+++ kylin/site/cn/blog/index.html Fri Mar 18 14:13:30 2022
@@ -199,6 +199,16 @@ var _hmt = _hmt || [];
<div class="col-md-6 col-lg-6 col-xs-12">
+ <a class="blog-card" href="/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <div class="blog-pic">
+ <img width="20" src="/assets/images/icon_blog_w.png" />
+ </div>
+ <p class="blog-title">Hands-on: Kylin 4 Now Supports AWS Glue Catalog</p>
+ <p align="left" class="post-meta">posted: Mar 17, 2022</p>
+ </a>
+ </div>
+
+ <div class="col-md-6 col-lg-6 col-xs-12">
<a class="blog-card" href="/cn_blog/2022/01/12/The-Future-Of-Kylin/">
<div class="blog-pic">
<img width="20" src="/assets/images/icon_blog_w.png" />
Added: kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html
URL: http://svn.apache.org/viewvc/kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html?rev=1899035&view=auto
==============================================================================
--- kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html (added)
+++ kylin/site/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/index.html Fri Mar 18 14:13:30 2022
@@ -0,0 +1,638 @@
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements. See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership. The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+<!doctype html>
+<html>
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+
+ <title>Apache Kylin | Hands-on: Kylin 4 Now Supports AWS Glue Catalog</title>
+ <meta name="description" content="Why does Kylin deployed on EMR need to support Glue?">
+ <meta name="author" content="Apache Kylin">
+ <link rel="shortcut icon" href="fav.png" type="image/png">
+
+
+
+<link rel="stylesheet" href="/assets/css/animate.css">
+<!-- Bootstrap -->
+<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
+
+<!-- Fonts -->
+<!-- <link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Alice|Open+Sans:400,300,700"> -->
+
+<!-- Icons -->
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+
+ <!-- Custom styles -->
+ <link rel="stylesheet" href="/assets/css/styles.css">
+ <link rel="stylesheet" href="/assets/css/docs.css">
+ <link rel="stylesheet" href="/assets/css/pygments.css">
+
+ <link rel="canonical" href="http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/">
+ <link rel="alternate" type="application/rss+xml" title="Apache Kylin" href="http://kylin.apache.org/feed.xml" />
+
+<!--[if lt IE 9]> <script src="assets/js/html5shiv.js"></script> <![endif]-->
+<!-- Global site tag (gtag.js) - Google Analytics -->
+<script async src="https://www.googletagmanager.com/gtag/js?id=UA-120788561-1"></script>
+<script>
+ window.dataLayer = window.dataLayer || [];
+ function gtag(){dataLayer.push(arguments);}
+ gtag('js', new Date());
+
+ gtag('config', 'UA-120788561-1');
+</script>
+<script type="text/javascript" src="/assets/js/jquery-1.9.1.min.js"></script>
+<script type="text/javascript" src="/assets/js/nside.js"></script>
+<script type="text/javascript" src="/assets/js/nnav.js"></script>
+<script>
+var _hmt = _hmt || [];
+(function() {
+ var hm = document.createElement("script");
+ hm.src = "https://hm.baidu.com/hm.js?bdc5e03add430c0b72cc0eb91eabfa99";
+ var s = document.getElementsByTagName("script")[0];
+ s.parentNode.insertBefore(hm, s);
+})();
+</script>
+
+</head>
+
+ <body>
+
+<header id="header" >
+
+ <!-- Main Menu -->
+ <nav class="navbar navbar-default" role="navigation" id="nav-wrapper">
+ <div class="container-fluid" id="nav">
+ <!--
+ <img class="img-circle" width="40px" height="40px" id="circlelogo" src="/assets/images/kylin_logo.jpg">
+ -->
+ <!-- Brand and toggle get grouped for better mobile display -->
+ <div class="navbar-header">
+ <img class="navbar-logo" width="46" src="/assets/images/kylin_logo.png" ></img>
+ <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <ul class="nav icon-navbar">
+ <li><a href="https://twitter.com/apachekylin" target="_blank" class="fa fa-twitter fa-lg" title="Twitter: @ApacheKylin" ></a></li>
+ <li><a href="https://github.com/apache/kylin" target="_blank" class="fa fa-github-alt fa-lg" title="Github: apache/kylin" ></a></li>
+ <li><a href="https://www.facebook.com/kylinio" target="_blank" class="fa fa-facebook fa-lg" title="Facebook: kylin.io" ></a></li>
+ </ul>
+ </div>
+
+ <!-- Collect the nav links, forms, and other content for toggling -->
+ <div class="navbar-collapse collapse" id="bs-example-navbar-collapse-1">
+
+ <ul class="nav navbar-nav">
+
+ <li><a href="/">Home</a></li>
+ <li>
+ <a href="/docs" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Docs<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/docs/">Latest Release(Kylin 4.0.1)</a></li>
+ <li><a href="/docs31/">Kylin 3.1.3</a></li>
+ <li><a href="/docs24/">Kylin 2.4.0</a></li>
+ <li><a href="/archive/">Archive</a></li>
+ </ul>
+ </li>
+ <li><a href="/download">Download</a></li>
+ <li><a href="/community" >Community</a></li>
+ <li>
+ <a href="/development" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Development<span class="caret"></span></a>
+ <ul class="dropdown-menu">
+ <li><a href="/development40/">Kylin 4.x</a></li>
+ <li><a href="/development/">Kylin 3.x And Older Versions</a></li>
+ </ul>
+ </li>
+ <li><a href="/blog">Blog</a></li>
+ <li><a href="/cn" >Chinese version</a></li>
+ </ul>
+ </div><!-- /.navbar-collapse -->
+ </div><!-- /.container-fluid -->
+ </nav>
+
+ <div id="head" class="parallax normal-header" >
+ <div class="text-center header-apache">
+ <a href="http://apache.org/foundation/contributing.html" title="Support Apache" style="margin-left: 150px;">
+ <div>
+ <img src="https://www.apache.org/images/SupportApache-small.png" >
+ </div>
+ </a>
+ </div>
+ </div>
+
+ </header>
+
+ <div class="page-content main">
+ <header style=" padding:2em 0 0 ">
+ <div class="container" >
+ <div style=" padding:0 4em">
+ <div class="blog-icon">
+ <img width="30" src="/assets/images/icon_blog_w.png">
+ </div>
+ <h4 class="index-title" style=" float:left;"><span>Apache Kylin™ Technical Blog</span></h4>
+ </div>
+ </div>
+ </div>
+
+ <div class="container blog">
+ <div>
+ <article class="post-content" >
+
+<div class="post" style=" padding:2em 4em 4em 4em">
+
+ <header class="post-header">
+ <h1 class="post-title">Hands-on: Kylin 4 Now Supports AWS Glue Catalog</h1>
+ <p class="post-meta" >Mar 17, 2022 • Xiaoxiang Yu</p>
+ </header>
+
+ <article class="post-content" >
+ <h2 id="emr--kylin--glue-">Why does Kylin deployed on EMR need to support Glue?</h2>
+
+<h3 id="aws-glue">What is AWS Glue?</h3>
+
+<p>AWS Glue is a fully managed ETL (extract, transform, and load) service that lets AWS users categorize, clean, and enrich their data easily and cost-effectively, and move it reliably between data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p>
+
+<h3 id="kylin--aws-glue-catalog">Why does Kylin need to support AWS Glue Catalog?</h3>
+
+<p>Many Kylin users in the community run on AWS EMR, typically with components such as Hadoop, Spark, Hive, and Presto. Without the AWS Glue Data Catalog, a table created in one data warehouse component (Hive, Spark, or Presto) cannot be found, and therefore cannot be used, in the others. Because a company's underlying data warehouse serves many business departments, you can solve this problem by using the AWS Glue Data Catalog as the metadata store when creating the AWS EMR cluster. Data sources are then shared across components and across business departments, so the data of every department can be built into one large data cube that responds quickly to the needs of a fast-growing business.<br />
+Modern companies build their data on cloud platforms, and big data teams use AWS EMR for data processing, data analysis, and model training. As data volumes surge, these workloads become slower and harder, and EMR/Spark/Hive struggle to give data analysts, operations staff, and sales teams the fast queries they need, so some users have chosen Apache Kylin as their open source OLAP solution.<br />
+Recently, however, community users told us that Kylin 4 could not yet read table metadata from Glue. We worked with them to investigate and eventually resolve the problem, so Kylin 4 now supports the AWS Glue Catalog. The benefit is that Hive, Presto, Spark, and Kylin can share tables and data, linking every subject area into one large data analysis platform and breaking down metadata barriers.</p>
+
+<h3 id="apache-kylin--aws-glue-">Does Apache Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th>&nbsp;</th>
+ <th>Kylin versions that support Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br /> 3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br />https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="section">Preparation before deployment</h2>
+
+<h3 id="section-1">Software overview</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td>Must be 4.0.1 or higher; for details see <a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td>Covers recent releases of EMR 6 and EMR 5; see <a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="glue-">Prepare the Glue database and tables</h3>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<ul>
+ <li>Create an AWS EMR cluster.</li>
+</ul>
+
+<p>Start an EMR cluster. Note that Glue is enabled as the external metadata store here by configuring <code class="highlighter-rouge">hive.metastore.client.factory.class</code>. The following command can serve as a reference.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications <span class="nv">Name</span><span class="o">=</span>Hadoop <span class="nv">Name</span><span class="o">=</span>Hive <span class="nv">Name</span><span class="o">=</span>Spark <span class="nv">Name</span><span class="o">=</span>ZooKeeper <span class="nv">Name</span><span class="o">=</span>Tez <span class="nv">Name</span><span class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span> <span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
+
+<ul>
+ <li>Log in to the master node, then check the Hadoop version and whether the Hadoop cluster started successfully.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<h3 id="optional">Get environment information (Optional)</h3>
+
+<blockquote>
+ <p>If you use RDS or another metadata store, skip this step as appropriate.</p>
+</blockquote>
+
+<p>Kylin 4.X recommends an RDBMS as its metadata store. For testing purposes, this walkthrough uses the MariaDB instance that ships with the master node. The MariaDB host name, account, and password can be found in <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>kylin.metadata.url<span class="o">=</span>kylin4_on_cloud@jdbc,url<span class="o">=</span>jdbc:mysql://<span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span class="o">=</span>hive,password<span class="o">=</span><span class="k">${</span><span class="nv">PASSWORD</span><span class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span class="o">=</span>10,driverClassName<span class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>After collecting this information, replace the variables in the Kylin configuration entries above, such as <code class="highlighter-rouge">${PASSWORD}</code>, and save the result locally for starting the Kylin process in a later step.</p>
+
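+<p>The lookup in <code class="highlighter-rouge">hive-site.xml</code> can be scripted. The following is a minimal sketch, not part of Kylin or EMR: <code class="highlighter-rouge">hive_site_value</code> is a hypothetical helper, and it assumes each <code class="highlighter-rouge">&lt;name&gt;</code> and <code class="highlighter-rouge">&lt;value&gt;</code> element sits on a single line. The <code class="highlighter-rouge">javax.jdo.option.*</code> keys are the standard Hive metastore connection properties; whether your cluster stores the MariaDB credentials there is an assumption to verify.</p>
+

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Kylin or EMR): print the <value> of one
# property from a Hadoop-style XML file such as hive-site.xml.
# Assumption: <name>...</name> and <value>...</value> each fit on one line, and
# the <value> appears on the same line as, or after, the matching <name>.
hive_site_value() {
  local key="$1" file="$2"
  awk -v key="$key" '
    # index() does a literal match, so dots in the key are not treated as regex
    index($0, "<name>" key "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$file"
}

# Example on an EMR master node (paths as used in this post):
#   hive_site_value javax.jdo.option.ConnectionURL      /etc/hive/conf/hive-site.xml
#   hive_site_value javax.jdo.option.ConnectionUserName /etc/hive/conf/hive-site.xml
#   hive_site_value javax.jdo.option.ConnectionPassword /etc/hive/conf/hive-site.xml
```

+<p>The printed values can then be substituted for <code class="highlighter-rouge">${HOSTNAME}</code> and <code class="highlighter-rouge">${PASSWORD}</code> in the two Kylin properties shown above.</p>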
+<h3 id="spark-sql--aws-glue-">Test connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use spark-sql to test whether AWS Spark SQL can fetch database and table metadata through Glue. The first attempt fails with an error at startup.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png" alt="" /></p>
+
+<p>Then replace the <code class="highlighter-rouge">hive-site.xml</code> used by Spark with the following commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Also change the value of <code class="highlighter-rouge">hive.execution.engine</code> in <code class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code class="highlighter-rouge">mr</code>. Then start the Spark SQL CLI again and verify that queries against the Glue tables succeed.</p>
+
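+<p>The <code class="highlighter-rouge">hive.execution.engine</code> change can also be made non-interactively. This is a sketch only: <code class="highlighter-rouge">set_engine_mr</code> is a hypothetical helper, and it assumes the <code class="highlighter-rouge">&lt;value&gt;</code> line directly follows the <code class="highlighter-rouge">&lt;name&gt;</code> line, which is the usual layout but should be checked in your file.</p>
+

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite hive.execution.engine to "mr" in a hive-site.xml.
# Assumption: the <value> line is the line immediately after the <name> line.
# The file is rewritten through a temporary copy, so ownership and permissions
# of the result follow the user running the function.
set_engine_mr() {
  local file="$1"
  sed '/<name>hive\.execution\.engine<\/name>/{n;s|<value>[^<]*</value>|<value>mr</value>|;}' \
    "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example (run as a user allowed to write the file, e.g. via sudo):
#   set_engine_mr /etc/spark/conf/hive-site.xml
```

+<p>Keeping the <code class="highlighter-rouge">hive-site.xml.bak</code> copy created earlier makes it easy to roll back if Spark SQL still fails to start.</p>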
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png" alt="" /></p>
+
+<h3 id="kylin-spark-enginejaroptional">Prepare kylin-spark-engine.jar (Optional)</h3>
+
+<blockquote>
+ <p>If Apache Kylin 4.0.2 has already been released, it should include the fix for this problem and you can skip this step. Otherwise, follow the steps below to replace <code class="highlighter-rouge">kylin-spark-engine.jar</code>:</p>
+</blockquote>
+
+<p>Following the commands below, clone the Kylin repository and run <code class="highlighter-rouge">mvn clean package -DskipTests</code> to produce <code class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="kylin--glue">Deploy Kylin and connect to Glue</h2>
+
+<h3 id="kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+ <p>Download and extract Kylin. Choose the Kylin package that matches your EMR version: use the spark2 package for EMR 5.X and the spark3 package for EMR 6.X.</p>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code># aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+# wget apache-kylin-4.0.1-bin-spark3.tar.gz
+tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+cd apache-kylin-4.0.1-bin-spark3
+export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></pre>
+ </div>
+ </li>
+ <li>
+ <p>Get the RDBMS driver JAR (Optional)</p>
+
+ <blockquote>
+ <p>If you use a different RDBMS as the metadata store, skip this step.</p>
+ </blockquote>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></pre>
+ </div>
+ </li>
+</ol>
+
+<h3 id="spark">Prepare Spark</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so <strong>loading table metadata and running builds requires AWS Spark</strong>. However, Kylin 4.0.1 supports Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark, so the two are poorly compatible and <strong>querying cubes requires Apache Spark</strong>. In short, you need to switch the Spark that Kylin uses depending on whether it is running query tasks or build tasks.</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar <span class="nv">$KYLIN_HOME</span>/ext
+</code></pre>
+</div>
+
+<ul>
+ <li>Prepare Apache Spark
+ <ul>
+ <li>
+ <p>Choose the Spark distribution that matches your EMR version: use the <code class="highlighter-rouge">Spark 2.4.7</code> package for EMR 5.X and the <code class="highlighter-rouge">Spark 3.1.2</code> package for EMR 6.X.</p>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # or download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark_apache
+</code></pre>
+ </div>
+ </li>
+ </ul>
+ </li>
+ <li>Because the Glue tables must be loaded first, point <code class="highlighter-rouge">$KYLIN_HOME/spark</code> at AWS Spark with a soft link. Note that you do not need to set <code class="highlighter-rouge">SPARK_HOME</code>: when <code class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code class="highlighter-rouge">SPARK_HOME</code> is unset, Kylin uses <code class="highlighter-rouge">$KYLIN_HOME/spark</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark_aws spark
+</code></pre>
+</div>
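+<p>The <code class="highlighter-rouge">SPARK_HOME</code> rule can be illustrated with a short sketch. It is not Kylin's actual startup code: <code class="highlighter-rouge">resolve_spark_home</code> is a hypothetical function, and the assumption that a set <code class="highlighter-rouge">SPARK_HOME</code> takes precedence over <code class="highlighter-rouge">$KYLIN_HOME/spark</code> is inferred from the description above rather than confirmed against the script.</p>
+

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not Kylin's real logic.
# Assumption: an explicitly set SPARK_HOME wins; otherwise $KYLIN_HOME/spark
# (which may be a soft link to a directory) is used when it exists.
resolve_spark_home() {
  local kylin_home="$1"
  if [ -n "${SPARK_HOME:-}" ]; then
    echo "$SPARK_HOME"
  elif [ -d "$kylin_home/spark" ]; then
    echo "$kylin_home/spark"
  else
    echo "no Spark found under $kylin_home" >&2
    return 1
  fi
}

# Example:
#   unset SPARK_HOME
#   resolve_spark_home "$KYLIN_HOME"
```

+<p>Running it before starting Kylin is a quick way to confirm which Spark directory the soft link currently exposes.</p>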
+
+<h3 id="kylin-">Modify the Kylin startup script</h3>
+
+<ol>
+ <li>Start the Spark SQL CLI and leave it running.</li>
+ <li>
+ <p>Use <code class="highlighter-rouge">jps -ml</code> to find the PID of <code class="highlighter-rouge">SparkSQLCLIDriver</code>, then read the driver's <code class="highlighter-rouge">spark.driver.extraClassPath</code>. Alternatively, copy the value from <code class="highlighter-rouge">/etc/spark/conf/spark-defaults.conf</code>.</p>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code>jps -ml | grep SparkSubmit
+jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code></pre>
+ </div>
+
+ <p><img src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png" alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code>: append <code class="highlighter-rouge">kylin_driver_classpath</code> to the <code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> variable. Save <code class="highlighter-rouge">bin/kylin.sh</code>, then exit the Spark SQL CLI.</li>
+</ol>
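+<p>Reading the classpath from <code class="highlighter-rouge">spark-defaults.conf</code>, the alternative mentioned in step 2, can be sketched as follows. <code class="highlighter-rouge">driver_classpath</code> is a hypothetical helper, and it assumes the usual whitespace-separated <code class="highlighter-rouge">key value</code> layout with no spaces inside the value.</p>
+

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of spark.driver.extraClassPath from a
# spark-defaults.conf-style file.
# Assumption: lines look like "key<whitespace>value" and the value has no spaces.
driver_classpath() {
  awk '$1 == "spark.driver.extraClassPath" { print $2; exit }' "$1"
}

# Example on an EMR master node:
#   kylin_driver_classpath=$(driver_classpath /etc/spark/conf/spark-defaults.conf)
#   echo "$kylin_driver_classpath"
```

+<p>The resulting string is what gets appended to <code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> as <code class="highlighter-rouge">kylin_driver_classpath</code>; compare it with the <code class="highlighter-rouge">jinfo</code> output if the two sources disagree.</p>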
+
+<ul>
+ <li>kylin.sh before modification</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0: kylin.sh after modification, with <code class="highlighter-rouge">kylin_driver_classpath</code> placed at the end.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1: kylin.sh after modification, with <code class="highlighter-rouge">kylin_driver_classpath</code> placed before <code class="highlighter-rouge">$SPARK_HOME/jars</code>.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png" alt="" /></p>
+
+<h3 id="kylin-1">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value(Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+ <td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.env.zookeeper-connect-string</td>
+ <td>${HOSTNAME}</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td>
+ <td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td>
+ <td>Copied from spark.driver.extraClassPath in /etc/spark/conf/spark-defaults.conf</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="kylin--1">Start Kylin and verify the build</h3>
+
+<h4 id="kylin-2">Start Kylin</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+ln -s spark_aws spark <span class="c"># skip this step if soft link 'spark' exists </span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png" alt="" /></p>
+
+<h4 id="kylin-spark-enginejar-optional">Replace kylin-spark-engine.jar (Optional)</h4>
+
+<blockquote>
+ <p>This step is required only for 4.0.1.</p>
+</blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/
+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span class="c"># remove old one </span>
+cp kylin-spark-engine-4.0.0-SNAPSHOT.jar .
+
+<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so the new JAR is loaded</span>
+</code></pre>
+</div>
+
+<h4 id="glue--1">Load Glue tables and build</h4>
+
+<ul>
+ <li>Load the Glue table metadata</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png" alt="" /></p>
+
+<ul>
+ <li>Create a Model and a Cube, then trigger a build</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png" alt="" /></p>
+
+<h3 id="section-2">Verify queries</h3>
+
+<p>Switch the Spark that Kylin uses, then restart Kylin.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+rm spark <span class="c"># 'spark' is a soft link that points to AWS Spark</span>
+ln -s spark_apache spark <span class="c"># switch from aws spark to apache spark</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
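+<p>Because the link is switched every time Kylin moves between build and query work, the <code class="highlighter-rouge">rm</code> and <code class="highlighter-rouge">ln -s</code> pair can be wrapped in a small function. This is a sketch under assumptions: <code class="highlighter-rouge">switch_spark</code> is a hypothetical helper, and the directory names <code class="highlighter-rouge">spark_aws</code> and <code class="highlighter-rouge">spark_apache</code> are taken from the commands in this post.</p>
+

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint <kylin_home>/spark at another Spark directory
# that lives next to it (for example spark_aws or spark_apache).
switch_spark() {
  local kylin_home="$1" target="$2"
  # Refuse to create a dangling link if the target directory is missing
  [ -d "$kylin_home/$target" ] || { echo "missing: $kylin_home/$target" >&2; return 1; }
  # -f replaces an existing link; -n treats the existing link as a file so the
  # new link is not created inside the directory it currently points to
  ln -sfn "$target" "$kylin_home/spark"
}

# Example:
#   switch_spark "$KYLIN_HOME" spark_apache && "$KYLIN_HOME/bin/kylin.sh" restart
```

+<p>The link target is relative, so the Kylin directory can be moved without breaking it; a restart is still needed after each switch.</p>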
+
+<p>Run a test query; it succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png" alt="" /></p>
+
+<h2 id="section-3">Discussion and Q&amp;A</h2>
+
+<h3 id="sparkaws-spark--apache-spark">Why are two Sparks (AWS Spark and Apache Spark) required?</h3>
+
+<p>AWS Spark has built-in support for the AWS Glue Catalog, and both table loading and the build engine need to fetch tables, so <strong>loading table metadata and running builds requires AWS Spark</strong>. However, Kylin 4.0.1 supports Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark, which makes the two poorly compatible, so <strong>querying cubes requires Apache Spark</strong>. In short, you need to switch the Spark that Kylin uses depending on whether it is running query tasks or build tasks.<br />
+In practice, consider running Job Nodes (build tasks) on AWS Spark and Query Nodes (query tasks) on Apache Spark.</p>
+
+<h3 id="kylinsh">Why do you need to modify kylin.sh?</h3>
+
+<p>Acting as the Spark driver, the Kylin process needs to load table metadata through <code class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code>, so kylin.sh must be modified to add the relevant JARs to the classpath of the Kylin process.</p>
+
+ </article>
+
+</div>
+
+
+
+
+
+ </article>
+ </div>
+ </div>
+
+<footer id="underfooter">
+ <div>
+ <div class="row">
+ <div class="col-md-12 widget">
+ <div class="widget-body">
+ <div class="footer-img">
+ <a href="http://www.apache.org">
+ <img id="asf-logo" height="78px" alt="Apache Software Foundation" src="/assets/images/apache_footer.png">
+ </a>
+ </div>
+ <p style="padding-top: 11px;">
+ The contents of this website are © 2015 Apache Software Foundation under the terms of the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0"> Apache License v2 </a>.
+ </p>
+ <p style="margin-bottom: 11px;">
+ Apache Kylin and its logo are trademarks of the Apache Software Foundation.
+ </div>
+
+ </div>
+ </div>
+ </div>
+ <!-- /row of widgets -->
+
+ </div>
+ <div></div>
+
+</footer>
+
+ <script src="/assets/js/jquery-1.9.1.min.js"></script>
+ <script src="/assets/js/bootstrap.min.js"></script>
+ <script src="/assets/js/main.js"></script>
+ </body>
+</html>
+
+
+
+