Posted to issues@kylin.apache.org by " Kaige Liu (Jira)" <ji...@apache.org> on 2019/12/24 21:11:00 UTC

[jira] [Commented] (KYLIN-3685) AWS Glue Catalog Not Supported

    [ https://issues.apache.org/jira/browse/KYLIN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002995#comment-17002995 ] 

 Kaige Liu commented on KYLIN-3685:
-----------------------------------

Hi [~rjarvis], [~rongneng.wei],

There is a workaround for this issue. You can try the following steps:

1) Use beeline instead of the Hive CLI to connect to the Hive metastore.

    Change the following settings in kylin.properties:
{quote}kylin.source.hive.client=beeline

kylin.source.hive.beeline-params=-u jdbc:hive2://ip-172-31-84-101.ec2.internal:10000 -n root
{quote}
2) Copy the missing jars:
{quote}cp /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client-1.11.0.jar $KYLIN_HOME/ext

cp $KYLIN_HOME/spark/jars/joda-time-2.9.3.jar $KYLIN_HOME/lib
{quote}
I have tested this on AWS EMR 5.28, and it works well.

 

_*Root cause analysis*_

1. Kylin connects to the Hive metastore via HiveMetaStoreClient like this:
{code:java}
private HiveMetaStoreClient getMetaStoreClient() throws Exception {
    if (metaStoreClient == null) {
        metaStoreClient = new HiveMetaStoreClient(hiveConf);
    }
    return metaStoreClient;
}
{code}
Because this code instantiates the client directly, it bypasses the following client factory setting in hive-site.xml:
{quote}<property>
    <name>hive.metastore.client.factory.class</name>
    <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
  </property>
{quote}
After switching to beeline, Kylin no longer creates the client itself, and beeline handles this setting properly.
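
To illustrate the difference, here is a minimal, self-contained sketch of the two creation patterns. All names in it (CatalogClient, DefaultClient, GlueLikeClient, the "client.factory.class" key) are hypothetical stand-ins, not Hive or Kylin classes, and for brevity the configured name points straight at a client class rather than at a factory. It only shows why a direct "new" never looks at the configured class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Hive or Kylin classes.
interface CatalogClient {
    String backend();
}

class DefaultClient implements CatalogClient {
    public String backend() { return "hive-metastore"; }
}

class GlueLikeClient implements CatalogClient {
    public String backend() { return "glue"; }
}

class FactoryLookupSketch {
    // Hypothetical configuration key, loosely modelled on a "factory class" setting.
    static final String FACTORY_KEY = "client.factory.class";

    // Direct instantiation: the configuration is never consulted.
    static CatalogClient createDirectly(Map<String, String> conf) {
        return new DefaultClient();
    }

    // Config-driven creation: instantiate the class named in the configuration,
    // falling back to the default client when the key is absent.
    static CatalogClient createFromConfig(Map<String, String> conf) throws Exception {
        String className = conf.get(FACTORY_KEY);
        if (className == null) {
            return new DefaultClient();
        }
        return (CatalogClient) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put(FACTORY_KEY, GlueLikeClient.class.getName());

        System.out.println(createDirectly(conf).backend());   // prints "hive-metastore"
        System.out.println(createFromConfig(conf).backend()); // prints "glue"
    }
}
```

With the same configuration, only the config-driven path ends up talking to the alternative backend, which matches the behaviour described above.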

 

2. We need to add /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client-1.11.0.jar to the classpath to avoid the following error:
{quote}java.lang.RuntimeException: java.io.IOException: MetaException(message:Unable to instantiate a metastore client factory com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory due to: java.lang.ClassNotFoundException: Class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory not found)
 at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:83)
 at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:126)
 at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:104)
 at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:144)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
{quote}
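
As a quick diagnostic for this kind of ClassNotFoundException, a small probe like the one below can tell whether a class is visible on a given classpath. This is only a sketch: ClasspathProbe is a hypothetical helper, not part of Kylin, and the result only reflects the classpath of the JVM that runs the probe, so it would have to be launched with the same classpath that Kylin uses.

```java
class ClasspathProbe {
    // Returns true if the named class can be located by this class's class loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the factory class named in the error above.
        String name = args.length > 0
                ? args[0]
                : "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```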
3. Why do we need to copy joda-time-2.9.3.jar to $KYLIN_HOME/lib?

The AWS Java SDK uses a newer version of joda-time, while HBase brings in an old version (< 2.0) bundled in jruby-complete-1.6.8.jar. Putting the newer jar in $KYLIN_HOME/lib makes it appear ahead of jruby-complete-1.6.8.jar on the classpath.

Otherwise, the following error occurs:
{quote}org.apache.kylin.job.exception.ExecuteException: org.apache.kylin.job.exception.ExecuteException: com.google.common.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter;
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:194)
 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kylin.job.exception.ExecuteException: com.google.common.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter;
{quote}
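
To confirm which jar actually wins on the classpath, one option is to ask the JVM where a class was loaded from and whether it exposes the method named in the error. The sketch below uses only standard reflection; JarOriginProbe is a hypothetical helper, not part of Kylin, and it assumes it is run with the same classpath as the Kylin process.

```java
import java.security.CodeSource;

class JarOriginProbe {
    // Returns the location (typically a jar URL) a class was loaded from.
    // Classes loaded by the bootstrap loader have no code source.
    static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    // Returns true if the class has a public no-argument method with this name.
    static boolean hasPublicNoArgMethod(Class<?> cls, String method) {
        try {
            cls.getMethod(method);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.joda.time.format.DateTimeFormatter";
        try {
            Class<?> cls = Class.forName(name, false, JarOriginProbe.class.getClassLoader());
            System.out.println("loaded from: " + originOf(cls));
            System.out.println("has withZoneUTC(): " + hasPublicNoArgMethod(cls, "withZoneUTC"));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on this classpath");
        }
    }
}
```

If the reported location is jruby-complete-1.6.8.jar and withZoneUTC() is missing, the old joda-time classes are still being picked up first.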
 

 

> AWS Glue Catalog Not Supported
> ------------------------------
>
>                 Key: KYLIN-3685
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3685
>             Project: Kylin
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: v2.5.0
>            Reporter: Richard Jarvis
>            Assignee:  Kaige Liu
>            Priority: Major
>
> I am trying to use Kylin on AWS (EMR 5.18.0).
> I use AWS Glue as the catalog and as a result Kylin can't find the tables. 
> I am able to see the schemas and tables in the GUI because I have set the AWS glue properties in hive-site.xml:
>  
> <property>
> <name>hive.metastore.client.factory.class</name>    <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
> </property>
> However, the job org.apache.kylin.source.hive.cardinality.HiveColumnCardinalityJob fails to find the tables (it's looking in the Hive metadata catalog instead of AWS Glue).
> I think this is because Hive 1.2.1 is too old to support the client factory class.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)