Posted to issues@nifi.apache.org by "Koji Kawamura (JIRA)" <ji...@apache.org> on 2018/01/26 01:19:00 UTC

[jira] [Created] (NIFI-4818) Fix transit URL parsing at Hive2JDBC and KafkaTopic for ReportLineageToAtlas

Koji Kawamura created NIFI-4818:
-----------------------------------

             Summary: Fix transit URL parsing at Hive2JDBC and KafkaTopic for ReportLineageToAtlas
                 Key: NIFI-4818
                 URL: https://issues.apache.org/jira/browse/NIFI-4818
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 1.5.0
            Reporter: Koji Kawamura
            Assignee: Koji Kawamura


ReportLineageToAtlas parses Hive JDBC connection URLs to get database names. This works when a connection URL has no parameters (e.g. jdbc:hive2://host:port/dbName), but it reports the wrong database name when parameters are present (e.g. jdbc:hive2://host:port/dbName;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2).

Also, if more than one host:port is defined, it cannot derive the cluster name from the hostnames correctly.
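
To illustrate the expected parsing, here is a minimal sketch in Python. It is not the NiFi implementation, which is written in Java. The function name parse_hive2_jdbc_url is hypothetical. The sketch assumes the usual Hive2 URL layout (jdbc:hive2://host1:port1,host2:port2/dbName;sess_vars?hive_conf#hive_vars), falls back to "default" when no database is given, and does not handle IPv6 literals.

```python
import re

HIVE2_PREFIX = "jdbc:hive2://"


def parse_hive2_jdbc_url(url):
    """Return (hostnames, database_name) for a Hive2 JDBC URL.

    Hypothetical helper for illustration only.
    """
    if not url.startswith(HIVE2_PREFIX):
        raise ValueError("not a Hive2 JDBC URL: " + url)
    rest = url[len(HIVE2_PREFIX):]
    # Cut off everything from the first ';', '?' or '#', i.e. the
    # session variables, Hive conf list and Hive var list.
    rest = re.split(r"[;?#]", rest, maxsplit=1)[0]
    # What remains is "host:port[,host:port...]/dbName".
    authority, _, database = rest.partition("/")
    # Keep every hostname, not just the first one.
    hostnames = [hp.split(":")[0] for hp in authority.split(",") if hp]
    # Assumption: an omitted database name means "default".
    return hostnames, database or "default"
```

With this sketch, the parameterized URL from the description yields "dbName" as the database name, and a URL with several host:port pairs yields every hostname, so each one is available for cluster name resolution.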

Similarly, for Kafka topics, the reporting task uses transit URIs to analyze hostnames and topic names. It does handle multiple host:port definitions within a URI; however, the current logic only uses the first hostname entry even when there are several. For example, with the transit URI "PLAINTEXT://0.example.com:6667,1.example.com:6667/topicA", it uses "0.example.com" to match against the configured regular expressions to derive a cluster name. If none of the regular expressions match, it falls back to the default cluster name without looping through the remaining hostnames, so the second and later hostnames are never used to derive a cluster name.
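
The intended behavior (try every hostname before falling back to the default cluster name) could look roughly like the Python sketch below. The names parse_kafka_transit_uri and resolve_cluster_name, the shape of the patterns mapping, and the use of a full-string regex match are all assumptions made for illustration. The actual reporting task is Java code and may differ.

```python
import re


def parse_kafka_transit_uri(uri):
    """Return (hostnames, topic) for a URI such as
    "PLAINTEXT://host1:6667,host2:6667/topicA". Hypothetical helper."""
    _scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError("missing scheme: " + uri)
    authority, sep, topic = rest.partition("/")
    if not sep or not topic:
        raise ValueError("missing topic name: " + uri)
    hostnames = [hp.split(":")[0] for hp in authority.split(",") if hp]
    return hostnames, topic


def resolve_cluster_name(hostnames, patterns, default_cluster):
    """Return the first cluster whose regexes match any hostname.

    patterns maps a cluster name to a list of compiled regexes
    (an assumed structure, not the task's real configuration model).
    """
    for hostname in hostnames:  # loop over ALL hostnames
        for cluster, regexes in patterns.items():
            # Assumption: a regex must match the whole hostname.
            if any(r.fullmatch(hostname) for r in regexes):
                return cluster
    # Only reached when no hostname matched any pattern.
    return default_cluster


hosts, topic = parse_kafka_transit_uri(
    "PLAINTEXT://0.example.com:6667,1.example.com:6667/topicA")
patterns = {"clusterB": [re.compile(r"1\.example\.com")]}
cluster = resolve_cluster_name(hosts, patterns, "defaultCluster")
```

In this example only the second hostname matches a configured pattern, so cluster ends up as "clusterB". Logic that checks only "0.example.com" would return the default cluster name instead.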



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)