You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by hosur narahari <hn...@gmail.com> on 2018/09/20 03:43:27 UTC

select with catalog fails

Hi,

I was using kylin JDBC driver to fetch data. By using DatabaseMetaData if
we get catalog using getCatalogs() method, it return value
"defaultCatalog". It returns actual hive schema when we execute
getSchemas().

According to JDBC contract, catalog.schema.table should be valid from
clause and many query layers use that. But kylin fails when we execute that
query.
I've tried to write sample code piece for that below.

*DatabaseMetaData db = conn.getMetaData();*
*        ResultSet catalogSet = db.getCatalogs();*
*        String catalog = "";*
*        if(catalogSet.next()) {*
*            catalog = catalogSet.getString("TABLE_CAT");*
*        }*
*        ResultSet schemaSet = db.getSchemas();*
*        String schema = "";*
*        if(schemaSet.next()) {*
*            schema = schemaSet.getString("TABLE_SCHEM");*
*        }*
*        StringBuilder sb = new StringBuilder("SELECT * FROM ");*
*        if(!catalog.isEmpty()) {*
*            sb.append(catalog + ".");*
*        }*
*        if(!schema.isEmpty()) {*
*            sb.append(schema + ".");*
*        }*
*        sb.append("kylin_sales limit 10");*
*        String query = sb.toString();*
*        Statement stat = conn.createStatement();*
*        ResultSet rs = stat.executeQuery(query);*
*        while(rs.next()) {*
*            System.out.println(rs.getObject("trans_id"));*
*        }*

In short, the above snippet is executing the query,
*select * from defaultCatalog.DEFAULT.kylin_sales.*

Same thing happens even with different schemas if we have like,
*select * from defaultCatalog.test.kylin_sales* also fails.

Also if hive schema is anything other than default, then <schema
name>.<table name> in from clause works. But with default schema it fails.

Ex. If tables are in *test *schema of hive, then *select * from
test.kylin_sales *works. But when the tables are in *default *schema of
hive, then *select * from default.kylin_sales *fails.

Please let me know if I'm doing something wrong.

Best Regards,
Hari

Re: select with catalog fails

Posted by hosur narahari <hn...@gmail.com>.
Sure I'll do that :)

Best Regards,
Hari

On Tue, 25 Sep 2018, 09:05 ShaoFeng Shi, <sh...@apache.org> wrote:

> Hi Hari,
>
> Thanks for your comment. The default schema is supported, but please aware
> that the 'default' is a keyword in Calcite, so in the SQL please use
> "DEFAULT" to escape. For example:
>
> select count(*) from "DEFAULT".kylin_sales
>
> Regarding the catalog issue, would you like to report a JIRA to track the
> problem? Kylin should follow the JDBC standard as much as possible.
>
> hosur narahari <hn...@gmail.com> 于2018年9月24日周一 下午8:14写道:
>
>> Hi,
>>
>> Yes whatever you said about mysql is true. But when queried from JDBC
>> driver, the values in column "SCHEMA_NAME" is returned for *getCatalogs()
>> *call and *getSchemas()* return empty resultset.
>>
>> All databases' jdbc drivers have one thing in common. If catalog or
>> schema is not empty, it can be used in select query to get the data.
>>
>> I'll explain the scenario where I'm facing problem currently. I'm trying
>> to integrate Kylin with query engines like presto and drill, since Kylin is
>> one of the very few open source cube engines. All query engines query as
>> [catalogName.][schemaName.]tableName. So both will fail to query from
>> Kylin. By adhering it to JDBC standards, we can make integration of Kylin
>> with other open source tools easier.
>>
>> Also is it possible to actually create Catalog in Kylin. If so I can test
>> creating custom catalog and see how it works.
>>
>> Best Regards,
>> Hari
>>
>> On Mon, Sep 24, 2018 at 5:10 PM Xiaoxiang Yu <xi...@kyligence.io>
>> wrote:
>>
>>> Hi,
>>>
>>> I interested in your question and I check it carefully. I want to share
>>> what I found and I guess it will be a bit helpful.
>>>
>>>
>>>
>>> It is a problem about how to realize/obey SQL standard and JDBC
>>> standard. You think JDBC standard/SQL standard should support [ [
>>> catalogName . ] schemaName . ] tableName in *from clause*. I think it
>>> is right, I found a link(https://calcite.apache.org/docs/reference.html),
>>> apache calcite says they can resolve that pattern.
>>>
>>>
>>>
>>> *tablePrimary:*
>>>
>>> *      [ [ catalogName . ] schemaName . ] tableName*
>>>
>>> *      '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'*
>>>
>>> *  |   tablePrimary [ EXTEND ] '(' columnDecl [, columnDecl ]* ')'*
>>>
>>> *  |   [ LATERAL ] '(' query ')'*
>>>
>>> *  |   UNNEST '(' expression ')' [ WITH ORDINALITY ]*
>>>
>>> *  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [,
>>> expression ]* ')' ')'*
>>>
>>>
>>>
>>>
>>>
>>> And in JDBC interface these two methods as defined as followed:
>>> (java.sql.DatabaseMetaData)
>>>
>>> [image: * * * * * * * Retrieves the schema names available in this
>>> database. are ordered by and <P>The schema columns are: COL >
>>> <LI><B>TABLE_SCHEM</B> String {Ocode =>} schema name The results String
>>> {ä)code catalog name (may be <code>null</code>) aparam catalog a catalog
>>> name; must match the catalog name as it is stored in the database;
>>> retrieves those without a catalog; null means catalog name should not be
>>> used to narrow down the search. aparam schemapattern a schema name; must
>>> match the schema name as it is stored in the database; null means schema
>>> name should not be used to narrow down the search. areturn a object in
>>> which each row is a schema description Dexception SQLException if a
>>> database access error occurs asee #getSearchStringEscape asince 1.6
>>> ResultSet —(string catalog, String schemapattern) throws SQLException;]
>>>
>>>
>>>
>>> [image: * Retrieves the catalog names available in this database. * are
>>> ordered by catalog name. * catalog column is: * String {acode =>} catalog
>>> name The results * areturn a object in which each row has a single
>>> <code>String</code> column that is a catalog name * aexception SQLException
>>> if a database access error occurs ( ) throws SQLException; ResultSet]
>>>
>>>
>>>
>>> And I write a simple program to test behavior of MySQL JDBC driver and Kylin
>>> JDBC driver. Here are the code and output.
>>>
>>> MySQL did not return empty when call *getSchemas* and return four items
>>> in *getCatalogs*
>>>
>>>
>>>
>>> [image: public static void main(string[] args) throws Exception {
>>> testDriver( •tariverNameSD "org.apache.kylin.jdbc.Driver" , " 7079/1acus" ,
>>> cuserrqame:' "ADMIN", coassworoo "IQ'LIN"); testDriver( 'griverNametSD
>>> "com.mysql . jdbc . Driver" , " jdbc :mysql ://localhost: 3306" , " root" ,
>>> "lacus" ) ; public static void testDriver(String driverName, String url,
>>> String userName, + driverName); Class . forName(driverName) ; Connection
>>> conn = DriverManager.getConnection(ur1, userName, password); getMetaData( )
>>> ; DatabaseMetaData db conn. ResultSet catalogSet = db. getcatalogs() String
>>> catalog = " while (catalogset.next()) { String password) throws Exception {
>>> .getschemas(), Resultset schemaset = db String schema " "; while
>>> (schemaset.next()) { " + "TABLE_CATALOG")); close() conn .]
>>>
>>>
>>>
>>> [image: /Library/Java/JavaVirtuaIMachines/jdk1.8.Ø_181.
>>> jdk/Contents/Home/bin/java . Test log4j • log4j • log4j • TABLE TABLE
>>> org.apache.kylin.jdbc.Driver •WARN No appenders could be found for logger
>>> (org.apache.kylin.jdbc.KylinConnection) . •WARQ Please initialize the log4j
>>> system properly. •WARN See
>>> http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. _CAT
>>> defaultCatalog TABLE_SCHEM LACUS _CATALOG defaultCatalog Test
>>> com.mysql.jdbc.Driver Loading class 'com.mysql.jdbc.Driver'. This is
>>> deprecated. The new driver class is •com.mysql.cj.jdbc.Driver'. The driver
>>> is automatically regis driver class is generally unnecessary. Mon Sep 24
>>> 17:46:33 CST 2018 WAR•J: Establishing SSL connection without server's
>>> identity verification is not recommended. According to MySQL 5.5 connection
>>> must be established by default if explicit option isn't set. For compliance
>>> with existing applications not using SSL the verifyServf You TABLE TABLE
>>> TABLE TABLE need either to explicitly disable SSL by setting useSSL=false,
>>> or set useSSL=true and provide truststore for server certificate
>>> verificatit _CAT _CAT _CAT _CAT information_schema mysql performance_schema
>>> sys]
>>>
>>>
>>>
>>>
>>>
>>> Here is code of how MySQL JDBC do.(com.mysql.jdbc.DatabaseMetadata)
>>>
>>> [image: ( ) throws SQLException { public ResultSet ResultSet results
>>> null; Statement stmt = nut Z; try { - this. conn.createstatement(); stmt -
>>> stmt . setEscapeprocessing( false); stmt. executeQuery( "SHOW DATABASES");
>>> results = ResultsetMetaData resultsMD = results.getMetaData( fields new
>>> Field( " = new ArrayList(); ArrayList tuples eolumnNamee "TABLE _ CAT" ,
>>> (LßDcTypeVD 12 , resul tsMD. getCoIumnD - new acolum rowval tuples. add(new
>>> ByteArrayRow(rowVa1, this. getExceptionIn tuples); Resultset var18 = this.
>>> return var18; finally { if (results null) { try { results . } catch
>>> (SQLException var16) { Assertion Fai led Except ion . shouldNotHappen(
>>> var16) ; eptor())); results = null; if (stmt != null) { try { stmt } catch
>>> (SQLException var15) { Assert ion Failed Exception . shouldNotHappen(
>>> var15) ; - null; stmt -]
>>>
>>>
>>>
>>> [image: public ResultSet Field[] fields hemas() throws SQLException { =
>>> new Field( catablet•ämeo " " , "TABLE_SCHEM", CLBDCTYP%DI, .:tengtmJJ O ) ,
>>> new Field( , ArrayList tuples = new ArrayList(); ResultSet results -
>>> tuples); - this. return results;]
>>>
>>>
>>>
>>>
>>>
>>> Let us look how mysql store infomation of catalog/schema/table, It seems
>>> that MySQL JDBC did not use column CATALOG_NAME of table SCHEMATA.
>>>
>>>
>>>
>>>
>>>
>>> *My opinion:*
>>>
>>> So, MySQL JDBC did not follow standard strictly . As far as I concerned,
>>> kylin jdbc driver may not need to follow SQL standard so strictly.
>>>
>>> You can avoid misleading/error message if you do not use  [ [
>>> catalogName . ] schemaName . ] tableName  pattern in your *from clause*.
>>>
>>>
>>>
>>>
>>> But I think Kylin should not return a ugly error stacktrace when user
>>> use that pattern. I think I may find a better way to avoid such ugly error
>>> stacktrace.
>>>
>>>
>>>
>>>
>>>
>>> Link:
>>>
>>>
>>> https://stackoverflow.com/questions/7022755/whats-the-difference-between-a-catalog-and-a-schema-in-a-relational-database
>>>
>>> https://en.wikipedia.org/wiki/Database_catalog
>>>
>>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
>>>
>>>
>>>
>>>
>>>
>>> If I mistake anything, please let me know.
>>>
>>> Best Regards,
>>>
>>> Xiaoxiang Yu
>>>
>>> *发件人**: *hosur narahari <hn...@gmail.com>
>>> *答复**: *"user@kylin.apache.org" <us...@kylin.apache.org>
>>> *日期**: *2018年9月20日 星期四 11:43
>>> *收件人**: *"user@kylin.apache.org" <us...@kylin.apache.org>
>>> *主题**: *select with catalog fails
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I was using kylin JDBC driver to fetch data. By using DatabaseMetaData
>>> if we get catalog using getCatalogs() method, it return value
>>> "defaultCatalog". It returns actual hive schema when we execute
>>> getSchemas().
>>>
>>>
>>>
>>> According to JDBC contract, catalog.schema.table should be valid from
>>> clause and many query layers use that. But kylin fails when we execute that
>>> query.
>>>
>>> I've tried to write sample code piece for that below.
>>>
>>>
>>>
>>> *DatabaseMetaData db = conn.getMetaData();*
>>>
>>> *        ResultSet catalogSet = db.getCatalogs();*
>>>
>>> *        String catalog = "";*
>>>
>>> *        if(catalogSet.next()) {*
>>>
>>> *            catalog = catalogSet.getString("TABLE_CAT");*
>>>
>>> *        }*
>>>
>>> *        ResultSet schemaSet = db.getSchemas();*
>>>
>>> *        String schema = "";*
>>>
>>> *        if(schemaSet.next()) {*
>>>
>>> *            schema = schemaSet.getString("TABLE_SCHEM");*
>>>
>>> *        }*
>>>
>>> *        StringBuilder sb = new StringBuilder("SELECT * FROM ");*
>>>
>>> *        if(!catalog.isEmpty()) {*
>>>
>>> *            sb.append(catalog + ".");*
>>>
>>> *        }*
>>>
>>> *        if(!schema.isEmpty()) {*
>>>
>>> *            sb.append(schema + ".");*
>>>
>>> *        }*
>>>
>>> *        sb.append("kylin_sales limit 10");*
>>>
>>> *        String query = sb.toString();*
>>>
>>> *        Statement stat = conn.createStatement();*
>>>
>>> *        ResultSet rs = stat.executeQuery(query);*
>>>
>>> *        while(rs.next()) {*
>>>
>>> *            System.out.println(rs.getObject("trans_id"));*
>>>
>>> *        }*
>>>
>>>
>>>
>>> In short, the above snippet is executing the query,
>>>
>>> *select * from defaultCatalog.DEFAULT.kylin_sales.*
>>>
>>>
>>>
>>> Same thing happens even with different schemas if we have like,
>>>
>>> *select * from defaultCatalog.test.kylin_sales* also fails.
>>>
>>>
>>>
>>> Also if hive schema is anything other than default, then <schema
>>> name>.<table name> in from clause works. But with default schema it fails.
>>>
>>>
>>>
>>> Ex. If tables are in *test *schema of hive, then *select * from
>>> test.kylin_sales *works. But when the tables are in *default *schema of
>>> hive, then *select * from default.kylin_sales *fails.
>>>
>>>
>>>
>>> Please let me know if I'm doing something wrong.
>>>
>>>
>>>
>>> Best Regards,
>>>
>>> Hari
>>>
>>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>

Re: select with catalog fails

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Hari,

Thanks for your comment. The default schema is supported, but please aware
that the 'default' is a keyword in Calcite, so in the SQL please use
"DEFAULT" to escape. For example:

select count(*) from "DEFAULT".kylin_sales

Regarding the catalog issue, would you like to report a JIRA to track the
problem? Kylin should follow the JDBC standard as much as possible.

hosur narahari <hn...@gmail.com> 于2018年9月24日周一 下午8:14写道:

> Hi,
>
> Yes whatever you said about mysql is true. But when queried from JDBC
> driver, the values in column "SCHEMA_NAME" is returned for *getCatalogs()
> *call and *getSchemas()* return empty resultset.
>
> All databases' jdbc drivers have one thing in common. If catalog or schema
> is not empty, it can be used in select query to get the data.
>
> I'll explain the scenario where I'm facing problem currently. I'm trying
> to integrate Kylin with query engines like presto and drill, since Kylin is
> one of the very few open source cube engines. All query engines query as
> [catalogName.][schemaName.]tableName. So both will fail to query from
> Kylin. By adhering it to JDBC standards, we can make integration of Kylin
> with other open source tools easier.
>
> Also is it possible to actually create Catalog in Kylin. If so I can test
> creating custom catalog and see how it works.
>
> Best Regards,
> Hari
>
> On Mon, Sep 24, 2018 at 5:10 PM Xiaoxiang Yu <xi...@kyligence.io>
> wrote:
>
>> Hi,
>>
>> I interested in your question and I check it carefully. I want to share
>> what I found and I guess it will be a bit helpful.
>>
>>
>>
>> It is a problem about how to realize/obey SQL standard and JDBC standard.
>> You think JDBC standard/SQL standard should support [ [ catalogName . ]
>> schemaName . ] tableName in *from clause*. I think it is right, I found
>> a link(https://calcite.apache.org/docs/reference.html), apache calcite
>> says they can resolve that pattern.
>>
>>
>>
>> *tablePrimary:*
>>
>> *      [ [ catalogName . ] schemaName . ] tableName*
>>
>> *      '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'*
>>
>> *  |   tablePrimary [ EXTEND ] '(' columnDecl [, columnDecl ]* ')'*
>>
>> *  |   [ LATERAL ] '(' query ')'*
>>
>> *  |   UNNEST '(' expression ')' [ WITH ORDINALITY ]*
>>
>> *  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [,
>> expression ]* ')' ')'*
>>
>>
>>
>>
>>
>> And in JDBC interface these two methods as defined as followed:
>> (java.sql.DatabaseMetaData)
>>
>> [image: * * * * * * * Retrieves the schema names available in this
>> database. are ordered by and <P>The schema columns are: COL >
>> <LI><B>TABLE_SCHEM</B> String {Ocode =>} schema name The results String
>> {ä)code catalog name (may be <code>null</code>) aparam catalog a catalog
>> name; must match the catalog name as it is stored in the database;
>> retrieves those without a catalog; null means catalog name should not be
>> used to narrow down the search. aparam schemapattern a schema name; must
>> match the schema name as it is stored in the database; null means schema
>> name should not be used to narrow down the search. areturn a object in
>> which each row is a schema description Dexception SQLException if a
>> database access error occurs asee #getSearchStringEscape asince 1.6
>> ResultSet —(string catalog, String schemapattern) throws SQLException;]
>>
>>
>>
>> [image: * Retrieves the catalog names available in this database. * are
>> ordered by catalog name. * catalog column is: * String {acode =>} catalog
>> name The results * areturn a object in which each row has a single
>> <code>String</code> column that is a catalog name * aexception SQLException
>> if a database access error occurs ( ) throws SQLException; ResultSet]
>>
>>
>>
>> And I write a simple program to test behavior of MySQL JDBC driver and Kylin
>> JDBC driver. Here are the code and output.
>>
>> MySQL did not return empty when call *getSchemas* and return four items
>> in *getCatalogs*
>>
>>
>>
>> [image: public static void main(string[] args) throws Exception {
>> testDriver( •tariverNameSD "org.apache.kylin.jdbc.Driver" , " 7079/1acus" ,
>> cuserrqame:' "ADMIN", coassworoo "IQ'LIN"); testDriver( 'griverNametSD
>> "com.mysql . jdbc . Driver" , " jdbc :mysql ://localhost: 3306" , " root" ,
>> "lacus" ) ; public static void testDriver(String driverName, String url,
>> String userName, + driverName); Class . forName(driverName) ; Connection
>> conn = DriverManager.getConnection(ur1, userName, password); getMetaData( )
>> ; DatabaseMetaData db conn. ResultSet catalogSet = db. getcatalogs() String
>> catalog = " while (catalogset.next()) { String password) throws Exception {
>> .getschemas(), Resultset schemaset = db String schema " "; while
>> (schemaset.next()) { " + "TABLE_CATALOG")); close() conn .]
>>
>>
>>
>> [image: /Library/Java/JavaVirtuaIMachines/jdk1.8.Ø_181.
>> jdk/Contents/Home/bin/java . Test log4j • log4j • log4j • TABLE TABLE
>> org.apache.kylin.jdbc.Driver •WARN No appenders could be found for logger
>> (org.apache.kylin.jdbc.KylinConnection) . •WARQ Please initialize the log4j
>> system properly. •WARN See
>> http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. _CAT
>> defaultCatalog TABLE_SCHEM LACUS _CATALOG defaultCatalog Test
>> com.mysql.jdbc.Driver Loading class 'com.mysql.jdbc.Driver'. This is
>> deprecated. The new driver class is •com.mysql.cj.jdbc.Driver'. The driver
>> is automatically regis driver class is generally unnecessary. Mon Sep 24
>> 17:46:33 CST 2018 WAR•J: Establishing SSL connection without server's
>> identity verification is not recommended. According to MySQL 5.5 connection
>> must be established by default if explicit option isn't set. For compliance
>> with existing applications not using SSL the verifyServf You TABLE TABLE
>> TABLE TABLE need either to explicitly disable SSL by setting useSSL=false,
>> or set useSSL=true and provide truststore for server certificate
>> verificatit _CAT _CAT _CAT _CAT information_schema mysql performance_schema
>> sys]
>>
>>
>>
>>
>>
>> Here is code of how MySQL JDBC do.(com.mysql.jdbc.DatabaseMetadata)
>>
>> [image: ( ) throws SQLException { public ResultSet ResultSet results
>> null; Statement stmt = nut Z; try { - this. conn.createstatement(); stmt -
>> stmt . setEscapeprocessing( false); stmt. executeQuery( "SHOW DATABASES");
>> results = ResultsetMetaData resultsMD = results.getMetaData( fields new
>> Field( " = new ArrayList(); ArrayList tuples eolumnNamee "TABLE _ CAT" ,
>> (LßDcTypeVD 12 , resul tsMD. getCoIumnD - new acolum rowval tuples. add(new
>> ByteArrayRow(rowVa1, this. getExceptionIn tuples); Resultset var18 = this.
>> return var18; finally { if (results null) { try { results . } catch
>> (SQLException var16) { Assertion Fai led Except ion . shouldNotHappen(
>> var16) ; eptor())); results = null; if (stmt != null) { try { stmt } catch
>> (SQLException var15) { Assert ion Failed Exception . shouldNotHappen(
>> var15) ; - null; stmt -]
>>
>>
>>
>> [image: public ResultSet Field[] fields hemas() throws SQLException { =
>> new Field( catablet•ämeo " " , "TABLE_SCHEM", CLBDCTYP%DI, .:tengtmJJ O ) ,
>> new Field( , ArrayList tuples = new ArrayList(); ResultSet results -
>> tuples); - this. return results;]
>>
>>
>>
>>
>>
>> Let us look how mysql store infomation of catalog/schema/table, It seems
>> that MySQL JDBC did not use column CATALOG_NAME of table SCHEMATA.
>>
>>
>>
>>
>>
>> *My opinion:*
>>
>> So, MySQL JDBC did not follow standard strictly . As far as I concerned,
>> kylin jdbc driver may not need to follow SQL standard so strictly.
>>
>> You can avoid misleading/error message if you do not use  [ [
>> catalogName . ] schemaName . ] tableName  pattern in your *from clause*.
>>
>>
>>
>> But I think Kylin should not return a ugly error stacktrace when user use
>> that pattern. I think I may find a better way to avoid such ugly error
>> stacktrace.
>>
>>
>>
>>
>>
>> Link:
>>
>>
>> https://stackoverflow.com/questions/7022755/whats-the-difference-between-a-catalog-and-a-schema-in-a-relational-database
>>
>> https://en.wikipedia.org/wiki/Database_catalog
>>
>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
>>
>>
>>
>>
>>
>> If I mistake anything, please let me know.
>>
>> Best Regards,
>>
>> Xiaoxiang Yu
>>
>> *发件人**: *hosur narahari <hn...@gmail.com>
>> *答复**: *"user@kylin.apache.org" <us...@kylin.apache.org>
>> *日期**: *2018年9月20日 星期四 11:43
>> *收件人**: *"user@kylin.apache.org" <us...@kylin.apache.org>
>> *主题**: *select with catalog fails
>>
>>
>>
>> Hi,
>>
>>
>>
>> I was using kylin JDBC driver to fetch data. By using DatabaseMetaData if
>> we get catalog using getCatalogs() method, it return value
>> "defaultCatalog". It returns actual hive schema when we execute
>> getSchemas().
>>
>>
>>
>> According to JDBC contract, catalog.schema.table should be valid from
>> clause and many query layers use that. But kylin fails when we execute that
>> query.
>>
>> I've tried to write sample code piece for that below.
>>
>>
>>
>> *DatabaseMetaData db = conn.getMetaData();*
>>
>> *        ResultSet catalogSet = db.getCatalogs();*
>>
>> *        String catalog = "";*
>>
>> *        if(catalogSet.next()) {*
>>
>> *            catalog = catalogSet.getString("TABLE_CAT");*
>>
>> *        }*
>>
>> *        ResultSet schemaSet = db.getSchemas();*
>>
>> *        String schema = "";*
>>
>> *        if(schemaSet.next()) {*
>>
>> *            schema = schemaSet.getString("TABLE_SCHEM");*
>>
>> *        }*
>>
>> *        StringBuilder sb = new StringBuilder("SELECT * FROM ");*
>>
>> *        if(!catalog.isEmpty()) {*
>>
>> *            sb.append(catalog + ".");*
>>
>> *        }*
>>
>> *        if(!schema.isEmpty()) {*
>>
>> *            sb.append(schema + ".");*
>>
>> *        }*
>>
>> *        sb.append("kylin_sales limit 10");*
>>
>> *        String query = sb.toString();*
>>
>> *        Statement stat = conn.createStatement();*
>>
>> *        ResultSet rs = stat.executeQuery(query);*
>>
>> *        while(rs.next()) {*
>>
>> *            System.out.println(rs.getObject("trans_id"));*
>>
>> *        }*
>>
>>
>>
>> In short, the above snippet is executing the query,
>>
>> *select * from defaultCatalog.DEFAULT.kylin_sales.*
>>
>>
>>
>> Same thing happens even with different schemas if we have like,
>>
>> *select * from defaultCatalog.test.kylin_sales* also fails.
>>
>>
>>
>> Also if hive schema is anything other than default, then <schema
>> name>.<table name> in from clause works. But with default schema it fails.
>>
>>
>>
>> Ex. If tables are in *test *schema of hive, then *select * from
>> test.kylin_sales *works. But when the tables are in *default *schema of
>> hive, then *select * from default.kylin_sales *fails.
>>
>>
>>
>> Please let me know if I'm doing something wrong.
>>
>>
>>
>> Best Regards,
>>
>> Hari
>>
>

-- 
Best regards,

Shaofeng Shi 史少锋

Re: select with catalog fails

Posted by Xiaoxiang Yu <xi...@kyligence.io>.
Hi,

Firstly, Kylin does not support catalog, so you cannot create one. The name of catalog you see is a static final filed of org.apache.kylin.rest.constant.Constant as follow:
public final static String FakeCatalogName = "defaultCatalog";
You can also see getMetadata method of org.apache.kylin.rest.service.QueryService.

So I think there are two solution :

First. Like MySQL jdbc, modify the code to return emtpy list when call DatabaseMetaData.getCatalogs().
It is difficult because the real implmetation of DatabaseMetaData is delegated to org.apache.calcite.avatica.AvaticaDatabaseMetaData. Considering the complecity of calcite framework, change its behavior is not so easy.

Second. Remove the catalog automatically in your sql query before send it to SQL parser.
This can be done by modifying the massageSql method of org.apache.kylin.query.util.QueryUtil, add following code at the final line of massageSql.
sql1 = removeCatalog(sql1);
Method removeCatalog is as follow. It just simply replace all 'a.b.c' to 'b.c'.

public static String removeCatalog(String sql) {
        String catalogSchemaTable = "[\\w|\\d]+\\.[\\w|\\d]+\\.[\\w|\\d]+";
        Pattern catalogSchemaTablePattern = Pattern.compile(catalogSchemaTable);
        Matcher catalogSchemaTableMatcher = catalogSchemaTablePattern.matcher(sql);
        while (catalogSchemaTableMatcher.find()) {
            String group = catalogSchemaTableMatcher.group();
            int catalogIndex = group.indexOf('.');
            System.out.println("before: " + group);
            System.out.println("after: " + group.substring(catalogIndex + 1));
            sql = sql.replace(group, group.substring(catalogIndex + 1));
        }
        return sql;
}

Please open an issue on JIRA, I will do my best to fix it.

Best wishes,
Xiaoxiang Yu


发件人: hosur narahari <hn...@gmail.com>
答复: "user@kylin.apache.org" <us...@kylin.apache.org>
日期: 2018年9月24日 星期一 20:15
收件人: "user@kylin.apache.org" <us...@kylin.apache.org>
主题: Re: select with catalog fails

actually

Re: select with catalog fails

Posted by hosur narahari <hn...@gmail.com>.
Hi,

Yes whatever you said about mysql is true. But when queried from JDBC
driver, the values in column "SCHEMA_NAME" is returned for *getCatalogs() *call
and *getSchemas()* return empty resultset.

All databases' jdbc drivers have one thing in common. If catalog or schema
is not empty, it can be used in select query to get the data.

I'll explain the scenario where I'm facing problem currently. I'm trying to
integrate Kylin with query engines like presto and drill, since Kylin is
one of the very few open source cube engines. All query engines query as
[catalogName.][schemaName.]tableName. So both will fail to query from
Kylin. By adhering it to JDBC standards, we can make integration of Kylin
with other open source tools easier.

Also is it possible to actually create Catalog in Kylin. If so I can test
creating custom catalog and see how it works.

Best Regards,
Hari

On Mon, Sep 24, 2018 at 5:10 PM Xiaoxiang Yu <xi...@kyligence.io>
wrote:

> Hi,
>
> I interested in your question and I check it carefully. I want to share
> what I found and I guess it will be a bit helpful.
>
>
>
> It is a problem about how to realize/obey SQL standard and JDBC standard.
> You think JDBC standard/SQL standard should support [ [ catalogName . ]
> schemaName . ] tableName in *from clause*. I think it is right, I found a
> link(https://calcite.apache.org/docs/reference.html), apache calcite says
> they can resolve that pattern.
>
>
>
> *tablePrimary:*
>
> *      [ [ catalogName . ] schemaName . ] tableName*
>
> *      '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'*
>
> *  |   tablePrimary [ EXTEND ] '(' columnDecl [, columnDecl ]* ')'*
>
> *  |   [ LATERAL ] '(' query ')'*
>
> *  |   UNNEST '(' expression ')' [ WITH ORDINALITY ]*
>
> *  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [,
> expression ]* ')' ')'*
>
>
>
>
>
> And in JDBC interface these two methods as defined as followed:
> (java.sql.DatabaseMetaData)
>
> [image: * * * * * * * Retrieves the schema names available in this
> database. are ordered by and <P>The schema columns are: COL >
> <LI><B>TABLE_SCHEM</B> String {Ocode =>} schema name The results String
> {ä)code catalog name (may be <code>null</code>) aparam catalog a catalog
> name; must match the catalog name as it is stored in the database;
> retrieves those without a catalog; null means catalog name should not be
> used to narrow down the search. aparam schemapattern a schema name; must
> match the schema name as it is stored in the database; null means schema
> name should not be used to narrow down the search. areturn a object in
> which each row is a schema description Dexception SQLException if a
> database access error occurs asee #getSearchStringEscape asince 1.6
> ResultSet —(string catalog, String schemapattern) throws SQLException;]
>
>
>
> [image: * Retrieves the catalog names available in this database. * are
> ordered by catalog name. * catalog column is: * String {acode =>} catalog
> name The results * areturn a object in which each row has a single
> <code>String</code> column that is a catalog name * aexception SQLException
> if a database access error occurs ( ) throws SQLException; ResultSet]
>
>
>
> And I write a simple program to test behavior of MySQL JDBC driver and Kylin
> JDBC driver. Here are the code and output.
>
> MySQL did not return empty when call *getSchemas* and return four items
> in *getCatalogs*
>
>
>
> [image: public static void main(string[] args) throws Exception {
> testDriver( •tariverNameSD "org.apache.kylin.jdbc.Driver" , " 7079/1acus" ,
> cuserrqame:' "ADMIN", coassworoo "IQ'LIN"); testDriver( 'griverNametSD
> "com.mysql . jdbc . Driver" , " jdbc :mysql ://localhost: 3306" , " root" ,
> "lacus" ) ; public static void testDriver(String driverName, String url,
> String userName, + driverName); Class . forName(driverName) ; Connection
> conn = DriverManager.getConnection(ur1, userName, password); getMetaData( )
> ; DatabaseMetaData db conn. ResultSet catalogSet = db. getcatalogs() String
> catalog = " while (catalogset.next()) { String password) throws Exception {
> .getschemas(), Resultset schemaset = db String schema " "; while
> (schemaset.next()) { " + "TABLE_CATALOG")); close() conn .]
>
>
>
> [image: /Library/Java/JavaVirtuaIMachines/jdk1.8.Ø_181.
> jdk/Contents/Home/bin/java . Test log4j • log4j • log4j • TABLE TABLE
> org.apache.kylin.jdbc.Driver •WARN No appenders could be found for logger
> (org.apache.kylin.jdbc.KylinConnection) . •WARQ Please initialize the log4j
> system properly. •WARN See
> http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. _CAT
> defaultCatalog TABLE_SCHEM LACUS _CATALOG defaultCatalog Test
> com.mysql.jdbc.Driver Loading class 'com.mysql.jdbc.Driver'. This is
> deprecated. The new driver class is •com.mysql.cj.jdbc.Driver'. The driver
> is automatically regis driver class is generally unnecessary. Mon Sep 24
> 17:46:33 CST 2018 WAR•J: Establishing SSL connection without server's
> identity verification is not recommended. According to MySQL 5.5 connection
> must be established by default if explicit option isn't set. For compliance
> with existing applications not using SSL the verifyServf You TABLE TABLE
> TABLE TABLE need either to explicitly disable SSL by setting useSSL=false,
> or set useSSL=true and provide truststore for server certificate
> verificatit _CAT _CAT _CAT _CAT information_schema mysql performance_schema
> sys]
>
>
>
>
>
> Here is code of how MySQL JDBC do.(com.mysql.jdbc.DatabaseMetadata)
>
> [image: ( ) throws SQLException { public ResultSet ResultSet results null;
> Statement stmt = nut Z; try { - this. conn.createstatement(); stmt - stmt .
> setEscapeprocessing( false); stmt. executeQuery( "SHOW DATABASES"); results
> = ResultsetMetaData resultsMD = results.getMetaData( fields new Field( " =
> new ArrayList(); ArrayList tuples eolumnNamee "TABLE _ CAT" , (LßDcTypeVD
> 12 , resul tsMD. getCoIumnD - new acolum rowval tuples. add(new
> ByteArrayRow(rowVa1, this. getExceptionIn tuples); Resultset var18 = this.
> return var18; finally { if (results null) { try { results . } catch
> (SQLException var16) { Assertion Fai led Except ion . shouldNotHappen(
> var16) ; eptor())); results = null; if (stmt != null) { try { stmt } catch
> (SQLException var15) { Assert ion Failed Exception . shouldNotHappen(
> var15) ; - null; stmt -]
>
>
>
> [image: public ResultSet Field[] fields hemas() throws SQLException { =
> new Field( catablet•ämeo " " , "TABLE_SCHEM", CLBDCTYP%DI, .:tengtmJJ O ) ,
> new Field( , ArrayList tuples = new ArrayList(); ResultSet results -
> tuples); - this. return results;]
>
>
>
>
>
> Let us look how mysql store infomation of catalog/schema/table, It seems
> that MySQL JDBC did not use column CATALOG_NAME of table SCHEMATA.
>
>
>
>
>
> *My opinion:*
>
> So, MySQL JDBC did not follow standard strictly . As far as I concerned,
> kylin jdbc driver may not need to follow SQL standard so strictly.
>
> You can avoid misleading/error message if you do not use  [ [ catalogName
> . ] schemaName . ] tableName  pattern in your *from clause*.
>
>
>
> But I think Kylin should not return a ugly error stacktrace when user use
> that pattern. I think I may find a better way to avoid such ugly error
> stacktrace.
>
>
>
>
>
> Link:
>
>
> https://stackoverflow.com/questions/7022755/whats-the-difference-between-a-catalog-and-a-schema-in-a-relational-database
>
> https://en.wikipedia.org/wiki/Database_catalog
>
> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
>
>
>
>
>
> If I mistake anything, please let me know.
>
> Best Regards,
>
> Xiaoxiang Yu
>
> *发件人**: *hosur narahari <hn...@gmail.com>
> *答复**: *"user@kylin.apache.org" <us...@kylin.apache.org>
> *日期**: *2018年9月20日 星期四 11:43
> *收件人**: *"user@kylin.apache.org" <us...@kylin.apache.org>
> *主题**: *select with catalog fails
>
>
>
> Hi,
>
>
>
> I was using kylin JDBC driver to fetch data. By using DatabaseMetaData if
> we get catalog using getCatalogs() method, it return value
> "defaultCatalog". It returns actual hive schema when we execute
> getSchemas().
>
>
>
> According to JDBC contract, catalog.schema.table should be valid from
> clause and many query layers use that. But kylin fails when we execute that
> query.
>
> I've tried to write sample code piece for that below.
>
>
>
> *DatabaseMetaData db = conn.getMetaData();*
>
> *        ResultSet catalogSet = db.getCatalogs();*
>
> *        String catalog = "";*
>
> *        if(catalogSet.next()) {*
>
> *            catalog = catalogSet.getString("TABLE_CAT");*
>
> *        }*
>
> *        ResultSet schemaSet = db.getSchemas();*
>
> *        String schema = "";*
>
> *        if(schemaSet.next()) {*
>
> *            schema = schemaSet.getString("TABLE_SCHEM");*
>
> *        }*
>
> *        StringBuilder sb = new StringBuilder("SELECT * FROM ");*
>
> *        if(!catalog.isEmpty()) {*
>
> *            sb.append(catalog + ".");*
>
> *        }*
>
> *        if(!schema.isEmpty()) {*
>
> *            sb.append(schema + ".");*
>
> *        }*
>
> *        sb.append("kylin_sales limit 10");*
>
> *        String query = sb.toString();*
>
> *        Statement stat = conn.createStatement();*
>
> *        ResultSet rs = stat.executeQuery(query);*
>
> *        while(rs.next()) {*
>
> *            System.out.println(rs.getObject("trans_id"));*
>
> *        }*
>
>
>
> In short, the above snippet is executing the query,
>
> *select * from defaultCatalog.DEFAULT.kylin_sales.*
>
>
>
> Same thing happens even with different schemas if we have like,
>
> *select * from defaultCatalog.test.kylin_sales* also fails.
>
>
>
> Also if hive schema is anything other than default, then <schema
> name>.<table name> in from clause works. But with default schema it fails.
>
>
>
> Ex. If tables are in *test *schema of hive, then *select * from
> test.kylin_sales *works. But when the tables are in *default *schema of
> hive, then *select * from default.kylin_sales *fails.
>
>
>
> Please let me know if I'm doing something wrong.
>
>
>
> Best Regards,
>
> Hari
>

Re: select with catalog fails

Posted by Xiaoxiang Yu <xi...@kyligence.io>.
Hi,
I interested in your question and I check it carefully. I want to share what I found and I guess it will be a bit helpful.

It is a problem about how to realize/obey SQL standard and JDBC standard. You think JDBC standard/SQL standard should support [ [ catalogName . ] schemaName . ] tableName in from clause. I think it is right, I found a link(https://calcite.apache.org/docs/reference.html), apache calcite says they can resolve that pattern.

tablePrimary:
      [ [ catalogName . ] schemaName . ] tableName
      '(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
  |   tablePrimary [ EXTEND ] '(' columnDecl [, columnDecl ]* ')'
  |   [ LATERAL ] '(' query ')'
  |   UNNEST '(' expression ')' [ WITH ORDINALITY ]
  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'


And in JDBC interface these two methods as defined as followed: (java.sql.DatabaseMetaData)

[*  *  *  *  *  *  *  Retrieves the schema names available in this database.  are ordered by and  <P>The schema columns are:  COL >  <LI><B>TABLE_SCHEM</B> String {Ocode =>} schema name  The results  String {ä)code catalog name (may be <code>null</code>)  aparam catalog a catalog name; must match the catalog name as it is stored  in the database;  retrieves those without a catalog; null means catalog  name should not be used to narrow down the search.  aparam schemapattern a schema name; must match the schema name as it is  stored in the database; null means  schema name should not be used to narrow down the search.  areturn a object in which each row is a  schema description  Dexception SQLException if a database access error occurs  asee #getSearchStringEscape  asince 1.6  ResultSet —(string catalog, String schemapattern) throws SQLException;]


[* Retrieves the catalog names available in this database.  * are ordered by catalog name.  * catalog column is:  * String {acode =>} catalog name  The results  * areturn a object in which each row has a  single <code>String</code> column that is a catalog name  * aexception SQLException if a database access error occurs  ( ) throws SQLException;  ResultSet]

And I write a simple program to test behavior of MySQL JDBC driver and Kylin JDBC driver. Here are the code and output.
MySQL did not return empty when call getSchemas and return four items in getCatalogs


[public static void main(string[] args) throws Exception {  testDriver( •tariverNameSD "org.apache.kylin.jdbc.Driver" ,  " 7079/1acus" ,  cuserrqame:' "ADMIN", coassworoo "IQ'LIN");  testDriver( 'griverNametSD "com.mysql . jdbc . Driver" ,  " jdbc :mysql ://localhost: 3306" ,  " root" , "lacus" ) ;  public static void testDriver(String driverName, String url, String userName,  + driverName);  Class . forName(driverName) ;  Connection conn = DriverManager.getConnection(ur1, userName, password);  getMetaData( ) ;  DatabaseMetaData db  conn.  ResultSet catalogSet = db. getcatalogs()  String catalog = "  while (catalogset.next()) {  String password) throws Exception {  .getschemas(),  Resultset schemaset = db  String schema " ";  while (schemaset.next()) {  " + "TABLE_CATALOG"));  close()  conn .]



[/Library/Java/JavaVirtuaIMachines/jdk1.8.Ø_181. jdk/Contents/Home/bin/java .  Test  log4j •  log4j •  log4j •  TABLE  TABLE  org.apache.kylin.jdbc.Driver  •WARN No appenders could be found for  logger (org.apache.kylin.jdbc.KylinConnection) .  •WARQ Please initialize the log4j system properly.  •WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.  _CAT defaultCatalog  TABLE_SCHEM LACUS  _CATALOG defaultCatalog  Test  com.mysql.jdbc.Driver  Loading class 'com.mysql.jdbc.Driver'. This is deprecated. The new driver class is •com.mysql.cj.jdbc.Driver'. The driver is automatically regis  driver class is generally unnecessary.  Mon Sep 24 17:46:33 CST 2018 WAR•J: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5  connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServf  You  TABLE  TABLE  TABLE  TABLE  need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verificatit  _CAT  _CAT  _CAT  _CAT  information_schema  mysql  performance_schema  sys]


Here is code of how MySQL JDBC do.(com.mysql.jdbc.DatabaseMetadata)

[( ) throws SQLException {  public ResultSet  ResultSet results  null;  Statement stmt = nut Z;  try {  - this. conn.createstatement();  stmt -  stmt . setEscapeprocessing( false);  stmt. executeQuery( "SHOW DATABASES");  results =  ResultsetMetaData resultsMD = results.getMetaData(  fields  new Field( "  = new ArrayList();  ArrayList tuples  eolumnNamee "TABLE _ CAT" ,  (LßDcTypeVD 12 ,  resul tsMD. getCoIumnD  - new acolum  rowval  tuples. add(new ByteArrayRow(rowVa1, this. getExceptionIn  tuples);  Resultset var18 = this.  return var18;  finally {  if (results null) {  try {  results .  } catch (SQLException var16) {  Assertion Fai led Except ion . shouldNotHappen( var16) ;  eptor()));  results  = null;  if (stmt != null) {  try {  stmt  } catch (SQLException var15) {  Assert ion Failed Exception . shouldNotHappen( var15) ;  - null;  stmt -]



[public ResultSet  Field[] fields  hemas() throws SQLException {  = new Field( catablet•ämeo " " , "TABLE_SCHEM", CLBDCTYP%DI,  .:tengtmJJ O ) ,  new Field( ,  ArrayList tuples = new ArrayList();  ResultSet results -  tuples);  - this.  return results;]


Let us look how mysql store infomation of catalog/schema/table, It seems that MySQL JDBC did not use column CATALOG_NAME of table SCHEMATA.
[cid:image007.png@01D4543D.ECEA18D0]


My opinion:
So, MySQL JDBC did not follow standard strictly . As far as I concerned, kylin jdbc driver may not need to follow SQL standard so strictly.
You can avoid misleading/error message if you do not use  [ [ catalogName . ] schemaName . ] tableName  pattern in your from clause.

But I think Kylin should not return a ugly error stacktrace when user use that pattern. I think I may find a better way to avoid such ugly error stacktrace.


Link:
https://stackoverflow.com/questions/7022755/whats-the-difference-between-a-catalog-and-a-schema-in-a-relational-database
https://en.wikipedia.org/wiki/Database_catalog
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt


If I mistake anything, please let me know.
Best Regards,
Xiaoxiang Yu
发件人: hosur narahari <hn...@gmail.com>
答复: "user@kylin.apache.org" <us...@kylin.apache.org>
日期: 2018年9月20日 星期四 11:43
收件人: "user@kylin.apache.org" <us...@kylin.apache.org>
主题: select with catalog fails

Hi,

I was using kylin JDBC driver to fetch data. By using DatabaseMetaData if we get catalog using getCatalogs() method, it return value "defaultCatalog". It returns actual hive schema when we execute getSchemas().

According to JDBC contract, catalog.schema.table should be valid from clause and many query layers use that. But kylin fails when we execute that query.
I've tried to write sample code piece for that below.

DatabaseMetaData db = conn.getMetaData();
        ResultSet catalogSet = db.getCatalogs();
        String catalog = "";
        if(catalogSet.next()) {
            catalog = catalogSet.getString("TABLE_CAT");
        }
        ResultSet schemaSet = db.getSchemas();
        String schema = "";
        if(schemaSet.next()) {
            schema = schemaSet.getString("TABLE_SCHEM");
        }
        StringBuilder sb = new StringBuilder("SELECT * FROM ");
        if(!catalog.isEmpty()) {
            sb.append(catalog + ".");
        }
        if(!schema.isEmpty()) {
            sb.append(schema + ".");
        }
        sb.append("kylin_sales limit 10");
        String query = sb.toString();
        Statement stat = conn.createStatement();
        ResultSet rs = stat.executeQuery(query);
        while(rs.next()) {
            System.out.println(rs.getObject("trans_id"));
        }

In short, the above snippet is executing the query,
select * from defaultCatalog.DEFAULT.kylin_sales.

Same thing happens even with different schemas if we have like,
select * from defaultCatalog.test.kylin_sales also fails.

Also if hive schema is anything other than default, then <schema name>.<table name> in from clause works. But with default schema it fails.

Ex. If tables are in test schema of hive, then select * from test.kylin_sales works. But when the tables are in default schema of hive, then select * from default.kylin_sales fails.

Please let me know if I'm doing something wrong.

Best Regards,
Hari