Posted to dev@griffin.apache.org by "ishan verma (Jira)" <ji...@apache.org> on 2020/08/06 12:40:00 UTC

[jira] [Commented] (GRIFFIN-332) JDBC Connector: Ability to Select Specific Columns Instead of All the Columns

    [ https://issues.apache.org/jira/browse/GRIFFIN-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172322#comment-17172322 ] 

ishan verma commented on GRIFFIN-332:
-------------------------------------

Hi [~obaid],

I am currently working on a data quality POC using Griffin.

So far everything works fine with Hive as a data source, but there is a new requirement to add MySQL as a source.

I have tried every way I could find to use *mysql* as a custom data connector, but it is not working. The measure gets created and the job completes successfully, but Griffin shows *NO CONTENT* in the UI. Below is my configuration:
"data.sources": [
  {
    "name": "source",
    "connectors": [
      {
        "name": "source1595488803031",
        "type": "CUSTOM",
        "data.unit": "1day",
        "data.time.zone": "",
        "config": {
          "class": "org.apache.griffin.measure.datasource.connector.batch.MySqlDataConnector",
          "database": "griffin_poc",
          "tablename": "person_src",
          "url": "jdbc:mysql://griffin:3306/griffin_poc",
          "user": "test_u",
          "password": "test_p",
          "driver": "com.mysql.jdbc.Driver"
        }
      }
Could you please suggest how to use *mysql/jdbc* as my data source? It is critical for my POC, and the issue is urgent.
If I am missing anything needed to connect to MySQL, please guide me through it.

It would be great if you could provide a sample.

I have also tried the SQL and JDBC connector classes from the latest 0.6 commit, but the UI still shows no content.
I have also set up MySQL on an EC2 instance using the Griffin Docker image.

[~obaid], I would appreciate your input if you have done a similar setup, as this is critical for my POC.

Any leads will be appreciated.
Thanks :)

> JDBC Connector: Ability to Select Specific Columns Instead of All the Columns
> -----------------------------------------------------------------------------
>
>                 Key: GRIFFIN-332
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-332
>             Project: Griffin
>          Issue Type: Improvement
>          Components: accuracy-batch
>    Affects Versions: 0.6.0
>            Reporter: Obaidul Karim
>            Priority: Major
>              Labels: columns, jdbc
>
> *Background:*
>  Thanks to https://issues.apache.org/jira/browse/GRIFFIN-315, we already have a JDBC connector.
>  However, it currently pulls all the columns using `"SELECT * FROM $fullTableName"`.
>  This causes some issues for larger JDBC tables:
>  - memory overhead for the Spark data frame
>  - longer execution time
>  - resource overhead for the RDBMS
> *Proposed Improvement:*
>  So, I propose a feature that allows the JDBC connector to select only the required columns.
> *Example:*
>  Suppose we have a rule `"rule":"src.id = tgt.id and src.country = tgt.country "`. Then we only need two columns, `id` and `country`.
>  So, in the connector config we can add an additional `columns` clause to select only the required columns, like below:
>  
> {code:java}
> {
>   "name": "src",
>   "connector": {
>     "type": "jdbc",
>     "config": {
>       "database": "mydatabase",
>       "tablename": "mytable",
>       "columns": "id, country",
>       "url": "jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
>       "user": "user",
>       "password": "password",
>       "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
>       "where": ""
>     }
>   }
> }
> {code}
> We can implement it like this: if there is a `columns` clause, use it; otherwise use `*` as the default.
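
The fallback proposed in the issue description could be sketched roughly as follows. This is only an illustration in Python, not Griffin's actual connector code: `build_select_query` is a hypothetical helper, and treating the `where` config value as the body of a WHERE clause is an assumption based on the key name in the example config.

```python
def build_select_query(full_table_name, columns=None, where=None):
    """Build the SELECT statement for a JDBC read (hypothetical helper).

    `columns` mirrors the proposed comma-separated "columns" config value.
    `where` is assumed to hold the body of an optional WHERE clause.
    """
    # Split the comma-separated list and drop empty entries, so that a
    # missing or blank `columns` value falls back to "*".
    names = [c.strip() for c in (columns or "").split(",") if c.strip()]
    select_list = ", ".join(names) if names else "*"

    query = f"SELECT {select_list} FROM {full_table_name}"
    if where and where.strip():
        query += f" WHERE {where.strip()}"
    return query


# With the example config from the issue, only two columns are requested.
print(build_select_query("mydatabase.mytable", "id, country", ""))
# Without `columns`, the current behaviour (all columns) is kept.
print(build_select_query("mydatabase.mytable"))
```

Keeping `*` as the default means existing connector configs without a `columns` key would behave as they do today.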



--
This message was sent by Atlassian Jira
(v8.3.4#803005)