You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@griffin.apache.org by chemikadze <gi...@git.apache.org> on 2018/11/17 22:38:52 UTC
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
GitHub user chemikadze opened a pull request:
https://github.com/apache/incubator-griffin/pull/456
[GRIFFIN-213] Custom connector support
Provide ability to extend batch and streaming data integrations
with custom user-provided connectors. Introduces new data connector
type, `CUSTOM`, parameterized with `class` property. Also adds support
for custom data connector enum on service side.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chemikadze/incubator-griffin GRIFFIN-213
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-griffin/pull/456.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #456
----
commit d487347a363f172cfc9e26225d5687cc3f95ab73
Author: Nikolay Sokolov <ch...@...>
Date: 2018-11-17T22:37:36Z
[GRIFFIN-213] Custom connector support
Provide ability to extend batch and streaming data integrations
with custom user-provided connectors. Introduces new data connector
type, `CUSTOM`, parameterized with `class` property. Also adds support
for custom data connector enum on service side.
----
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-griffin/pull/456
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by gavlyukovskiy <gi...@git.apache.org>.
Github user gavlyukovskiy commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r234425707
--- Diff: measure/src/main/scala/org/apache/griffin/measure/datasource/connector/DataConnectorFactory.scala ---
@@ -84,6 +87,26 @@ object DataConnectorFactory extends Loggable {
}
}
+ private def getCustomConnector(session: SparkSession,
+ context: StreamingContext,
+ param: DataConnectorParam,
+ storage: TimestampStorage,
+ maybeClient: Option[StreamingCacheClient]): DataConnector = {
+ val className = param.getConfig("class").asInstanceOf[String]
+ val cls = Class.forName(className)
+ if (classOf[BatchDataConnector].isAssignableFrom(cls)) {
+ val ctx = BatchDataConnectorContext(session, param, storage)
+ val meth = cls.getDeclaredMethod("apply", classOf[BatchDataConnectorContext])
+ meth.invoke(null, ctx).asInstanceOf[BatchDataConnector]
+ } else if (classOf[StreamingDataConnector].isAssignableFrom(cls)) {
+ val ctx = StreamingDataConnectorContext(session, context, param, storage, maybeClient)
+ val meth = cls.getDeclaredMethod("apply", classOf[StreamingDataConnectorContext])
+ meth.invoke(null, ctx).asInstanceOf[StreamingDataConnector]
+ } else {
+ throw new ClassCastException("")
--- End diff --
It would be nice to have here message that custom connector class must extend `BatchDataConnector` or `StreamingDataConnector`
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by chemikadze <gi...@git.apache.org>.
Github user chemikadze commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235266418
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
ANY seems to have "auto-detect" meaning to it, while in fact it's just plugin mechanism.
What about CLASS, CLASSNAME, PLUGIN or EXTERNAL?
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by toyboxman <gi...@git.apache.org>.
Github user toyboxman commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235585607
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
EXTERNAL seems better, which means a third-party or vendor's data connector.
@guoyuepeng @chemikadze how do you think about?
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by toyboxman <gi...@git.apache.org>.
Github user toyboxman commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r234475362
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
do you think about 'ANY' as replacement for 'CUSTOM'
**type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode, "ANY" for boths.
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by chemikadze <gi...@git.apache.org>.
Github user chemikadze commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235619493
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
Let stick to CUSTOM then.
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by toyboxman <gi...@git.apache.org>.
Github user toyboxman commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235621954
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
sure, please go ahead
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by chemikadze <gi...@git.apache.org>.
Github user chemikadze commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r234426015
--- Diff: measure/src/main/scala/org/apache/griffin/measure/datasource/connector/DataConnectorFactory.scala ---
@@ -84,6 +87,26 @@ object DataConnectorFactory extends Loggable {
}
}
+ private def getCustomConnector(session: SparkSession,
+ context: StreamingContext,
+ param: DataConnectorParam,
+ storage: TimestampStorage,
+ maybeClient: Option[StreamingCacheClient]): DataConnector = {
+ val className = param.getConfig("class").asInstanceOf[String]
+ val cls = Class.forName(className)
+ if (classOf[BatchDataConnector].isAssignableFrom(cls)) {
+ val ctx = BatchDataConnectorContext(session, param, storage)
+ val meth = cls.getDeclaredMethod("apply", classOf[BatchDataConnectorContext])
+ meth.invoke(null, ctx).asInstanceOf[BatchDataConnector]
+ } else if (classOf[StreamingDataConnector].isAssignableFrom(cls)) {
+ val ctx = StreamingDataConnectorContext(session, context, param, storage, maybeClient)
+ val meth = cls.getDeclaredMethod("apply", classOf[StreamingDataConnectorContext])
+ meth.invoke(null, ctx).asInstanceOf[StreamingDataConnector]
+ } else {
+ throw new ClassCastException("")
--- End diff --
Oh, thanks for reminding! Planned to do that, but got distracted.
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by guoyuepeng <gi...@git.apache.org>.
Github user guoyuepeng commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235603134
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
I still think 'custom' better than others.
---
[GitHub] incubator-griffin pull request #456: [GRIFFIN-213] Custom connector support
Posted by guoyuepeng <gi...@git.apache.org>.
Github user guoyuepeng commented on a diff in the pull request:
https://github.com/apache/incubator-griffin/pull/456#discussion_r235226675
--- Diff: griffin-doc/measure/measure-configuration-guide.md ---
@@ -188,7 +188,7 @@ Above lists DQ job configure parameters.
- **sinks**: Whitelisted sink types for this job. Note: no sinks will be used, if empty or omitted.
### <a name="data-connector"></a>Data Connector
-- **type**: Data connector type, "AVRO", "HIVE", "TEXT-DIR" for batch mode, "KAFKA" for streaming mode.
+- **type**: Data connector type: "AVRO", "HIVE", "TEXT-DIR", "CUSTOM" for batch mode; "KAFKA", "CUSTOM" for streaming mode.
--- End diff --
CUSTOM looks good to me.
---