Posted to dev@flume.apache.org by "Attila Simon (JIRA)" <ji...@apache.org> on 2016/07/01 15:54:11 UTC

[jira] [Commented] (FLUME-2938) JDBC Source

    [ https://issues.apache.org/jira/browse/FLUME-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359170#comment-15359170 ] 

Attila Simon commented on FLUME-2938:
-------------------------------------

As it turns out, I'm a big fan of the "do one thing well" design. My concern is that duplicating work should be avoided, so it would be good to know what additional functionality this source would provide.

On the other hand, Flume is for streaming data, and the source you mentioned would need a scheduler. Is that really functionality Flume should provide? Or would only a small tweak in Sqoop be needed, if any change is needed at all?

> JDBC Source
> -----------
>
>                 Key: FLUME-2938
>                 URL: https://issues.apache.org/jira/browse/FLUME-2938
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.8.0
>            Reporter: Lior Zeno
>             Fix For: v1.8.0
>
>
> The idea is to allow migrating data from SQL stores to NoSQL stores or HDFS for archiving purposes.
> This source will get a statement to execute and a scheduling policy. It will be able to fetch timestamped data by performing range queries on a configurable field (this can also fetch data with an incremental id). For fault-tolerance, the last fetched value can be checkpointed to a file.
> Dealing with large datasets can be done via the fetch_size parameter. (Ref: https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm)
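The incremental-fetch idea in the description above can be sketched roughly as follows. This is a minimal illustration, not Flume code: the table and column names (`events`, `id`, `payload`), the checkpoint file path, and the `poll_once` helper are all hypothetical, and SQLite stands in for an arbitrary JDBC-accessible store. Each poll runs a range query above the checkpointed value, ordered by the tracking column and bounded by a fetch size, then advances the checkpoint to the last value seen.

```python
# Sketch of incremental range-query polling with a file checkpoint,
# as described in the issue. All names here are illustrative.
import os
import sqlite3
import tempfile

FETCH_SIZE = 2  # analogous in spirit to the JDBC fetch_size hint


def read_checkpoint(path):
    """Return the last checkpointed value, or 0 if none exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read().strip())
    return 0


def write_checkpoint(path, value):
    with open(path, "w") as f:
        f.write(str(value))


def poll_once(conn, ckpt_path):
    """Fetch rows newer than the checkpoint, then advance the checkpoint."""
    last = read_checkpoint(ckpt_path)
    cur = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last, FETCH_SIZE),
    )
    rows = cur.fetchall()
    if rows:
        # Checkpoint the last fetched id so a restart resumes here.
        write_checkpoint(ckpt_path, rows[-1][0])
    return rows


# Demo setup: an in-memory table with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

ckpt = os.path.join(tempfile.mkdtemp(), "jdbc-source.ckpt")
batch1 = poll_once(conn, ckpt)  # first FETCH_SIZE rows
batch2 = poll_once(conn, ckpt)  # remaining row
batch3 = poll_once(conn, ckpt)  # nothing new
```

A real source would run `poll_once` on the scheduling policy mentioned above and turn each row into a Flume event; the same query shape works for a timestamp column instead of an id.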



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)