Posted to dev@falcon.apache.org by "Srikanth Sundarrajan (JIRA)" <ji...@apache.org> on 2013/08/09 17:29:47 UTC

[jira] [Commented] (FALCON-36) Ability to ingest data from databases

    [ https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734902#comment-13734902 ] 

Srikanth Sundarrajan commented on FALCON-36:
--------------------------------------------

Hi Venkatesh, thanks for this fairly large contribution. A few things caught my eye during review:
1. AbstractWorkflowEngine: I would prefer that we take lifeCycle as an enum value instead of a String.
2. AbstractInstanceManager: The lifeCycle value's validity is not checked uniformly across all methods (e.g. getRunningInstances).
3. acquisition-feed-0.1.xml (test xml): The example includes a database in both the source and target clusters, which is a bit unclear. Wouldn't the target be a replication target? If so, what is the relevance of a database section in the target cluster?
4. acquistion-feed-template.xml: Has a commented-out section. Can we remove it if it's not necessary?
5. ClassLoaderForJarOnHDFS: Did you mean classLoaderCache.putIfAbsent(databaseName, jarsOnHdfsClassLoader); ?
6. database-acquisition-workflow.xml: Should we forcibly turn off speculative execution for this job? Also, FALCON-67 removed the dependency on ant.jar; that change should be applied to this workflow definition as well.
7. database-export-coordinator.xml: FALCON-47 fixes issues relating to feed generation lag and replication. The same issues may apply to the export functionality as well.
8. database-template.xml: Refers to mysql-connector-java. Is this expected to be added to the Oozie shared lib by the administrator, or is it included by Falcon?
9. DatabaseHelper: fetchPasswordFromFile should run as the feed owner; the current implementation would break if the password file is stored with 0400 permissions. The write and read endpoint getters differ only by interface type; can they be merged and parameterized?
10. DatabaseEntityParser: Can we avoid creating a dir directly under /tmp and instead create something like /tmp/falcon/dbjars/<database name>? With the current implementation there might be a collision, since the chances of a directory (or, worse, a file) with the same name already existing there are high.
11. DatabaseIT: Thread.sleep(5000); - is this intentional or accidentally left behind? There is also some commented-out code.
12. EmbeddedDatabase: Commented-out code left behind, and log statements are printed on System.out. hsqldb is already a dependency in the project (a transitive dependency of hadoop); can we use that instead of derby to avoid an additional dependency? I don't have a firm opinion on this one, though.
13. EntityInstanceMessage: I see an addition for acquire; what about export?
14. FalconCLI.twiki: Where we mention usage, database needs to be added (alongside cluster).
15. FeedEntityParser: Shouldn't a database section be disallowed in the feed/target section?
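
The enum suggestion in points 1 and 2 could look roughly like the sketch below. The lifecycle names here are assumptions for illustration, not Falcon's actual set; the parse helper is one way the validity check could be shared uniformly by all AbstractInstanceManager methods:

```java
// Hypothetical sketch; the actual Falcon lifecycle values may differ.
public enum LifeCycle {
    EXECUTION, EVICTION, REPLICATION, IMPORT, EXPORT;

    // Parse a user-supplied string, giving a clear error instead of a raw
    // valueOf failure; every instance-manager method can call this once.
    public static LifeCycle parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("lifeCycle must not be null");
        }
        try {
            return LifeCycle.valueOf(value.trim().toUpperCase());
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Invalid lifeCycle: " + value);
        }
    }
}
```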
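
On point 5, the subtlety with putIfAbsent is that it returns the previously mapped value (or null), so the caller must keep whichever loader won the race rather than assuming its own was stored. A minimal sketch of that pattern (class and method names are illustrative, not the patch's actual API):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the cache fix suggested for ClassLoaderForJarOnHDFS.
public class ClassLoaderCacheSketch {
    private static final ConcurrentMap<String, ClassLoader> classLoaderCache =
            new ConcurrentHashMap<String, ClassLoader>();

    public static ClassLoader getClassLoader(String databaseName, URL[] jarUrls) {
        ClassLoader existing = classLoaderCache.get(databaseName);
        if (existing != null) {
            return existing;
        }
        ClassLoader fresh = new URLClassLoader(jarUrls);
        // putIfAbsent returns the loader another thread stored first, if any.
        ClassLoader raced = classLoaderCache.putIfAbsent(databaseName, fresh);
        return raced != null ? raced : fresh;
    }
}
```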
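
For point 6, speculative execution is usually disabled through the job's Hadoop properties. The fragment below is a sketch only, using the classic mapred.* property names; where exactly it goes in database-acquisition-workflow.xml depends on the action's structure:

```xml
<!-- Sketch: add under the action's <configuration> element -->
<property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
</property>
<property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
</property>
```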
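
Point 9's merge-and-parameterize suggestion might look like this. The Interfacetype values and the map-based lookup are assumptions for illustration, not DatabaseHelper's real API:

```java
import java.util.Map;

// Hypothetical sketch: one accessor parameterized by interface type,
// replacing two near-identical get-read/get-write endpoint methods.
public class DatabaseHelperSketch {
    // Illustrative interface types; Falcon's actual enum may differ.
    public enum Interfacetype { READONLY, WRITE }

    public static String getEndpoint(Map<Interfacetype, String> interfaces,
                                     Interfacetype type) {
        String endpoint = interfaces.get(type);
        if (endpoint == null) {
            throw new IllegalArgumentException("No endpoint defined for " + type);
        }
        return endpoint;
    }
}
```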
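
And for point 10, namespacing the staging directory could be as simple as the sketch below (names are illustrative; the real parser would also need to sanitize the database name before using it as a path component):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: stage jars under <tmp>/falcon/dbjars/<database name>
// instead of directly under /tmp, reducing collisions with unrelated files.
public class DbJarStagingSketch {
    public static Path stagingDirFor(String tmpRoot, String databaseName) throws IOException {
        Path dir = Paths.get(tmpRoot, "falcon", "dbjars", databaseName);
        Files.createDirectories(dir); // no-op if the directory already exists
        return dir;
    }
}
```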


                
> Ability to ingest data from databases
> -------------------------------------
>
>                 Key: FALCON-36
>                 URL: https://issues.apache.org/jira/browse/FALCON-36
>             Project: Falcon
>          Issue Type: Improvement
>          Components: acquisition
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-36.patch, FALCON-36.rebase.patch, FALCON-36.review.patch
>
>
> Attempt to address data import from an RDBMS into Hadoop and export of data from Hadoop into an RDBMS. The plan is to use Sqoop 1.x to materialize data motion from/to an RDBMS to/from HDFS. Hive will not be integrated in the first pass until Falcon has first-class integration with HCatalog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira