Posted to dev@kafka.apache.org by "Gwen Shapira (JIRA)" <ji...@apache.org> on 2015/07/27 08:27:04 UTC
[jira] [Issue Comment Deleted] (KAFKA-2365) Copycat checklist

     [ https://issues.apache.org/jira/browse/KAFKA-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gwen Shapira updated KAFKA-2365:
--------------------------------
    Comment: was deleted

(was: I keep waiting for [~ewencp] to create the "log connector" jira so I can add specific requirements, but it's getting late, so I'm adding them here for reference and will move the comment to the right place tomorrow :)

Top log features missing from Flume (more or less in order of importance):
* Ability to safely tail a log (vs. only ingest completed files)
* rename / move file once ingestion is complete
* recursively ingest from sub-directories
* handling gzipped files (i.e. unzip and split into records)
* pulling from FTP / http / SFTP
* some intelligence regarding record splitting (support for CSV and JSON is pretty high on the list)
* conversion to Avro (duh!), bonus if the schema can be magically inferred and later edited.)
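As a rough illustration of three of the items above (safe tailing, gzipped files, and CSV/JSON record splitting), here is a minimal Python sketch. It is not the Copycat connector API, and the function names read_complete_lines, read_gzip_records and parse_record are hypothetical. It assumes newline-delimited UTF-8 records, a log that is only appended to (no truncation or rotation), and CSV records without embedded newlines.

```python
import csv
import gzip
import json


def read_complete_lines(path, offset):
    """Return (records, new_offset) for newline-terminated lines after `offset`.

    A trailing partial line that the writer has not finished yet is left
    unread: the returned offset points at its first byte, so a later call
    picks it up once the newline arrives. Assumes (hypothetically) that the
    file is only appended to and holds UTF-8 text.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n")
    if end == -1:
        return [], offset  # no complete line yet; try again later
    complete = data[: end + 1].decode("utf-8")
    # Splitting on "\n" leaves an empty string after the final newline; drop it.
    records = complete.split("\n")[:-1]
    return records, offset + end + 1


def read_gzip_records(path):
    """Decompress a finished .gz file and split it into one record per line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def parse_record(line, fmt):
    """Turn one raw line into a structured record ("json", "csv", or raw text)."""
    if fmt == "json":
        return json.loads(line)
    if fmt == "csv":
        # Handles quoted fields, but not quoted fields spanning several lines.
        return next(csv.reader([line]))
    return line
```

A caller would persist the returned offset after each call so that a restart resumes where it left off; only complete lines are ever emitted, which is what distinguishes safe tailing from ingesting completed files only.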

> Copycat checklist
> -----------------
>
>                 Key: KAFKA-2365
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2365
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>              Labels: feature
>             Fix For: 0.8.3
>
>
> This covers the development plan for [KIP-26|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767]. There are a number of features that can be developed in sequence to make incremental progress, and often in parallel:
> * Initial patch - connector API and core implementation
> * Runtime data API
> * Standalone CLI
> * REST API
> * Distributed copycat - CLI
> * Distributed copycat - coordinator
> * Distributed copycat - config storage
> * Distributed copycat - offset storage
> * Log/file connector (sample source/sink connector)
> * Elasticsearch sink connector (sample sink connector for full log -> Kafka -> Elasticsearch sample pipeline)
> * Copycat metrics
> * System tests (including connector tests)
> * Mirrormaker connector
> * Copycat documentation
> This is an initial list, but it might need refinement to allow for more incremental progress and may be missing features we find we want before the initial release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)