Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/06 06:21:20 UTC

[jira] [Commented] (BEAM-553) Add a custom text source

    [ https://issues.apache.org/jira/browse/BEAM-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466569#comment-15466569 ] 

ASF GitHub Bot commented on BEAM-553:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/920

    [BEAM-553] Adds a text source for Python SDK.

    The current text source (fileio.TextFileSource) is specific to the Dataflow runner. This adds a runner-independent TextSource based on the iobase.BoundedSource interface.
    
    Adds a textio module that contains a text source, a text sink, and PTransforms for reading and writing text files.
    
    Adds a significant number of tests.
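
A runner-independent bounded text source generally has to let a runner cut a file into byte ranges and read each range on its own, without losing or duplicating lines. The sketch below illustrates that idea in plain Python. It is not the code from this pull request and does not use the Beam API: the names `split_ranges` and `read_lines_in_range`, and the rule "a line belongs to the range that contains its first byte", are assumptions made for illustration.

```python
import io


def split_ranges(size, desired_bundle_size):
    """Cut [0, size) into consecutive byte ranges (hypothetical helper)."""
    return [(start, min(start + desired_bundle_size, size))
            for start in range(0, size, desired_bundle_size)]


def read_lines_in_range(stream, start, stop):
    """Yield each line whose first byte lies in [start, stop).

    Assumed ownership rule: a line belongs to the range containing its
    first byte, so a line that crosses `stop` is still read in full here
    and is skipped by the reader of the next range.
    """
    if start > 0:
        # Step back one byte and discard through the next newline. This
        # drops a partial line owned by the previous range, but keeps a
        # line that begins exactly at `start`.
        stream.seek(start - 1)
        stream.readline()
    else:
        stream.seek(0)
    while True:
        position = stream.tell()
        if position >= stop:
            break  # the next line starts in a later range
        line = stream.readline()
        if not line:
            break  # end of file
        yield line.rstrip(b"\r\n")


def read_all(data, desired_bundle_size):
    """Read every range independently and concatenate the results."""
    lines = []
    for start, stop in split_ranges(len(data), desired_bundle_size):
        lines.extend(read_lines_in_range(io.BytesIO(data), start, stop))
    return lines
```

Under these assumptions, reading the ranges in any bundle size returns the same lines as reading the whole file once, which is the property a runner relies on when it distributes ranges across workers.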


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam text_source

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #920
    
----
commit bb1ff90307b563656a54731cada05e41cd9e82b8
Author: Chamikara Jayalath <ch...@google.com>
Date:   2016-08-30T01:08:46Z

    Adds a text source to Python SDK.

----


> Add a custom text source
> ------------------------
>
>                 Key: BEAM-553
>                 URL: https://issues.apache.org/jira/browse/BEAM-553
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> Currently, the text source implementation available for the Python SDK [1] is a Dataflow native source, which works efficiently only with the Dataflow runner. We should add a custom text source on top of the custom file-based source framework [2] so that other runner implementations can potentially use the same text source implementation.
> The custom text source implementation for the Java SDK is at [3].
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L70
> [2] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/filebasedsource.py
> [3] https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L745



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)