Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/07 04:49:10 UTC

[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

    [ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365604#comment-15365604 ] 

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing results of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.
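
The last point above can be illustrated with a minimal sketch. This is not the code from the pull request: the free function below and its parameter names ('start', 'stop', 'fraction') are assumptions made for illustration, and rounding up with 'math.ceil' is one possible choice. It only shows why an integral result matters: byte offsets are whole numbers, so a split position such as 12.5 is not usable. In Python 3 the 'long' type is merged into 'int'.

```python
import math


def position_at_fraction(start, stop, fraction):
    """Hypothetical helper: map a fraction of [start, stop) to an integer offset.

    Rounds up so that the returned position is a whole byte offset rather
    than a float. The rounding rule is an assumption of this sketch.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError('fraction must be in [0, 1], got %r' % (fraction,))
    if stop < start:
        raise ValueError('stop must not be smaller than start')
    return int(math.ceil(start + fraction * (stop - start)))


if __name__ == '__main__':
    # Halfway through the range [10, 20) is offset 15.
    print(position_at_fraction(10, 20, 0.5))
```

A runner that asks for a split at a given fraction can then compare the returned offset directly with record start offsets, without any float-to-integer conversion at the call site.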

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599
    
----
commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath <ch...@apache.org>
Date:   2016-06-21T01:09:50Z

    Implements a framework for developing sources for new file types.
    
    Module 'filebasedsource' provides a framework for creating sources for new file types. This framework implements several features common to many file-based sources.
    
    Additionally, module 'avroio' contains a new source, 'AvroSource', that is implemented using the framework described above. 'AvroSource' is a source for reading Avro files.
    
    Adds many unit tests for 'filebasedsource' and 'avroio' modules.

commit cacb613448b47592f8415570f7b64bc6de797f91
Author: Chamikara Jayalath <ch...@apache.org>
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a
Author: Chamikara Jayalath <ch...@apache.org>
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34
Author: Chamikara Jayalath <ch...@google.com>
Date:   2016-07-07T04:41:06Z

    Some updates related to dynamic work rebalancing of custom sources.
    
    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c
Author: Chamikara Jayalath <ch...@google.com>
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing of custom sources.
    
    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work rebalancing result of custom sources.
    
    Updates Dataflow runner specific code (apiclient.py) to support dynamic work rebalancing of custom sources.
    
    Updates 'OffsetRangeTracker' so that the result of 'position_at_fraction()' is a 'long' instead of a 'float'.

----


> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for the Beam Python SDK: https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great if we could add a framework on top of this that encapsulates the logic common to file-based sources. This framework could include the following features:
> (1) glob expansion
> (2) support for new file systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files
> The Java SDK has a similar framework, available at https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java
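
As a rough illustration of features (1) and (3) in the issue description above, the sketch below expands a glob pattern and divides each matched file into byte-offset ranges that could be read independently. It is not the 'filebasedsource' module: the function names 'split_offset_range' and 'plan_bundles' and the 'bundle_size' parameter are hypothetical, and only the standard-library 'glob' and 'os' modules are used. A real source would additionally need to align ranges with record boundaries and handle compressed files, which generally cannot be split by byte offset.

```python
import glob
import os


def split_offset_range(start, stop, bundle_size):
    """Yield (start, stop) byte ranges of at most bundle_size covering [start, stop)."""
    if bundle_size <= 0:
        raise ValueError('bundle_size must be positive')
    offset = start
    while offset < stop:
        end = min(offset + bundle_size, stop)
        yield (offset, end)
        offset = end


def plan_bundles(file_pattern, bundle_size):
    """Expand a glob pattern and return (path, start, stop) tuples for each file.

    Hypothetical planning step: every tuple describes a half-open byte range
    that one worker could read on its own.
    """
    bundles = []
    for path in sorted(glob.glob(file_pattern)):
        size = os.path.getsize(path)
        for start, stop in split_offset_range(0, size, bundle_size):
            bundles.append((path, start, stop))
    return bundles


if __name__ == '__main__':
    # Prints the planned ranges for any .txt files in the current directory.
    for bundle in plan_bundles('*.txt', 64 * 1024):
        print(bundle)
```

Keeping the ranges half-open means adjacent ranges share a boundary value without overlapping, so no byte is assigned to two ranges.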



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)