Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/06 14:39:00 UTC

[jira] [Commented] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

    [ https://issues.apache.org/jira/browse/BEAM-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240377#comment-16240377 ] 

ASF GitHub Bot commented on BEAM-3060:
--------------------------------------

GitHub user lgajowy opened a pull request:

    https://github.com/apache/beam/pull/4081

    [BEAM-3060] Adds TextIOIT for DirectRunner and local filesystem

    This is one of several commits toward resolving BEAM-3060. Currently only the local
    filesystem, relatively small datasets, and the DirectRunner are supported. Support for more
    runners, more filesystems, and larger (gigabyte-scale) datasets will follow in later commits.
    
    See: https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#
    
    Follow this checklist to help us incorporate your contribution quickly and easily:
    
     - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it).  Trivial changes like typos do not require a JIRA issue.  Your pull request should address just this issue, without pulling in other changes.
     - [ ] Each commit in the pull request should have a meaningful subject line and body.
     - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
     - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
     - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lgajowy/beam text-io-it

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/4081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4081
    
----
commit c6c7070ad92424707d3720d3a4dc2c0fb6961440
Author: Łukasz Gajowy <lu...@polidea.com>
Date:   2017-10-31T09:25:22Z

    [BEAM-3060] Adds TextIOIT for DirectRunner and local filesystem
    
    This is one of several commits toward resolving BEAM-3060. Currently only the local
    filesystem, relatively small datasets, and the DirectRunner are supported. Support for more
    runners, more filesystems, and larger (gigabyte-scale) datasets will follow in later commits.
    
    See: https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit#

----


> Add performance tests for commonly used file-based I/O PTransforms
> ------------------------------------------------------------------
>
>                 Key: BEAM-3060
>                 URL: https://issues.apache.org/jira/browse/BEAM-3060
>             Project: Beam
>          Issue Type: Test
>          Components: sdk-java-core
>            Reporter: Chamikara Jayalath
>            Assignee: Szymon Nieradka
>
> We recently added a performance testing framework [1] that can be used to do the following:
> (1) Execute Beam tests using PerfkitBenchmarker.
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
> I think it would be useful to add performance tests for commonly used file-based I/O PTransforms using this framework. I suggest looking into the following formats initially:
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests easily for various Beam runners (Direct, Dataflow, Flink, Spark, etc.) and filesystems (GCS, local, HDFS, etc.).
> In the initial version, tests can be made manually triggerable for PRs through Jenkins. Later, we could make some of these tests run periodically and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/
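
For readers unfamiliar with this kind of test, the following is a minimal sketch of the write-then-read round trip that a file-based I/O performance test might time on a local filesystem. It is plain Python using only the standard library, not Beam's API and not the code in this pull request; the function names, the line format, and the file name are all hypothetical. It compares an order-independent hash of the data because a parallel runner need not preserve line order.

```python
import hashlib
import os
import tempfile
import time


def generate_lines(count):
    """Yield `count` deterministic text lines (hypothetical test data)."""
    for i in range(count):
        yield f"line-{i}"


def content_hash(lines):
    """Hash the lines after sorting, so the result ignores input order."""
    digest = hashlib.sha256()
    for line in sorted(lines):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def run_round_trip(num_lines, directory):
    """Write the lines to a local file, read them back, and time both phases."""
    path = os.path.join(directory, "textio-sketch.txt")  # hypothetical file name
    expected = content_hash(generate_lines(num_lines))

    start = time.perf_counter()
    with open(path, "w", encoding="utf-8") as out:
        for line in generate_lines(num_lines):
            out.write(line + "\n")
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, encoding="utf-8") as src:
        read_lines = [line.rstrip("\n") for line in src]
    read_seconds = time.perf_counter() - start

    return {
        "lines": len(read_lines),
        "matches": content_hash(read_lines) == expected,
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_round_trip(10_000, tmp))
```

Sorting before hashing is only practical for small datasets that fit in memory; a gigabyte-scale test would need a different order-independent check.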



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)