Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/05 23:37:41 UTC

[jira] [Commented] (BEAM-1892) Log process during size estimation in filebasedsource

    [ https://issues.apache.org/jira/browse/BEAM-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958030#comment-15958030 ] 

ASF GitHub Bot commented on BEAM-1892:
--------------------------------------

GitHub user sb2nov opened a pull request:

    https://github.com/apache/beam/pull/2445

    [BEAM-1892] File size estimation thresholding and process reporting

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---
    
    R: @chamikaramj PTAL


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sb2nov/beam BEAM-1892-file-size-estimation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2445
    
----
commit 507d8a4b35b35de79729f813500197ae920ce9a8
Author: Sourabh Bajaj <so...@google.com>
Date:   2017-04-05T23:36:18Z

    [BEAM-1892] File size estimation thresholding and process reporting

----


> Log process during size estimation in filebasedsource
> -----------------------------------------------------
>
>                 Key: BEAM-1892
>                 URL: https://issues.apache.org/jira/browse/BEAM-1892
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Sourabh Bajaj
>            Assignee: Sourabh Bajaj
>
> http://stackoverflow.com/questions/43095445/how-to-iterate-all-files-in-google-cloud-storage-to-be-used-as-dataflow-input
> The user reported no output and a long delay when submitting the pipeline. File size estimation can be slow for very large datasets, and it currently reports no progress to the end user. We should log progress and also apply a threshold to the pre-submission size estimation.
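
The idea in the description above (log progress while sizing files, and stop estimating once a threshold is passed) can be sketched roughly as follows. This is only an illustration, not the code in the pull request: the function name `estimate_total_size`, the `size_of` callback, and the `max_files` and `log_every` parameters and their defaults are all hypothetical.

```python
import logging
from typing import Callable, Iterable, Optional

_LOG = logging.getLogger(__name__)


def estimate_total_size(
    paths: Iterable[str],
    size_of: Callable[[str], int],
    max_files: int = 10000,   # hypothetical threshold, not a Beam default
    log_every: int = 1000,    # hypothetical logging interval
) -> Optional[int]:
    """Sum file sizes, logging progress along the way.

    Returns None if more than `max_files` paths are seen, so the caller
    can fall back to an unknown size instead of blocking submission.
    """
    total = 0
    for count, path in enumerate(paths, start=1):
        if count > max_files:
            _LOG.info(
                "More than %d files matched; skipping size estimation.",
                max_files,
            )
            return None
        total += size_of(path)
        if count % log_every == 0:
            _LOG.info(
                "Estimated size of %d files so far (%d bytes).", count, total
            )
    return total
```

With a local filesystem, `size_of` could be `os.path.getsize`; for remote storage it would be whatever lookup the file system layer provides, which is left open here.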


