Posted to dev@beam.apache.org by Pulasthi Supun Wickramasinghe <pu...@gmail.com> on 2020/02/07 17:42:05 UTC

Re: Executing the runner validation tests for the Twister2 runner

Hi Kenn,

Thanks for the information. I will add the information accordingly and update
the community.

Best Regards,
Pulasthi

On Wed, Jan 29, 2020 at 8:28 AM Kenneth Knowles <ke...@apache.org> wrote:

> In my opinion it is fine to add the documentation after the runner is
> added. I do think we should have input from more members of the community
> about accepting the donation. Since there is time, here are the places where
> you should add information about the runner:
>
> https://beam.apache.org/documentation/runners/capability-matrix/ source
> data at
> https://github.com/apache/beam/blob/master/website/src/_data/capability-matrix.yml
>
> https://beam.apache.org/documentation/runners/twister2 (new page - see
> the pages for other runners)
>
> https://beam.apache.org/get-started/quickstart-java/ has some snippets
> with toggles per runner
>
> https://beam.apache.org/roadmap/ has the roadmaps for different runners.
> For a new runner especially this could be helpful for users.
>
> Kenn
>
> On Sun, Jan 12, 2020 at 9:36 AM Pulasthi Supun Wickramasinghe <
> pulasthi911@gmail.com> wrote:
>
>> Hi Kenn,
>>
>> Is there any documentation that needs to accompany the new runner in the
>> pull request, or is the documentation added after the pull request is
>> approved? It would be great if you could point me in the right direction
>> regarding this.
>>
>> Best Regards,
>> Pulasthi
>>
>> On Mon, Jan 6, 2020 at 9:56 PM Pulasthi Supun Wickramasinghe <
>> pulasthi911@gmail.com> wrote:
>>
>>> Hi Kenn,
>>>
>>>
>>>
>>> On Mon, Jan 6, 2020 at 9:09 PM Kenneth Knowles <ke...@apache.org> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 6, 2020 at 8:30 AM Pulasthi Supun Wickramasinghe <
>>>> pulasthi911@gmail.com> wrote:
>>>>
>>>>> Hi Kenn,
>>>>>
>>>>> I was able to solve the problem mentioned above. I am currently
>>>>> running the "ValidatesRunner" tests; around 4-5 of them are failing, and I
>>>>> should be able to fix them in a couple of days. I wanted to check on the
>>>>> next steps I would need to take after all the "ValidatesRunner" tests are
>>>>> passing. I assume that the runner does not need to pass all the
>>>>> "NeedsRunner" tests.
>>>>>
>>>>
>>>> That's correct. You don't need to run the NeedsRunner tests. Those are
>>>> tests of the core SDK's functionality, not the runner. The annotation is a
>>>> workaround for the false cycle in deps "SDK tests" -> "direct runner" ->
>>>> "SDK", which to Maven looks like a cyclic dependency. You should run the
>>>> ValidatesRunner tests. It is fine to also disable some of them, either by
>>>> excluding categories or test classes, or by adding new categories to exclude.
>>>>
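(For reference, a ValidatesRunner test in the Beam Java SDK is an ordinary JUnit
test carrying the ValidatesRunner category marker, which is what the include and
exclude filtering acts on. The sketch below uses only public Beam and JUnit 4
APIs; the class and test names are made up.)

  import org.apache.beam.sdk.testing.PAssert;
  import org.apache.beam.sdk.testing.TestPipeline;
  import org.apache.beam.sdk.testing.ValidatesRunner;
  import org.apache.beam.sdk.transforms.Create;
  import org.apache.beam.sdk.values.PCollection;
  import org.junit.Rule;
  import org.junit.Test;
  import org.junit.experimental.categories.Category;

  public class ExampleValidatesRunnerTest {
    // TestPipeline picks up the runner under test from the test pipeline options.
    @Rule public final transient TestPipeline p = TestPipeline.create();

    @Test
    @Category(ValidatesRunner.class)  // the category the runner's suite filters on
    public void testCreate() {
      PCollection<Integer> output = p.apply(Create.of(1, 2, 3));
      // PAssert becomes assertions that execute on the workers; a mismatch throws,
      // and that exception has to reach the client for the test to fail.
      PAssert.that(output).containsInAnyOrder(3, 2, 1);
      p.run().waitUntilFinish();
    }
  }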
>>>>
>>>>> The runner is only implemented for the batch mode at the moment
>>>>> because the higher-level APIs for streaming on Twister2 are still being
>>>>> finalized. Once that work is done, we will add streaming support for the
>>>>> runner as well.
>>>>>
>>>>
>>>> Nice! Batch-only is perfectly fine for a runner. You should be able to
>>>> detect and reject pipelines that the runner cannot execute.
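(A minimal sketch of such a rejection check, not the actual Twister2 runner
code; it uses the Beam Java SDK's pipeline visitor API to fail fast when a
pipeline contains an unbounded PCollection.)

  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.runners.TransformHierarchy;
  import org.apache.beam.sdk.values.PCollection;
  import org.apache.beam.sdk.values.PValue;

  /** Throws if the pipeline contains any unbounded PCollection. */
  class BatchOnlyChecker extends Pipeline.PipelineVisitor.Defaults {
    @Override
    public void visitValue(PValue value, TransformHierarchy.Node producer) {
      if (value instanceof PCollection
          && ((PCollection<?>) value).isBounded() == PCollection.IsBounded.UNBOUNDED) {
        throw new UnsupportedOperationException(
            "This runner only supports bounded (batch) pipelines; found unbounded "
                + value.getName());
      }
    }
  }

  // In the runner's run(Pipeline) method, before translation:
  //   pipeline.traverseTopologically(new BatchOnlyChecker());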
>>>>
>>>
>>> I will make sure that the capability is there.
>>>
>>>
>>>> I re-read the old thread, but I may have missed the answer to a
>>>> fundamental question. Just to get it clear on the mailing list: are you
>>>> intending to submit the runner's code to Apache Beam and ask the community
>>>> to maintain it?
>>>>
>>> To answer the original question: the issue was with forwarding
>>> exceptions that happen during execution, since Twister2 has a
>>> distributed execution model. I added the ability on the Twister2 side
>>> so that the Twister2 job-submitting client will receive a job state object
>>> that contains any exceptions thrown during runtime once the job is
>>> completed.
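(A sketch of how that could surface on the Beam side; the job-state interface
and method names below are placeholders for whatever Twister2 actually returns,
not its real API. The idea is that the runner's PipelineResult rethrows a
recorded worker exception so assertion failures fail the test at the client.)

  import java.util.List;
  import org.apache.beam.sdk.Pipeline;

  // Placeholder for whatever the Twister2 submitting client gets back; not the real API.
  interface JobStateWithExceptions {
    boolean isSuccessful();
    List<Throwable> getRuntimeExceptions();
  }

  final class FailureSurfacing {
    private FailureSurfacing() {}

    // Called from the runner's PipelineResult.waitUntilFinish() once the job completes.
    static void failIfJobFailed(JobStateWithExceptions state) {
      if (!state.isSuccessful() && !state.getRuntimeExceptions().isEmpty()) {
        // Rethrow so that exceptions recorded on the workers fail the JUnit test.
        throw new Pipeline.PipelineExecutionException(state.getRuntimeExceptions().get(0));
      }
    }
  }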
>>>
>>> And about maintaining the Twister2 runner: we would like to submit the
>>> runner to the Beam codebase, and the Twister2 team will maintain and update
>>> it continuously; in that case, we would become part of the Beam community, I
>>> suppose. Any contributions from other members of the community are more
>>> than welcome. I hope that answers your question.
>>>
>>> Best Regards,
>>> Pulasthi
>>>
>>>
>>>> Kenn
>>>>
>>>>
>>>> Best Regards,
>>>>> Pulasthi
>>>>>
>>>>> On Thu, Dec 12, 2019 at 11:27 AM Pulasthi Supun Wickramasinghe <
>>>>> pulasthi911@gmail.com> wrote:
>>>>>
>>>>>> Hi Kenn
>>>>>>
>>>>>> We are still working on aspects like automated job monitoring, so we
>>>>>> currently do not have those capabilities built in. I discussed with the
>>>>>> Twister2 team a way we can forward failure information from the workers
>>>>>> to the Jobmaster, which would be a solution to this problem. It might take a
>>>>>> little time to develop and test. I will update you after looking into that
>>>>>> solution in a little more detail.
>>>>>>
>>>>>> Best Regards,
>>>>>> Pulasthi
>>>>>>
>>>>>> On Wed, Dec 11, 2019 at 10:51 PM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I dug into Twister2 a little bit to understand the question better,
>>>>>>> checking how the various resource managers / launchers are plumbed.
>>>>>>>
>>>>>>> How would a user set up automated monitoring for a job? If that is
>>>>>>> scraping the logs, then it seems unfortunate for users, but I think the
>>>>>>> Beam runner would naturally use whatever a user might use.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Wed, Dec 11, 2019 at 10:45 AM Pulasthi Supun Wickramasinghe <
>>>>>>> pulasthi911@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Devs,
>>>>>>>>
>>>>>>>> I have been making some progress on the Twister2 runner for
>>>>>>>> Beam that I mentioned before on the mailing list. The runner is able to run
>>>>>>>> the WordCount example and produce correct results, so I am currently trying
>>>>>>>> to run the runner validation tests.
>>>>>>>>
>>>>>>>> From what I understood looking at a couple of examples, tests
>>>>>>>> are validated based on the exceptions that are thrown (or not) during test
>>>>>>>> runtime. However, in Twister2 the job submission client currently does not
>>>>>>>> get failure information such as exceptions back once the job is submitted.
>>>>>>>> These are, however, recorded in the worker log files.
>>>>>>>>
>>>>>>>> So, in order to validate the tests for Twister2, I would have to
>>>>>>>> parse the worker log files and check which exceptions are in the logs. Would
>>>>>>>> that be an acceptable solution for the validation tests?
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Pulasthi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pulasthi S. Wickramasinghe
>>>>>>>> PhD Candidate  | Research Assistant
>>>>>>>> School of Informatics and Computing | Digital Science Center
>>>>>>>> Indiana University, Bloomington
>>>>>>>> cell: 224-386-9035
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pulasthi S. Wickramasinghe
>>>>>> PhD Candidate  | Research Assistant
>>>>>> School of Informatics and Computing | Digital Science Center
>>>>>> Indiana University, Bloomington
>>>>>> cell: 224-386-9035
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pulasthi S. Wickramasinghe
>>>>> PhD Candidate  | Research Assistant
>>>>> School of Informatics and Computing | Digital Science Center
>>>>> Indiana University, Bloomington
>>>>> cell: 224-386-9035
>>>>>
>>>>
>>>
>>> --
>>> Pulasthi S. Wickramasinghe
>>> PhD Candidate  | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> cell: 224-386-9035
>>>
>>
>>
>> --
>> Pulasthi S. Wickramasinghe
>> PhD Candidate  | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035
>>
>

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035