Posted to dev@beam.apache.org by Ilya Kozyrev <il...@akvelon.com> on 2020/11/03 17:36:45 UTC

Re: [Proposal] Add a new Beam example to ingest data from Kafka to Pub/Sub

Hi Beam Community,

Could someone please take a look at PR<https://github.com/apache/beam/pull/13112> related to this issue<https://issues.apache.org/jira/browse/BEAM-11065>?
We look forward to your review and approval of our PR.

Your feedback will help us a lot.

Thank you,
Ilya
On 26 Oct 2020, 17:25 +0300, Ilya Kozyrev Akvelon <il...@akvelon.com>, wrote:
Hi everyone,

We completed the development of the template.
Could someone kindly help with reviewing the PR<https://github.com/apache/beam/pull/13112>?

Thank you,
Ilya
On 16 Oct 2020, 17:25 +0300, Ilya Kozyrev Akvelon <il...@akvelon.com>, wrote:
Thanks a lot for your comments!

I see your point regarding the location of such templates.
I moved the templates folder under the examples folder. The path is now /examples/templates/java/kafka-to-pubsub.

Does it look better?

Thank you,
Ilya
On 15 Oct 2020, 20:30 +0300, Kyle Weaver <kc...@google.com>, wrote:
I agree with Kenn. Since any pipeline can be made into a template, it doesn't really make sense to have a separate "templates" directory. Based on a quick skim of your PR, the only thing that's specific to Dataflow templates is the instructions.

Gradle does not make subprojects inherit dependencies from their parents. So you could move your subproject under the `examples/java` directory and it should still build exactly the same.

On Thu, Oct 15, 2020 at 7:34 AM Ilya Kozyrev <il...@akvelon.com> wrote:
Hi!

Thanks a lot for joining the discussion!

By creating a new templates folder in Beam, we are trying to make templates generic. As we see it, another important reason for adding a new folder is the Gradle build: we can configure a Gradle build specifically for templates, without needing to build all the examples and without pulling in unused dependencies.

Creating subfolders for different runners makes sense; however, the proposed template could run on different runners, and GCP here is an option rather than a requirement.

I suggest adding per-language subfolders under /templates, e.g. /templates/python, /templates/java, etc. If we need to distinguish templates further, we can add subfolders for specific cases in the future.

Thank you,
Ilya
On 15 Oct 2020, 07:30 +0300, Reza Ardeshir Rokni <ra...@gmail.com>, wrote:
Just a thought, but what if in the future there were templates for other runners?

Then having a templates folder would fit nicely, no? We could even have runner-specific subfolders, and maybe even a shared area for things that could be used by all templates on all runners?

On Thu, 15 Oct 2020 at 11:47, Kenneth Knowles <ke...@apache.org> wrote:
Hi Ilya,

I have added you to the "Contributors" role on Jira so you can be assigned tickets, and given you the ticket you filed since you are already solving it. Thanks!

I have a very high level thought: Since Dataflow's "Flex Templates" feature is just any pipeline, perhaps the main pipeline can be more of an "example" and fit into the `examples/` folder? Then the containerization and Google-specific* JSON could be alongside. In this way, users of other runners could possibly use or learn from it even if they are not interested in GCP. I understand this is not your primary goal, considering the contribution. I just want to open this for discussion.

Kenn

*In fact, the JSON is very generic. It is not really "Google specific" in concept, just in practice.

On Wed, Oct 14, 2020 at 12:14 PM Ilya Kozyrev <il...@akvelon.com> wrote:
Hi Beam Community,

There has been no feedback on the proposal, so I would like to submit a PR for it.

I created a JIRA improvement<https://issues.apache.org/jira/browse/BEAM-11065> to track this proposal and am now submitting a PR<https://github.com/apache/beam/pull/13112> to the Beam repository that implements the proposal I sent earlier. We suggest adding a /templates folder to the repository root to help developers discover templates. This will provide structure for future template development in Beam.

Could someone kindly help with reviewing the PR<https://github.com/apache/beam/pull/13112>?

Thank you,
Ilya

On 7 Oct 2020, at 21:23, Ilya Kozyrev <il...@akvelon.com> wrote:

Hi Beam Community,

I have a proposal to add an Apache Beam example: a template that ingests data from Apache Kafka into Google Cloud Pub/Sub. More detailed information about the proposed template can be found in the README<https://github.com/akvelon/beam/blob/KafkaToPubsubTemplate/templates/README.md> file, and our team has built a prototype<https://github.com/akvelon/beam/pull/1>. I'd like to ask for your feedback before moving forward with finishing the template.

I did not see a folder where developers can easily discover templates. I would like to propose adding a "templates" folder where other Apache Beam templates may be added in the future. For example, beam/templates/java/kafka-to-pubsub could hold the Kafka to Pub/Sub template.

Please share your feedback/comments about this proposal in the thread.

Thank you,
Ilya