Posted to gsoc@community.apache.org by "Miko Aro (Jira)" <ji...@apache.org> on 2024/04/01 16:36:00 UTC

[jira] [Commented] (GSOC-259) [GSOC][Beam] Build out Beam Use Cases

    [ https://issues.apache.org/jira/browse/GSOC-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832893#comment-17832893 ] 

Miko Aro commented on GSOC-259:
-------------------------------

Will the implementation of the integration be based on Java and Python only, or will it also involve other programming languages supported by Apache Beam?

> [GSOC][Beam] Build out Beam Use Cases
> -------------------------------------
>
>                 Key: GSOC-259
>                 URL: https://issues.apache.org/jira/browse/GSOC-259
>             Project: Comdev GSOC
>          Issue Type: New Feature
>            Reporter: Danny McCormick
>            Priority: Major
>              Labels: Beam, gsoc, gsoc2024
>
> Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends. On top of providing lower-level primitives, Beam has also introduced several higher-level transforms used for machine learning and some general data processing use cases. This project focuses on identifying and implementing real-world use cases that use these transforms.
> Objectives:
> 1. Add real-world use cases demonstrating Beam's MLTransform for preprocessing data and generating embeddings.
> 2. Add real-world use cases demonstrating Beam's Enrichment transform for enriching existing data with data from a slowly changing source.
> 3. (Stretch) Implement one or more additional "enrichment handlers" for interacting with currently unsupported sources.
> Useful links:
> Apache Beam repo - [https://github.com/apache/beam]
> MLTransform docs - [https://beam.apache.org/documentation/transforms/python/elementwise/mltransform/]
> Enrichment code - [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/enrichment.py]
> Enrichment docs (should be published soon) - [https://github.com/apache/beam/pull/30187]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
