Posted to commits@beam.apache.org by "Nikolay Sokolov (JIRA)" <ji...@apache.org> on 2018/01/28 05:39:00 UTC

[jira] [Commented] (BEAM-995) Apache Pig DSL

    [ https://issues.apache.org/jira/browse/BEAM-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342456#comment-16342456 ] 

Nikolay Sokolov commented on BEAM-995:
--------------------------------------

[~xumingming] I'm new to Beam, but I'm currently working on a data warehouse project that relied heavily on Pig in the past. We are quite interested in the possibility of running that legacy code on Dataflow via Beam without a major overhaul, so here are a few humble comments on this topic:

> If we do pig-on-beam on beam-side, we will have something like `UDFAdapter` which will adapt all existing UDFs, so we can use them in the new pig-on-beam.

It feels like Pig is not so popular nowadays. On the other hand, there is a huge amount of legacy code across many organizations where full Pig compatibility would be required. Existing code frequently depends on the way Pig discovers additional jars, on specific Loaders/Storers (possibly custom ones), and on the command-line arguments of the pig command itself. For such legacy codebases, pig-on-beam would be more beneficial.

> There is a pipeline optimizer in Beam, and also an optimizer in the underlying engine (Spark, MapReduce)

I'm not particularly sure about the Pig side of things, but Hive provides optimizations such as map joins, sorted bucketed joins, and skewed joins at the logical plan level. Some of these optimizations require knowledge of metadata (for example, in the HCat case). Would the optimizers on the Beam side cover those cases?

> Apache Pig DSL
> --------------
>
>                 Key: BEAM-995
>                 URL: https://issues.apache.org/jira/browse/BEAM-995
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>
> Apache Pig is still popular and the language is not so large.
> Providing a DSL using the Pig language would potentially allow more people to use Beam (at least during a transition period).
