Posted to user@pig.apache.org by Ted Dunning <td...@veoh.com> on 2007/12/06 18:19:50 UTC
Jaql reactions?
Does anybody in the pig developer community have a reaction to Jaql yet?
My impression is that they have done some very interesting work. Things I
like:
A) specific and direct access to map/reduce in a functional programming
syntax.
B) data has a concrete syntactic form that can be displayed and understood
along with other concrete forms that guarantee to keep the same semantics in
terms of tagged data elements. This universal tagging in the data makes a
lot of run-time schema things pretty trivial. It also allows test data to
be written into a script or example program and allows that test data to be
processed to a concrete result without involving the cluster.
C) they keep some of the best parts of pig like group and co-group.
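To make point B concrete, here is a small Python sketch of the idea. It is not Jaql. The record fields (`user`, `clicks`) and the helper `total_clicks_by_user` are hypothetical. The sketch writes self-describing (tagged) test data inline as JSON text and reduces it to a concrete, printable result entirely in local memory, with no cluster involved.

```python
import json
from collections import defaultdict

# Hypothetical inline test data: every value carries its own tag (field name),
# so the records are self-describing and need no external schema.
TEST_DATA = """
[
  {"user": "alice", "clicks": 3},
  {"user": "bob",   "clicks": 5},
  {"user": "alice", "clicks": 4}
]
"""

def total_clicks_by_user(records):
    """Group records by user and sum their clicks, entirely in local memory."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["user"]] += rec["clicks"]
    return dict(totals)

records = json.loads(TEST_DATA)
result = total_clicks_by_user(records)

# The result also has a concrete, printable form.
print(json.dumps(result, sort_keys=True))  # {"alice": 7, "bob": 5}
```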
Things I don't like:
1) Doesn't do map-reduce for all operations yet (presumably coming).
2) Doesn't have a provision for displaying the map-reduce version of the
program.
3) Not open source.
Does anybody else have any thoughts on this?
Re: Jaql reactions?
Posted by Ted Dunning <td...@veoh.com>.
This is a pretty darned important point.
My guess is that users will never be all that interested in the functional
side of any of this, but that a functional underpinning might be very useful
for the core developers and the penumbral developers because it could
provide a useful framework for expressing interior capabilities.
On 12/7/07 5:41 PM, "Benjamin Reed" <br...@yahoo-inc.com> wrote:
> So far
> we haven't run into anyone here at Yahoo who has asked for a functional
> language. The cries are all for Python and Perl bindings.
Re: Jaql reactions?
Posted by Benjamin Reed <br...@yahoo-inc.com>.
Ted, I think you are looking at this from a specific functional programming
perspective. Pig Latin is not a functional programming language. It's not
even Turing Complete. It is a language to specify distributed computations
and is meant to be embedded into other languages. Sounds like you want
emaSchay or perhaps askellHey.
To some people [lambda(Map) return lambda(input) {foreach input generate
flatten(Map(*))}] would be a thing of beauty. To others it is a terrible
flashback from a failed programming course; they would much rather have
something embedded in Bash. Rather than pick one, we want to express what is
necessary to optimize and run a computation and allow those expressions to be
added to a host language. (Grunt is a noise made by a Pig, not a language :)
I think you and Utkarsh are looking at Jaql differently (and potentially
filling in the blanks differently). For example:
When Utkarsh said:
> > If people really want map-reduce as a programming abstraction, where
> > the "group" operation is implicit, it would be easy to add this as a
> > macro in Pig.
and you said
>
> Indeed, but macros do not make a functional language.
Utkarsh was talking about the ability to specify a particular computation, not
asserting that Pig was a functional language.
> Pig's lazy evaluation semantics remind me quite a bit of functional
> programming. Why stop halfway?
I think you are right on here. We stopped where we did because the other half
would be part of embedding Pig into Haskell (or any other language). So far
we haven't run into anyone here at Yahoo who has asked for a functional
language. The cries are all for Python and Perl bindings.
ben
Re: Jaql reactions?
Posted by Doug Cutting <cu...@apache.org>.
Ted Dunning wrote:
> I would rather see the two languages diverge somewhat on this sort of count.
> Better that each community of developers push the virtues of a particular
> idiosyncratic emphasis.
That works if each community has sufficient, independent critical mass.
An alternate approach would be to have Pig and Jaql's developers join
forces, finding common ground, in order to build a larger and more
diverse community that can share the workload, potentially increasing
the longevity and generality of the project. Or something like that.
Doug
Re: Jaql reactions?
Posted by Ted Dunning <td...@veoh.com>.
I would rather see the two languages diverge somewhat on this sort of count.
Better that each community of developers push the virtues of a particular
idiosyncratic emphasis.
Pig has some very interesting potential and I think that the emphasis on
"this is relational algebra" is pretty cool and interesting. Jaql's
functional focus is really cool, but may ultimately be of little use. Or it
may be the linchpin for some really powerful program rewriting facilities.
I can't wait to see.
On 12/7/07 6:43 PM, "Utkarsh Srivastava" <ut...@yahoo-inc.com> wrote:
> As regards Pig, as Ben said, I don't think becoming a full-fledged
> functional programming language is on our roadmap simply because we
> haven't seen uses for it yet (unless of course, our community votes
> otherwise).
Re: Jaql reactions?
Posted by Utkarsh Srivastava <ut...@yahoo-inc.com>.
Hi Ted,
>
> I get the impression that Jaql is tied less to JSON than it appears at
> first. In particular, it looked to me like the on-disk format of data files
> could be more flexible. Certainly adding an abstraction layer for any
> record reader would be trivial. Similarly, there is nothing that says or
> requires that they actually pass around JSON encoded strings internally and
> there are several statements that imply that they actually pass around data
> structures whose only relationship to JSON is of data to a printable form.
>
JSON is a serialization format. As regards the data model that it
tries to capture, I think Pig, Jaql, and various programming
languages use the same: atomic values, and lists and maps. Hence you
are right: JSON can be left out of our discussion.
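A quick way to see the distinction between a data model and its serialization, sketched in Python. Python's dicts, lists and scalars stand in for the shared model of atoms, lists and maps; this is not how Pig or Jaql represent data internally, and the sample record is made up. JSON is just one printable form, and parsing it back gives an equal structure.

```python
import json

# An in-memory value built only from atoms, lists and maps (dicts).
record = {"url": "http://example.com/", "tags": ["video", "music"], "views": 42}

# JSON is one printable form of that value...
text = json.dumps(record, sort_keys=True)

# ...and parsing the text yields an equal in-memory structure, so a system can
# pass structures around internally and use JSON only at its edges.
restored = json.loads(text)
print(restored == record)  # True
```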
>>> A) specific and direct access to map/reduce in a functional programming
>>> syntax.
>>
>> If a language has primitives for per-record processing, grouping, and
>> group-wise aggregation, which both Pig and Jaql do, then direct
>> access to map-reduce is just syntactic sugar on top of these
>> primitives.
>
> Hmmm.... The key-word here is functional. Jaql is a higher-order functional
>
Ah, sorry! I had totally missed that your emphasis was on functional.
Jaql does seem to have a functional flavor since the map function is
specified as the value for a key in the data itself. However, how
close they are to a full functional language is not clear. We will
try to clarify this by communicating with the Jaql developers.
As regards Pig, as Ben said, I don't think becoming a full-fledged
functional programming language is on our roadmap simply because we
haven't seen uses for it yet (unless of course, our community votes
otherwise).
Utkarsh
Re: Jaql reactions?
Posted by Ted Dunning <td...@veoh.com>.
Utkarsh,
Thanks for your comments. I think I must have been a little unclear on some
of my statements. See below for more.
On 12/7/07 12:18 PM, "Utkarsh Srivastava" <ut...@yahoo-inc.com> wrote:
> Jaql is tied to JSON data, whereas Pig is data-format-agnostic.
I get the impression that Jaql is tied less to JSON than it appears at
first. In particular, it looked to me like the on-disk format of data files
could be more flexible. Certainly adding an abstraction layer for any
record reader would be trivial. Similarly, there is nothing that says or
requires that they actually pass around JSON encoded strings internally and
there are several statements that imply that they actually pass around data
structures whose only relationship to JSON is of data to a printable form.
>> A) specific and direct access to map/reduce in a functional programming
>> syntax.
>
> If a language has primitives for per-record processing, grouping, and
> group-wise aggregation, which both Pig and Jaql do, then direct
> access to map-reduce is just syntactic sugar on top of these primitives.
Hmmm.... The key-word here is functional. Jaql is a higher-order functional
language with lambda. And map-reduce is a function that operates on
functions and data together. The only thing I might like better is a
curried version of map-reduce as a function of two functions that returns a
function that processes data (fast).
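For readers less used to the functional framing, here is a minimal single-process Python sketch of the curried form described above: map-reduce as a function of two functions that returns a function over data. The name `mapreduce` and its calling conventions are assumptions for illustration, not Jaql's actual API, and nothing here is distributed.

```python
from collections import defaultdict
from itertools import chain

def mapreduce(map_fn, reduce_fn):
    """Curried map-reduce (hypothetical): takes two functions and returns a
    function that processes data.

    map_fn(record) -> iterable of (key, value) pairs
    reduce_fn(key, values) -> output value for that key
    """
    def run(records):
        groups = defaultdict(list)
        # Map phase: apply map_fn to every record and collect values by key.
        for key, value in chain.from_iterable(map(map_fn, records)):
            groups[key].append(value)
        # Reduce phase: one reduce_fn call per distinct key.
        return {key: reduce_fn(key, values) for key, values in groups.items()}
    return run

# The result is an ordinary function, so it can be stored, passed around,
# or wrapped by other functions before any data is seen.
word_count = mapreduce(
    lambda line: ((word, 1) for word in line.split()),
    lambda word, counts: sum(counts),
)

print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

The point of the curried shape is that `word_count` exists as a value before any input does. That is what lets other code inspect, compose, or rewrite it.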
Pig doesn't do anything like this and the difference appears to me to be
much more than syntactic sugar. Having the functional representation gives
you the guts of programmatic transformations essentially for free. This is
important.
I can't tell if Jaql thinks of data processing expressions as functional
compositions, but if it does, very cool things can become doable.
You are nearly right that in terms of expressive power, Jaql's explicit
map-reduce is only sugar, but this is only true if you limit yourself to
record processing primitives. If it is a full-scale first-class
higher-order function, then it is a different beast altogether.
>
> In Pig, Map-Reduce is written as:
>
> A = foreach input generate flatten(Map(*));
> B = group A by $0;
> C = foreach B generate Reduce(*);
And here is an important difference. The expression [foreach input generate
flatten(Map(*))] CANNOT be expressed in Pig in functional form. There isn't
something equivalent to [lambda(Map) return lambda(input) {foreach input
generate flatten(Map(*))}]. If that were available, then I would be able to
write programs that manipulate program expressions in very interesting ways.
Just as importantly, what you have provided is a recipe for computing, but
not a function. Providing mapreduce as a function is important for
supporting programmatic transformations.
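To illustrate why first-class stages matter for program transformation, here is a hypothetical Python sketch. It is not Jaql's or Pig's machinery. `map_stage` plays the role of lambda(Map) return lambda(input) {...}, and a pipeline is just a list of such stages. `fuse` is one example of a program-to-program rewrite (map fusion) that becomes ordinary code once stages are values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class MapStage:
    """A per-record stage, analogous to [foreach input generate flatten(Map(*))]."""
    fn: Callable[[object], Iterable[object]]  # one record in, zero or more out

def map_stage(fn):
    """Hypothetical analogue of lambda(Map) return lambda(input) {...}:
    wrap a per-record function as a stage value that other code can both
    inspect and execute."""
    return MapStage(fn)

def run(pipeline: List[MapStage], records):
    """Execute the stages in order, flattening each stage's outputs."""
    for stage in pipeline:
        records = [out for rec in records for out in stage.fn(rec)]
    return records

def compose(f, g):
    """Apply f, then feed each of its outputs through g and flatten."""
    return lambda rec: [y for x in f(rec) for y in g(x)]

def fuse(pipeline: List[MapStage]) -> List[MapStage]:
    """A program-to-program rewrite: merge adjacent map stages so the data
    is traversed once instead of once per stage."""
    fused: List[MapStage] = []
    for stage in pipeline:
        if fused:
            fused[-1] = MapStage(compose(fused[-1].fn, stage.fn))
        else:
            fused.append(stage)
    return fused

pipeline = [
    map_stage(lambda line: line.split()),    # one output record per word
    map_stage(lambda word: [word.lower()]),  # normalise case
]
data = ["The cat", "the Dog"]
fused = fuse(pipeline)

print(len(pipeline), len(fused))  # 2 1
print(run(fused, data))           # ['the', 'cat', 'the', 'dog']
```

A three-statement recipe like the Pig example above runs fine, but it gives no value like `pipeline` that a rewrite such as `fuse` could be applied to.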
> If people really want map-reduce as a programming abstraction, where
> the "group" operation is implicit, it would be easy to add this as a
> macro in Pig.
Indeed, but macros do not make a functional language.
Pig's lazy evaluation semantics remind me quite a bit of functional
programming. Why stop halfway?
Re: Jaql reactions?
Posted by Utkarsh Srivastava <ut...@yahoo-inc.com>.
Jaql is very much in the same spirit as Pig, and in fact the language
is quite similar. (They've chosen to sprinkle in some SQL-style
declarative clauses, such as WHERE clauses attached to many of the
operators, whereas in Pig we've explicitly avoided having operators
do multiple different kinds of things.) In Pig, you would express a WHERE clause
as an explicit FILTER statement.
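The difference in operator design can be sketched in Python. All function names here are hypothetical and none of this is real Jaql or Pig syntax. The sketch compares one operator that filters and projects in a single call with two single-purpose operators composed explicitly. For this query, both give the same result.

```python
# Hypothetical Jaql-style operator: one call both filters and projects.
def foreach_where(records, where, generate):
    return [generate(r) for r in records if where(r)]

# Pig-style: single-purpose operators, composed explicitly.
def filter_(records, predicate):  # in the spirit of FILTER ... BY
    return [r for r in records if predicate(r)]

def foreach(records, generate):   # in the spirit of FOREACH ... GENERATE
    return [generate(r) for r in records]

rows = [{"user": "alice", "age": 30}, {"user": "bob", "age": 17}]
adult = lambda r: r["age"] >= 18
name = lambda r: r["user"]

print(foreach_where(rows, adult, name))     # ['alice']
print(foreach(filter_(rows, adult), name))  # ['alice']
```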
Jaql is tied to JSON data, whereas Pig is data-format-agnostic. Pig
can operate over JSON data as a special case. To demonstrate this, I
put together a JSON StorageFunction for Pig, and examples of how it
can be used (both attached). With this function, Pig can operate
over JSON data in much the same way that Jaql does. (It requires the
latest version of Pig; so if you want to try it please refresh from
SVN first.)
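The attached StorageFunction is not reproduced in this archive, and the sketch below does not attempt to recreate it or Pig's load/store API. It is a Python illustration of what "data-format-agnostic" means. The query sees only records, and the record reader is pluggable: JSON lines and tab-separated text are two interchangeable readers. All names here are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]

def json_lines_reader(lines: Iterable[str]) -> Iterator[Record]:
    """Parse one JSON object per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def make_tsv_reader(field_names):
    """Return a reader for tab-separated lines with a fixed list of fields."""
    def reader(lines: Iterable[str]) -> Iterator[Record]:
        for line in lines:
            if line.strip():
                yield dict(zip(field_names, line.rstrip("\n").split("\t")))
    return reader

def count_by(reader: Callable[[Iterable[str]], Iterator[Record]], lines, field):
    """Format-agnostic query: it only ever sees records, never the file format."""
    counts: Dict[object, int] = {}
    for rec in reader(lines):
        counts[rec[field]] = counts.get(rec[field], 0) + 1
    return counts

json_input = ['{"user": "alice"}', '{"user": "bob"}', '{"user": "alice"}']
tsv_input = ["alice\t1", "bob\t2", "alice\t3"]

print(count_by(json_lines_reader, json_input, "user"))              # {'alice': 2, 'bob': 1}
print(count_by(make_tsv_reader(["user", "n"]), tsv_input, "user"))  # {'alice': 2, 'bob': 1}
```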
Some other observations:
>A) specific and direct access to map/reduce in a functional programming
>syntax.
If a language has primitives for per-record processing, grouping, and
group-wise aggregation, which both Pig and Jaql do, then direct
access to map-reduce is just syntactic sugar on top of these primitives.
In Pig, Map-Reduce is written as:
A = foreach input generate flatten(Map(*));
B = group A by $0;
C = foreach B generate Reduce(*);
Where "Map" and "Reduce" are user-supplied Pig functions.
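To show that the three Pig statements above really are per-record processing, grouping, and group-wise aggregation, here is an in-memory Python analogue, step by step. It is only a sketch: `map_fn` and `reduce_fn` are hypothetical stand-ins for the user-supplied Map and Reduce, and this is not how Pig executes a plan.

```python
from collections import defaultdict

def map_fn(record):
    """Hypothetical user 'Map': emit (word, 1) for each word in a line.
    Returns a list of tuples, i.e. a bag that flatten() will unnest."""
    return [(word, 1) for word in record.split()]

def reduce_fn(group):
    """Hypothetical user 'Reduce': receives (key, bag of tuples with that key)."""
    key, bag = group
    return (key, sum(count for _, count in bag))

def pig_style_mapreduce(input_records):
    # A = foreach input generate flatten(Map(*));
    a = [t for rec in input_records for t in map_fn(rec)]
    # B = group A by $0;   (each group keeps the full tuples, as in Pig)
    groups = defaultdict(list)
    for t in a:
        groups[t[0]].append(t)
    b = list(groups.items())
    # C = foreach B generate Reduce(*);
    return [reduce_fn(g) for g in b]

print(pig_style_mapreduce(["a b a", "b c"]))  # [('a', 2), ('b', 2), ('c', 1)]
```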
If people really want map-reduce as a programming abstraction, where
the "group" operation is implicit, it would be easy to add this as a
macro in Pig.
>B) data has a concrete syntactic form that can be displayed and understood
>along with other concrete forms that guarantee to keep the same semantics in
>terms of tagged data elements. This universal tagging in the data makes a
>lot of run-time schema things pretty trivial. It also allows test data to
>be written into a script or example program and allows that test data to be
>processed to a concrete result without involving the cluster.
Pig's "maps" give very similar functionality:
(1) the schema can vary from record to record (i.e., each record can
have a different set of fields)
(2) operations can reference the schema of a record at run-time, just
like in Jaql.
In fact, "map" structures are the bread-and-butter of JSON.
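Both properties of maps can be shown in a Python sketch. Here dicts stand in for Pig maps or JSON objects, this is not Pig syntax, and the records are made up. Each record carries its own set of fields, and a query inspects those fields at run time instead of relying on a fixed schema.

```python
records = [
    {"user": "alice", "age": 30},
    {"user": "bob", "country": "NZ"},              # a different set of fields
    {"user": "carol", "age": 25, "country": "US"},
]

# (1) The "schema" (set of keys) varies from record to record.
schemas = [sorted(r.keys()) for r in records]

# (2) A query can consult each record's schema at run time,
#     here by skipping records that lack an "age" field.
ages = [r["age"] for r in records if "age" in r]

print(schemas)  # [['age', 'user'], ['country', 'user'], ['age', 'country', 'user']]
print(ages)     # [30, 25]
```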
Utkarsh