Posted to user@pig.apache.org by Ted Dunning <td...@veoh.com> on 2007/12/06 18:19:50 UTC

Jaql reactions?

Does anybody in the pig developer community have a reaction to Jaql yet?

My impression is that they have done some very interesting work.  Things I
like:

A) specific and direct access to map/reduce in a functional programming
syntax.  

B) data has a concrete syntactic form that can be displayed and understood
along with other concrete forms that guarantee to keep the same semantics in
terms of tagged data elements.  This universal tagging in the data makes a
lot of run-time schema things pretty trivial.  It also allows test data to
be written into a script or example program and allows that test data to be
processed to a concrete result without involving the cluster.

C) they keep some of the best parts of pig like group and co-group.


Things I don't like:

1) Doesn't do map-reduce for all operations yet (presumably coming).

2) Doesn't have a provision for displaying the map-reduce version of the
program.

3) Not open source.


Does anybody else have any thoughts on this?

  


Re: Jaql reactions?

Posted by Ted Dunning <td...@veoh.com>.
This is a pretty darned important point.

My guess is that users will never be all that interested in the functional
side of any of this, but that a functional underpinning might be very useful
for the core developers and the penumbral developers because it could
provide a useful framework for expressing interior capabilities.


On 12/7/07 5:41 PM, "Benjamin Reed" <br...@yahoo-inc.com> wrote:

> So far 
> we haven't run into anyone here at Yahoo who has asked for a functional
> language. The cries are all for Python and Perl bindings.


Re: Jaql reactions?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
Ted, I think you are looking at this from a specific functional programming 
perspective. Pig Latin is not a functional programming language. It's not 
even Turing Complete. It is a language to specify distributed computations 
and is meant to be embedded into other languages. Sounds like you want 
emaSchay or perhaps askellHey.

To some people [lambda(Map) return lambda(input) {foreach input generate 
flatten(Map(*))}] would be a thing of beauty. To others it is a terrible 
flashback from a failed programming course; they would much rather have 
something embedded in Bash. Rather than pick one, we want to express what is 
necessary to optimize and run a computation and allow those expressions to be 
added to a host language. (Grunt is a noise made by a Pig, not a language :)

I think you and Utkarsh are looking at Jaql differently (and potentially 
filling in the blanks differently). For example: 

When Utkarsh said: 
> > If people really want map-reduce as a programming abstraction, where
> > the "group" operation is implicit, it would be easy to add this as a
> > macro in Pig.

and you said

>
> Indeed, but macros do not make a functional language.

Utkarsh was talking about the ability to specify a particular computation, not 
asserting that Pig was a functional language.

> Pig's lazy evaluation semantics remind me quite a bit of functional
> programming.  Why stop halfway?

I think you are right on here. We stopped where we did because the other half 
would be part of embedding Pig into Haskell (or any other language). So far 
we haven't run into anyone here at Yahoo who has asked for a functional 
language. The cries are all for Python and Perl bindings.

ben



Re: Jaql reactions?

Posted by Doug Cutting <cu...@apache.org>.
Ted Dunning wrote:
> I would rather see the two languages diverge somewhat on this sort of count.
> Better that each community of developers push the virtues of a particular
> idiosyncratic emphasis.

That works if each community has sufficient, independent critical mass. 
  An alternate approach would be to have Pig and Jaql's developers join 
forces, finding common ground, in order to build a larger and more 
diverse community that can share the workload, potentially increasing 
the longevity and generality of the project.  Or something like that.

Doug

Re: Jaql reactions?

Posted by Ted Dunning <td...@veoh.com>.
I would rather see the two languages diverge somewhat on this sort of count.
Better that each community of developers push the virtues of a particular
idiosyncratic emphasis.

Pig has some very interesting potential and I think that the emphasis on
"this is relational algebra" is pretty cool and interesting.  Jaql's
functional focus is really cool, but may ultimately be of little use.  Or it
may be the linchpin for some really powerful program rewriting facilities.

I can't wait to see.


On 12/7/07 6:43 PM, "Utkarsh Srivastava" <ut...@yahoo-inc.com> wrote:

> As regards Pig, as Ben said, I don't think becoming a full-fledged
> functional programming language is on our roadmap simply because we
> haven't seen uses for it yet (unless of course, our community votes
> otherwise).


Re: Jaql reactions?

Posted by Utkarsh Srivastava <ut...@yahoo-inc.com>.
Hi Ted,

> I get the impression that Jaql is tied less to JSON than it appears at
> first.  In particular, it looked to me like the on-disk format of data files
> could be more flexible.  Certainly adding an abstraction layer for any
> record reader would be trivial.  Similarly, there is nothing that says or
> requires that they actually pass around JSON encoded strings internally and
> there are several statements that imply that they actually pass around data
> structures whose only relationship to JSON is of data to a printable form.


JSON is a serialization format. As regards the data model that it  
tries to capture, I think Pig, Jaql, and various programming  
languages use the same: atomic values, and lists and maps. Hence you  
are right: JSON can be left out of our discussion.

>>> A) specific and direct access to map/reduce in a functional programming
>>> syntax.
>>
>> If a language has primitives for per-record processing, grouping, and
>> group-wise aggregation, which both Pig and Jaql do, then direct
>> access to map-reduce is just syntactic sugar on top of these primitives.
>
> Hmmm.... The key-word here is functional.  Jaql is a higher-order functional

Ah, sorry! I had totally missed that your emphasis was on functional.  
Jaql does seem to have a functional flavor since the map function is  
specified as the value for a key in the data itself. However, how  
close they are to a full functional language is not clear. We will  
try to clarify this by communicating with the Jaql developers.

As regards Pig, as Ben said, I don't think becoming a full-fledged  
functional programming language is on our roadmap simply because we  
haven't seen uses for it yet (unless of course, our community votes  
otherwise).

Utkarsh

Re: Jaql reactions?

Posted by Ted Dunning <td...@veoh.com>.
Utkarsh, 

Thanks for your comments.  I think I must have been a little unclear on some
of my statements.  See below for more.


On 12/7/07 12:18 PM, "Utkarsh Srivastava" <ut...@yahoo-inc.com> wrote:

> Jaql is tied to JSON data, whereas Pig is data-format-agnostic.

I get the impression that Jaql is tied less to JSON than it appears at
first.  In particular, it looked to me like the on-disk format of data files
could be more flexible.  Certainly adding an abstraction layer for any
record reader would be trivial.  Similarly, there is nothing that says or
requires that they actually pass around JSON encoded strings internally and
there are several statements that imply that they actually pass around data
structures whose only relationship to JSON is of data to a printable form.

>> A) specific and direct access to map/reduce in a functional programming
>> syntax.
> 
> If a language has primitives for per-record processing, grouping, and
> group-wise aggregation, which both Pig and Jaql do, then direct
> access to map-reduce is just syntactic sugar on top of these primitives.

Hmmm.... The key-word here is functional.  Jaql is a higher-order functional
language with lambda.  And map-reduce is a function that operates on
functions and data together.  The only thing I might like better is a
curried version of map-reduce as a function of two functions that returns a
function that processes data (fast).
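
A minimal sketch of that idea in Python, assuming in-memory records and
hypothetical helper names (map_reduce, tokenize, total); it is neither Jaql
nor Pig syntax, only an illustration of map-reduce as a function of two
functions that returns a function over data:

```python
from collections import defaultdict

def map_reduce(map_fn, reduce_fn):
    """Return a function that runs map, group-by-key, and reduce over records."""
    def job(records):
        groups = defaultdict(list)
        for record in records:
            # map_fn yields zero or more (key, value) pairs per record.
            for key, value in map_fn(record):
                groups[key].append(value)
        # reduce_fn sees each key together with all values collected for it.
        return {key: reduce_fn(key, values) for key, values in groups.items()}
    return job

# Hypothetical user functions for a word count.
def tokenize(line):
    return [(word, 1) for word in line.split()]

def total(key, values):
    return sum(values)

# The job is itself a value that can be stored, passed around, and reused.
word_count = map_reduce(tokenize, total)
counts = word_count(["a b a", "b c"])
```

Because word_count is an ordinary value, it can be handed to other functions
or applied to several inputs without restating the pipeline.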

Pig doesn't do anything like this and the difference appears to me to be
much more than syntactic sugar.  Having the functional representation gives
you the guts of programmatic transformations essentially for free.  This is
important.

I can't tell if Jaql thinks of data processing expressions as functional
compositions, but if it does, very cool things can become doable.

You are nearly right that in terms of expressive power, Jaql's explicit
map-reduce is only sugar, but this is only true if you limit yourself to
record processing primitives.  If it is a full-scale first-class
higher-order function, then it is a different beast altogether.

> 
> In Pig, Map-Reduce is written as:
> 
> A = foreach input generate flatten(Map(*));
> B = group A by $0;
> C = foreach B generate Reduce(*);

And here is an important difference.  The expression [foreach input generate
flatten(Map(*))] CANNOT be expressed in Pig in functional form.  There isn't
something equivalent to [lambda(Map) return lambda(input) {foreach input
generate flatten(Map(*))}].  If that were available, then I would be able to
write programs that manipulate program expressions in very interesting ways.
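
To make the bracketed expression concrete, here is a rough Python sketch,
assuming in-memory lists stand in for Pig relations. The names flat_map_stage
and compose are hypothetical. Note that it only combines opaque functions; a
real program rewriter would also need an inspectable representation of each
expression, which this sketch does not attempt:

```python
def flat_map_stage(map_fn):
    # Roughly: lambda(Map) return lambda(input)
    #          {foreach input generate flatten(Map(*))}
    def stage(records):
        return [out for record in records for out in map_fn(record)]
    return stage

def compose(*stages):
    # Build a new stage that runs the given stages left to right.
    def pipeline(records):
        for stage in stages:
            records = stage(records)
        return records
    return pipeline

# Stages are values, so a program can build other programs out of them.
split_words = flat_map_stage(str.split)
split_chars = flat_map_stage(list)
words_then_chars = compose(split_words, split_chars)
chars = words_then_chars(["ab c"])
```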

Just as importantly, what you have provided is a recipe for computing, but
not a function.  Providing mapreduce as a function is important for
supporting programmatic transformations.

> If people really want map-reduce as a programming abstraction, where
> the "group" operation is implicit, it would be easy to add this as a
> macro in Pig.

Indeed, but macros do not make a functional language.

Pig's lazy evaluation semantics remind me quite a bit of functional
programming.  Why stop halfway?



Re: Jaql reactions?

Posted by Utkarsh Srivastava <ut...@yahoo-inc.com>.
Jaql is very much in the same spirit as Pig, and in fact the language  
is quite similar. (They've chosen to sprinkle in some SQL-style  
declarative clauses, such as WHERE clauses attached to many of the  
operators, whereas in Pig we've explicitly avoided having operators  
do multiple different kinds of things.) You would do a WHERE clause  
in Pig by writing an explicit FILTER statement.

Jaql is tied to JSON data, whereas Pig is data-format-agnostic. Pig  
can operate over JSON data as a special case.  To demonstrate this, I  
put together a JSON StorageFunction for Pig, and examples of how it  
can be used  (both attached). With this function, Pig can operate  
over JSON data in much the same way that Jaql does. (It requires the  
latest version of Pig; so if you want to try it please refresh from  
SVN first.)

Some other observations:

> A) specific and direct access to map/reduce in a functional programming
> syntax.

If a language has primitives for per-record processing, grouping, and  
group-wise aggregation, which both Pig and Jaql do, then direct  
access to map-reduce is just syntactic sugar on top of these primitives.

In Pig, Map-Reduce is written as:

A = foreach input generate flatten(Map(*));
B = group A by $0;
C = foreach B generate Reduce(*);

Where "Map" and "Reduce" are user-supplied Pig functions.
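
The three statements can be mimicked in plain Python to show what each one
contributes. This is only a sketch under simplifying assumptions: lists stand
in for relations, map_fn and reduce_fn are hypothetical stand-ins for the
user-supplied functions, and the grouped bag is reduced to a plain list of
values rather than a bag of tuples:

```python
from collections import defaultdict

def map_fn(record):
    # Hypothetical "Map": emit one (city, reading) pair per reading.
    city, readings = record
    return [(city, reading) for reading in readings]

def reduce_fn(key, values):
    # Hypothetical "Reduce": keep the largest reading seen for the city.
    return (key, max(values))

def run(records):
    # A = foreach input generate flatten(Map(*));
    a = [pair for record in records for pair in map_fn(record)]
    # B = group A by $0;
    b = defaultdict(list)
    for key, value in a:
        b[key].append(value)
    # C = foreach B generate Reduce(*);
    return [reduce_fn(key, values) for key, values in b.items()]

result = run([("oslo", [3, 5]), ("rome", [20]), ("oslo", [7])])
```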

If people really want map-reduce as a programming abstraction, where  
the "group" operation is implicit, it would be easy to add this as a  
macro in Pig.


> B) data has a concrete syntactic form that can be displayed and understood
> along with other concrete forms that guarantee to keep the same semantics in
> terms of tagged data elements.  This universal tagging in the data makes a
> lot of run-time schema things pretty trivial.  It also allows test data to
> be written into a script or example program and allows that test data to be
> processed to a concrete result without involving the cluster.

Pig's "maps" give very similar functionality:
(1) the schema can vary from record to record (i.e., each record can  
have a different set of fields)
(2) operations can reference the schema of a record at run-time, just  
like in Jaql.

In fact, "map" structures are the bread-and-butter of JSON.
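
As an analogy only, Python dictionaries show the two properties listed above.
The records and the helper names (fields_present, project) are made up for
illustration and say nothing about how Pig implements its maps:

```python
# Each record carries its own field names, so the "schema" varies per record.
records = [
    {"user": "alice", "age": 31},
    {"user": "bob", "city": "Oslo"},
]

def fields_present(record):
    # Inspect the schema of a single record at run time.
    return sorted(record.keys())

def project(records, field, default=None):
    # Read one field from every record, tolerating records that lack it.
    return [record.get(field, default) for record in records]

schemas = [fields_present(r) for r in records]
ages = project(records, "age")
```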


Utkarsh