Posted to user@pig.apache.org by Jeremy Hanna <je...@gmail.com> on 2011/04/27 20:57:10 UTC

Pygmalion - a github project for pig + cassandra

Hi all,

A little while back, I started a project called Pygmalion to collect example scripts and UDFs for people using Pig with Cassandra. It currently includes a few handy UDFs:

FromCassandraBag: converts what Cassandra returns, (key:chararray, columns:bag {column:tuple (name, value)}), into something more tabular, (key, value1, value2, value3). You specify the columns you want to project, so it works well for tabular data.
ToCassandraBag: converts (key, value1, value2, value3) into what Cassandra expects when writing, (key:chararray, columns:bag {column:tuple (name, value)}). The column names are taken from the variable names in the Pig script.
Both were contributed by Jacob Perkins, with slight revisions by Jeremy Hanna.
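
To make the two shapes concrete, here is a rough Python sketch of the conversions. It is only an illustration of the data reshaping, not the UDFs' actual Java code: the function names, the explicit list of column names passed to the second function, and the choice to return None for a missing column are all assumptions made for this example.

```python
def from_cassandra_bag(row, wanted):
    """Flatten (key, [(name, value), ...]) into (key, value1, value2, ...).

    `wanted` lists the column names to project, in output order.
    Assumption for this sketch: a column that is absent yields None.
    """
    key, columns = row
    by_name = dict(columns)
    return (key,) + tuple(by_name.get(name) for name in wanted)


def to_cassandra_bag(names, record):
    """Rebuild (key, [(name, value), ...]) from (key, value1, value2, ...).

    In Pig the names come from the script's variable names; here they are
    passed in explicitly because Python tuples carry no field names.
    """
    key, values = record[0], record[1:]
    if len(names) != len(values):
        raise ValueError("need exactly one column name per value")
    return (key, list(zip(names, values)))
```

For example, projecting "name" and "city" out of ("user1", [("name", "Ann"), ("age", 30), ("city", "Oslo")]) gives ("user1", "Ann", "Oslo"), and passing that tuple back through to_cassandra_bag with the same two names rebuilds a two-column bag.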

StringConcat: probably something everyone implements, but unlike CONCAT, which only joins two strings, it joins any number of strings.
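
The difference from a two-argument concat can be sketched in a few lines of Python. This is not the UDF's code; the function name is made up for the example, and returning None when any input is None is an assumption about null handling rather than documented behavior.

```python
def string_concat(*parts):
    """Join any number of values into one string.

    Assumption for this sketch: if any input is None the result is None.
    """
    if any(part is None for part in parts):
        return None
    return "".join(str(part) for part in parts)
```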

GenerateTimeUUID: a UDF that generates a time UUID, with or without a time to base it on.
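
As a rough picture of what "with or without a time" means, the Python sketch below builds a version 1 (time-based) UUID either from the current clock or from a supplied timestamp. It does not reflect how the UDF is written: the function name, the millisecond input, the random clock sequence and the use of the local node id are all assumptions for this example.

```python
import random
import uuid

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(unix_millis=None):
    """Return a version 1 UUID for now, or for the given Unix time in ms."""
    if unix_millis is None:
        return uuid.uuid1()

    # Convert milliseconds to 100-ns intervals since the UUID epoch,
    # then split the 60-bit timestamp into the three UUID time fields.
    timestamp = unix_millis * 10_000 + UUID_EPOCH_OFFSET
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi = (timestamp >> 48) & 0x0FFF

    # Assumption: a random 14-bit clock sequence and this machine's node id.
    clock_seq = random.getrandbits(14)
    fields = (time_low, time_mid, time_hi,
              (clock_seq >> 8) & 0x3F, clock_seq & 0xFF, uuid.getnode())

    # version=1 makes the constructor set the version and variant bits.
    return uuid.UUID(fields=fields, version=1)
```

Two UUIDs built from the same timestamp share their time fields but will usually differ in the clock sequence, so they sort together by time without colliding.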

https://github.com/jeromatron/pygmalion/

It definitely needs more work and examples, but I've been using these UDFs for a while with Cassandra 0.7.5 (previously the 0.7 branch). Now that 0.7.5 is released, I wanted to let people know about it in case they'd like to contribute or just use it.

Re: Pygmalion - a github project for pig + cassandra

Posted by Jeremy Hanna <je...@gmail.com>.
On Apr 27, 2011, at 4:53 PM, Bill Graham wrote:

> Very cool.
> 
> FYI  there's a StringConcat in pig like you describe that you can use like this:
> 
> define concat org.apache.pig.builtin.StringConcat();
> 
> Reference JIRA:
> https://issues.apache.org/jira/browse/PIG-1420

Oh cool - good to know, thanks Bill!



Re: Pygmalion - a github project for pig + cassandra

Posted by Bill Graham <bi...@gmail.com>.
Very cool.

FYI, there's a StringConcat built into Pig like the one you describe; you can use it like this:

define concat org.apache.pig.builtin.StringConcat();

Reference JIRA:
https://issues.apache.org/jira/browse/PIG-1420



Re: Pygmalion - a github project for pig + cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
Nice!




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
