Posted to user@pig.apache.org by Jeremy Hanna <je...@gmail.com> on 2011/04/27 20:57:10 UTC
Pygmalion - a github project for pig + cassandra
Hi all,
A little while back, I started a project called Pygmalion to collect example scripts and UDFs for people using Pig with Cassandra. It currently includes a few handy UDFs:
FromCassandraBag: converts what Cassandra returns, (key:chararray, columns:bag {column:tuple (name, value)}), into something more tabular, (key, value1, value2, value3). You specify which values you want to project, so it works well for tabular data.
ToCassandraBag: converts (key, value1, value2, value3) into what Cassandra expects when writing, (key:chararray, columns:bag {column:tuple (name, value)}). The column names are extracted from the variable names in the Pig script.
Both were contributed by Jacob Perkins, with slight revisions by Jeremy Hanna.
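To make the two shapes concrete, here is a minimal Python sketch of the conversion in both directions. It is not the Pygmalion code (the real UDFs run inside Pig). The function names from_cassandra_bag and to_cassandra_bag, the sample row, and the choice of None for a missing column are all assumptions made for illustration. The column names are also passed in explicitly here, whereas ToCassandraBag reads them from the Pig variable names.

```python
from typing import Any

# Shapes as described in the post:
#   Cassandra row: (key, columns), where columns is a bag of (name, value) tuples
#   tabular row:   (key, value1, value2, ...)
CassandraRow = tuple[str, list[tuple[str, Any]]]


def from_cassandra_bag(row: CassandraRow, wanted: list[str]) -> tuple:
    """Project the named columns out of the bag into a flat tuple."""
    key, columns = row
    by_name = dict(columns)
    # Assumption: a column that is absent from the bag becomes None.
    return (key, *(by_name.get(name) for name in wanted))


def to_cassandra_bag(flat: tuple, names: list[str]) -> CassandraRow:
    """Pair each value with its column name to rebuild the bag."""
    key, *values = flat
    if len(values) != len(names):
        raise ValueError("need exactly one column name per value")
    return (key, list(zip(names, values)))


# Hypothetical sample data, not taken from any real column family.
row = ("user42", [("name", "Ada"), ("age", 36), ("city", "London")])
flat = from_cassandra_bag(row, ["name", "age"])
print(flat)
print(to_cassandra_bag(flat, ["name", "age"]))
```

Only the projected columns survive the round trip, which matches the "good for tabular data" caveat: the flat form suits rows whose column names are known in advance.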
StringConcat: probably something everyone implements. Unlike CONCAT, which only takes two strings, it concatenates any number of strings.
GenerateTimeUUID: a UDF that generates a time UUID, with or without a time to base it on.
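As a rough illustration of what "with or without a time to base it on" can mean, the following Python sketch builds a version 1 (time-based) UUID either from the current time or from a supplied timestamp. It is a sketch under stated assumptions, not the Pygmalion implementation: the function name generate_time_uuid, the random clock sequence, and the use of the local node ID are all choices made here for illustration.

```python
import random
import uuid
from datetime import datetime, timezone
from typing import Optional

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), per the version 1 UUID layout.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def generate_time_uuid(when: Optional[datetime] = None) -> uuid.UUID:
    """Return a version 1 UUID for `when`, or for the current time if omitted.

    `when` must be timezone-aware; a naive datetime raises TypeError.
    """
    if when is None:
        return uuid.uuid1()
    delta = when - UNIX_EPOCH
    # Integer arithmetic avoids float rounding at 100 ns resolution.
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10 + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi = (ticks >> 48) & 0x0FFF
    # Assumption: a random 14-bit clock sequence is acceptable here.
    clock_seq = random.getrandbits(14)
    fields = (time_low, time_mid, time_hi,
              (clock_seq >> 8) & 0x3F, clock_seq & 0xFF, uuid.getnode())
    # version=1 makes the constructor set the version and variant bits.
    return uuid.UUID(fields=fields, version=1)


posted = datetime(2011, 4, 27, 20, 57, 10, tzinfo=timezone.utc)
print(generate_time_uuid(posted))
print(generate_time_uuid())
```

Because the timestamp occupies fixed bit fields of the UUID, it can be read back out of the result, which is what makes time UUIDs useful as time-ordered column names.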
https://github.com/jeromatron/pygmalion/
It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously the 0.7 branch). Now that 0.7.5 is released, I wanted to let people know about it in case they would like to contribute or just use it.
Re: Pygmalion - a github project for pig + cassandra
Posted by Jeremy Hanna <je...@gmail.com>.
On Apr 27, 2011, at 4:53 PM, Bill Graham wrote:
> Very cool.
>
> FYI there's a StringConcat in pig like you describe that you can use like this:
>
> define concat org.apache.pig.builtin.StringConcat();
>
> Reference JIRA:
> https://issues.apache.org/jira/browse/PIG-1420
Oh cool - good to know, thanks Bill!
>
>
> On Wed, Apr 27, 2011 at 12:31 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Nice!
>>
>> On Wed, Apr 27, 2011 at 1:57 PM, Jeremy Hanna
>> <je...@gmail.com> wrote:
>>> Hi all,
>>>
>>> A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like:
>>>
>>> FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (name, value)}) to something more tabular (key, value1, value2, value3). You specify the values you want to project - it's good for tabular data.
>>> ToCassandraBag: a way to convert from (key, value1, value2, value3) to what Cassandra expects when writing - (key:chararray, columns:bag {column:tuple (name, value)}) - the column names are extracted from the variable names in the Pig script.
>>> Both contributed by Jacob Perkins with slight revisions by Jeremy Hanna
>>>
>>> StringConcat: probably something everyone implements but instead of CONCAT that only does two strings, it does any number of strings.
>>>
>>> GenerateTimeUUID: a udf that generates a time uuid with or without a time to base it on.
>>>
>>> https://github.com/jeromatron/pygmalion/
>>>
>>> It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously 0.7-branch). Now that 0.7.5 is released, I'd just like to let people know about it if they would like to contribute or even just use it.
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
Re: Pygmalion - a github project for pig + cassandra
Posted by Bill Graham <bi...@gmail.com>.
Very cool.
FYI there's a StringConcat in pig like you describe that you can use like this:
define concat org.apache.pig.builtin.StringConcat();
Reference JIRA:
https://issues.apache.org/jira/browse/PIG-1420
On Wed, Apr 27, 2011 at 12:31 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Nice!
>
> On Wed, Apr 27, 2011 at 1:57 PM, Jeremy Hanna
> <je...@gmail.com> wrote:
>> Hi all,
>>
>> A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like:
>>
>> FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (name, value)}) to something more tabular (key, value1, value2, value3). You specify the values you want to project - it's good for tabular data.
>> ToCassandraBag: a way to convert from (key, value1, value2, value3) to what Cassandra expects when writing - (key:chararray, columns:bag {column:tuple (name, value)}) - the column names are extracted from the variable names in the Pig script.
>> Both contributed by Jacob Perkins with slight revisions by Jeremy Hanna
>>
>> StringConcat: probably something everyone implements but instead of CONCAT that only does two strings, it does any number of strings.
>>
>> GenerateTimeUUID: a udf that generates a time uuid with or without a time to base it on.
>>
>> https://github.com/jeromatron/pygmalion/
>>
>> It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously 0.7-branch). Now that 0.7.5 is released, I'd just like to let people know about it if they would like to contribute or even just use it.
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
Re: Pygmalion - a github project for pig + cassandra
Posted by Jonathan Ellis <jb...@gmail.com>.
Nice!
On Wed, Apr 27, 2011 at 1:57 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> Hi all,
>
> A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like:
>
> FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (name, value)}) to something more tabular (key, value1, value2, value3). You specify the values you want to project - it's good for tabular data.
> ToCassandraBag: a way to convert from (key, value1, value2, value3) to what Cassandra expects when writing - (key:chararray, columns:bag {column:tuple (name, value)}) - the column names are extracted from the variable names in the Pig script.
> Both contributed by Jacob Perkins with slight revisions by Jeremy Hanna
>
> StringConcat: probably something everyone implements but instead of CONCAT that only does two strings, it does any number of strings.
>
> GenerateTimeUUID: a udf that generates a time uuid with or without a time to base it on.
>
> https://github.com/jeromatron/pygmalion/
>
> It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously 0.7-branch). Now that 0.7.5 is released, I'd just like to let people know about it if they would like to contribute or even just use it.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com