Posted to user@pig.apache.org by Scott Wine <sc...@whitepages.com> on 2010/05/13 00:59:43 UTC

CONCAT multiple fields

Hello,

I am trying to create a full address and full location field in Pig by combining multiple fields.

file = LOAD 'file.txt' USING PigStorage() AS
  (house:chararray,
  predir:chararray,
  street:chararray,
  streettype:chararray,
  postdir:chararray,
  city:chararray,
  state:chararray,
  zip:chararray);

I need an output that is full address and full location:

full_address == house + ' ' + predir + ' ' + street + ' ' + streettype + ' ' + postdir
full_location == city + ' ' + state + ' ' + zip

I can get two fields to merge using CONCAT, but I am not able to add more fields or the spaces in between.

Temp1 = FOREACH file GENERATE CONCAT(house, street);

Any ideas?

Thanks
Scott



Re: CONCAT multiple fields

Posted by Russell Jurney <ru...@gmail.com>.
Yeah, that sounds like a good idea.  I can do that patch.

On Wed, May 12, 2010 at 5:04 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> Can't we just change the built-in CONCAT to accept additional fields?  This
> would be totally backward compatible.  I know it won't help now.
>
> Alan.

Re: CONCAT multiple fields

Posted by Russell Jurney <ru...@gmail.com>.
Scratch that, grep is my friend.

On Fri, May 14, 2010 at 12:20 AM, Russell Jurney <ru...@gmail.com> wrote:

> I wrote the patch, but looking around, I'm not sure where the unit tests
> for this stuff are.  Can someone point me in the right direction?  I added an
> append() method to DataByteArray, as that seemed the cleanest way to do
> this.
>
> Should I make a JIRA then submit the patch?
>
> Russ

Re: CONCAT multiple fields

Posted by Russell Jurney <ru...@gmail.com>.
https://issues.apache.org/jira/browse/PIG-1420

Patch soon.

On Fri, May 14, 2010 at 9:10 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

> On May 14, 2010, at 12:20 AM, Russell Jurney wrote:
>
>> Should I make a JIRA then submit the patch?
>
> Yes.
>
> Alan.

Re: CONCAT multiple fields

Posted by Alan Gates <ga...@yahoo-inc.com>.
On May 14, 2010, at 12:20 AM, Russell Jurney wrote:

> Should I make a JIRA then submit the patch?

Yes.

Alan.

Re: CONCAT multiple fields

Posted by Russell Jurney <ru...@gmail.com>.
I wrote the patch, but looking around, I'm not sure where the unit tests for
this stuff are.  Can someone point me in the right direction?  I added an
append() method to DataByteArray, as that seemed the cleanest way to do
this.

Should I make a JIRA then submit the patch?

Russ
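
For readers following along, here is a rough sketch of what appending to a
byte-array-backed value can look like. The class ByteAppendSketch and its
methods are hypothetical stand-ins written for illustration; this is not the
patch mentioned above, and Pig's DataByteArray may differ in its details.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a byte-array wrapper; not Pig's DataByteArray.
public class ByteAppendSketch {
    private byte[] data;

    public ByteAppendSketch(byte[] initial) {
        // Defensive copy so later changes to the caller's array do not leak in.
        this.data = Arrays.copyOf(initial, initial.length);
    }

    // Grows the backing array and copies the new bytes onto the end.
    // Returns this so calls can be chained.
    public ByteAppendSketch append(byte[] more) {
        byte[] combined = Arrays.copyOf(data, data.length + more.length);
        System.arraycopy(more, 0, combined, data.length, more.length);
        data = combined;
        return this;
    }

    // Returns a copy of the current contents.
    public byte[] get() {
        return Arrays.copyOf(data, data.length);
    }

    public static void main(String[] args) {
        ByteAppendSketch b =
            new ByteAppendSketch("123".getBytes(StandardCharsets.UTF_8));
        b.append(" ".getBytes(StandardCharsets.UTF_8))
         .append("Main".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(b.get(), StandardCharsets.UTF_8));
    }
}
```

Chaining append() calls is what would let a multi-argument CONCAT walk its
inputs in a single loop instead of nesting two-argument calls.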

On Wed, May 12, 2010 at 5:04 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> Can't we just change the built-in CONCAT to accept additional fields?  This
> would be totally backward compatible.  I know it won't help now.
>
> Alan.

Re: CONCAT multiple fields

Posted by Alan Gates <ga...@yahoo-inc.com>.
Can't we just change the built-in CONCAT to accept additional fields?   
This would be totally backward compatible.  I know it won't help now.

Alan.

On May 12, 2010, at 4:15 PM, Russell Jurney wrote:

> The CONCAT in the oink project (LinkedIn's UDFs) does concatenation of
> any number of string arguments:
> http://github.com/criccomini/oink/blob/master/src/java/oink/udf/CONCAT.java
>
> We're going to merge this with elephant-bird when we get a chance, and
> this UDF could use a new name like MULTI_CONCAT, but it should work
> for you.


Re: CONCAT multiple fields

Posted by Russell Jurney <ru...@gmail.com>.
The CONCAT in the oink project (LinkedIn's UDFs) does concatenation of
any number of string arguments:
http://github.com/criccomini/oink/blob/master/src/java/oink/udf/CONCAT.java

We're going to merge this with elephant-bird when we get a chance, and
this UDF could use a new name like MULTI_CONCAT, but it should work
for you.

Russell Jurney
russell.jurney@gmail.com
(404) 317-3620
http://twitter.com/rjurney
http://linkedin.com/in/russelljurney


Re: CONCAT multiple fields

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's not the prettiest thing, but:

FOREACH file GENERATE
  CONCAT(CONCAT(CONCAT(house, ' '), CONCAT(predir, ' ')), street)

(and so on)

A better solution would be to write a UDF that wraps StringBuilder,
and simply call

GENERATE StringBuilder(house, ' ', predir, ' ', street ....);

-D
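
The StringBuilder idea above might look roughly like the sketch below. It
shows only the core concatenation logic as plain Java so it can run without
Pig on the classpath; in an actual UDF this logic would sit inside the exec()
method of a class extending EvalFunc<String>, reading its arguments from the
input Tuple. The class name MultiConcat, the method concatAll, and the choice
to return null when any argument is null are assumptions made for this
example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: the name MultiConcat and this null handling are
// assumptions, not code from the thread or from Pig itself.
public class MultiConcat {

    // Appends every argument in order, the way a CONCAT taking any number
    // of fields would. Returns null as soon as a null argument is seen.
    public static String concatAll(List<?> fields) {
        StringBuilder sb = new StringBuilder();
        for (Object field : fields) {
            if (field == null) {
                return null;
            }
            sb.append(field);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mirrors: house + ' ' + predir + ' ' + street
        System.out.println(
            concatAll(Arrays.asList("123", " ", "N", " ", "Main")));
    }
}
```

Passing the separators as ordinary arguments keeps the function generic, at
the cost of repeating ' ' at every call site; a variant that takes the
separator once as its first argument would be a small change.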
