Posted to user@pig.apache.org by Scott Wine <sc...@whitepages.com> on 2010/05/13 00:59:43 UTC
CONCAT multiple fields
Hello,
I am trying to create a full address and full location field in Pig by combining multiple fields.
file = LOAD 'file.txt' USING PigStorage() AS
(house:chararray,
predir:chararray,
street:chararray,
streettype:chararray,
postdir:chararray,
city:chararray,
state:chararray,
zip:chararray);
I need an output that is full address and full location:
full_address == house + ' ' + predir + ' ' + street + ' ' + streettype + ' ' + postdir
full_location == city + ' ' + state + ' ' + zip
I can get two fields to merge using CONCAT, but I am not able to add more fields or the spaces in between.
Temp1 = FOREACH file GENERATE CONCAT(house, street);
Any ideas?
Thanks
Scott
Re: CONCAT multiple fields
Posted by Russell Jurney <ru...@gmail.com>.
Yeah, that sounds like a good idea. I can do that patch.
On Wed, May 12, 2010 at 5:04 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> Can't we just change the built-in CONCAT to accept additional fields? This
> would be totally backward compatible. I know it won't help now.
>
> Alan.
Re: CONCAT multiple fields
Posted by Russell Jurney <ru...@gmail.com>.
Scratch that, grep is my friend.
On Fri, May 14, 2010 at 12:20 AM, Russell Jurney <ru...@gmail.com> wrote:
> I wrote the patch, but looking around, I'm not sure where the unit tests
> for this stuff are. Can someone point me in the right direction? I added an
> append() method to DataByteArray, as that seemed the cleanest way to do
> this.
>
> Should I make a JIRA then submit the patch?
>
> Russ
Re: CONCAT multiple fields
Posted by Russell Jurney <ru...@gmail.com>.
https://issues.apache.org/jira/browse/PIG-1420
Patch soon.
On Fri, May 14, 2010 at 9:10 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
> On May 14, 2010, at 12:20 AM, Russell Jurney wrote:
>
>> Should I make a JIRA then submit the patch?
>
> Yes.
>
> Alan.
Re: CONCAT multiple fields
Posted by Alan Gates <ga...@yahoo-inc.com>.
On May 14, 2010, at 12:20 AM, Russell Jurney wrote:
>
>
> Should I make a JIRA then submit the patch?
>
Yes.
Alan.
Re: CONCAT multiple fields
Posted by Russell Jurney <ru...@gmail.com>.
I wrote the patch, but looking around, I'm not sure where the unit tests for
this stuff are. Can someone point me in the right direction? I added an
append() method to DataByteArray, as that seemed the cleanest way to do
this.
Should I make a JIRA then submit the patch?
Russ
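The patch itself is not shown in this thread, so the following is only a rough sketch of what an append() method on a byte-array wrapper could look like. ByteArrayHolder is a hypothetical stand-in written for illustration; it is not Pig's DataByteArray, and the method signature here is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a byte-array wrapper; NOT Pig's DataByteArray. */
final class ByteArrayHolder {
    private byte[] data;

    ByteArrayHolder(byte[] data) {
        // Copy defensively so later changes to the caller's array do not leak in.
        this.data = data.clone();
    }

    /** Appends the other holder's bytes to this one and returns this for chaining. */
    ByteArrayHolder append(ByteArrayHolder other) {
        byte[] combined = new byte[data.length + other.data.length];
        System.arraycopy(data, 0, combined, 0, data.length);
        System.arraycopy(other.data, 0, combined, data.length, other.data.length);
        data = combined;
        return this;
    }

    /** Returns a copy of the wrapped bytes. */
    byte[] get() {
        return data.clone();
    }

    @Override
    public String toString() {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayHolder house = new ByteArrayHolder("123".getBytes(StandardCharsets.UTF_8));
        ByteArrayHolder space = new ByteArrayHolder(" ".getBytes(StandardCharsets.UTF_8));
        ByteArrayHolder street = new ByteArrayHolder("Main".getBytes(StandardCharsets.UTF_8));
        // Chained appends build up the combined value one argument at a time.
        System.out.println(house.append(space).append(street));
    }
}
```

Returning this from append() lets a variable-argument CONCAT loop over its inputs and accumulate them into the first value, which is one plausible reason such a method would be the cleanest way to extend a two-argument implementation.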
On Wed, May 12, 2010 at 5:04 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> Can't we just change the built-in CONCAT to accept additional fields? This
> would be totally backward compatible. I know it won't help now.
>
> Alan.
Re: CONCAT multiple fields
Posted by Alan Gates <ga...@yahoo-inc.com>.
Can't we just change the built-in CONCAT to accept additional fields?
This would be totally backward compatible. I know it won't help now.
Alan.
On May 12, 2010, at 4:15 PM, Russell Jurney wrote:
> The CONCAT in the oink project (LinkedIn's UDFs) does concatenation of
> any number of string arguments:
> http://github.com/criccomini/oink/blob/master/src/java/oink/udf/CONCAT.java
>
> We're going to merge this with elephant-bird when we get a chance, and
> this UDF could use a new name like MULTI_CONCAT, but it should work
> for you.
Re: CONCAT multiple fields
Posted by Russell Jurney <ru...@gmail.com>.
The CONCAT in the oink project (LinkedIn's UDFs) does concatenation of
any number of string arguments:
http://github.com/criccomini/oink/blob/master/src/java/oink/udf/CONCAT.java
We're going to merge this with elephant-bird when we get a chance, and
this UDF could use a new name like MULTI_CONCAT, but it should work
for you.
Russell Jurney
russell.jurney@gmail.com
(404) 317-3620
http://twitter.com/rjurney
http://linkedin.com/in/russelljurney
Re: CONCAT multiple fields
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's not the prettiest thing, but:
Temp1 = FOREACH file GENERATE CONCAT(CONCAT(CONCAT(house, ' '),
CONCAT(predir, ' ')), street);
(and so on)
A better solution would be to write a UDF that wraps StringBuilder,
and simply call
GENERATE StringBuilder(house, ' ', predir, ' ', street ....);
-D
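The StringBuilder idea above could be sketched roughly as follows. This is a plain Java illustration only: MultiConcat and its concat method are hypothetical names, and in an actual Pig UDF the same loop would presumably live in the exec(Tuple) method of a class extending org.apache.pig.EvalFunc<String>. Returning null when any field is null is an assumption made here, not something stated in the thread.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of Pig: joins any number of fields
 * using a StringBuilder.
 */
final class MultiConcat {

    /** Returns the concatenation of all fields, or null if any field is null. */
    static String concat(List<Object> fields) {
        if (fields == null || fields.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (Object field : fields) {
            if (field == null) {
                // Assumption: a single null input makes the whole result null.
                return null;
            }
            sb.append(field.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values standing in for house, predir, street, streettype, postdir.
        List<Object> address =
            Arrays.<Object>asList("123", " ", "N", " ", "Main", " ", "St", " ", "SW");
        System.out.println(concat(address));
    }
}
```

Compared with nesting two-argument CONCAT calls, a single variable-argument function keeps the Pig script flat and builds the result in one pass.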