Posted to user@pig.apache.org by Tamir Kamara <ta...@gmail.com> on 2009/03/11 14:49:37 UTC

NOT IN ?

Hi,

Is it possible to filter one file by a key not present in the other (similar
to NOT IN, or LEFT JOIN with IS NULL, as can be done in a database)?

Thanks,
Tamir

RE: pigpen from windows? ( whoami error )

Posted by Avram Aelony <Av...@eharmony.com>.
Unfortunately, yes.
With both c:\cygwin and c:\cygwin\bin in my PATH, the error persists: it still can't find 'whoami'.

-Avram


-----Original Message-----
From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com] 
Sent: Wednesday, March 11, 2009 11:31 PM
To: pig-user@hadoop.apache.org
Subject: Re: pigpen from windows? ( whoami error )


I have not tried it, but do you have c:\cygwin and c:\cygwin\bin in your 
PATH ?

- Mridul

Avram Aelony wrote:
> Dear pig users,
> 
> I currently use pig via the grunt shell, but would like to be able to also use the eclipse pigpen plugin.
> 
> I have pigpen installed on a win XP box and see the pig-menus in eclipse.  Pigpen appears to connect via org.apache.hadoop.fs.FileSystem correctly but then complains that "whoami" cannot be run.  
> 
> Since I actually happen to have cygwin installed with a whoami at c:\cygwin\whoami.exe ( /usr/bin/whoami from within the cygwin environment ) is there a way to tell pigpen to use c:\cygwin\whoami.exe?  Is there a better work-around?  Has anyone been able to use the pigpen plugin from windows to submit jobs to a distant hadoop cluster?
> 
> Many thanks for your help,
> Avram
> 
> 
> 
> javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified  
> 
> ...
> 
> Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified
> 	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
> 	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
> 	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:169)
> 	... 14 more
> 
> 


Re: pigpen from windows? ( whoami error )

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
I have not tried it, but do you have c:\cygwin and c:\cygwin\bin in your 
PATH ?

- Mridul

Avram Aelony wrote:
> Dear pig users,
> 
> I currently use pig via the grunt shell, but would like to be able to also use the eclipse pigpen plugin.
> 
> I have pigpen installed on a win XP box and see the pig-menus in eclipse.  Pigpen appears to connect via org.apache.hadoop.fs.FileSystem correctly but then complains that "whoami" cannot be run.  
> 
> Since I actually happen to have cygwin installed with a whoami at c:\cygwin\whoami.exe ( /usr/bin/whoami from within the cygwin environment ) is there a way to tell pigpen to use c:\cygwin\whoami.exe?  Is there a better work-around?  Has anyone been able to use the pigpen plugin from windows to submit jobs to a distant hadoop cluster?
> 
> Many thanks for your help,
> Avram
> 
> 
> 
> javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified  
> 
> ...
> 
> Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified
> 	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
> 	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
> 	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:169)
> 	... 14 more
> 
> 


pigpen from windows? ( whoami error )

Posted by Avram Aelony <Av...@eharmony.com>.
Dear pig users,

I currently use pig via the grunt shell, but would also like to use the eclipse pigpen plugin.

I have pigpen installed on a Windows XP box and see the pig menus in eclipse.  Pigpen appears to connect via org.apache.hadoop.fs.FileSystem correctly but then complains that "whoami" cannot be run.

Since I happen to have cygwin installed, with a whoami at c:\cygwin\whoami.exe (/usr/bin/whoami from within the cygwin environment), is there a way to tell pigpen to use c:\cygwin\whoami.exe?  Is there a better workaround?  Has anyone been able to use the pigpen plugin from Windows to submit jobs to a remote hadoop cluster?

Many thanks for your help,
Avram



javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified  

...

Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified
	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
	at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:169)
	... 14 more



Re: NOT IN ?

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
I read the problem and the script wrong; INNER won't help, true.
You are right, you would need what Alan has proposed.

Regards,
Mridul

Tamir Kamara wrote:
> Thanks,
> 
> I read in the manual that INNER ensures that only bags with at least one
> tuple are returned, but I need to return bags with zero tuples (key not in
> the fltr file). So it seems that INNER won't help me.
> There's an OUTER keyword but with no explanation of what it does. Will that
> work in this case?


Re: NOT IN ?

Posted by Alan Gates <ga...@yahoo-inc.com>.
It doesn't matter.  Underneath, both COUNT and IsEmpty call bag.size() to
find the size of the bag, and that call adds very little overhead because
the bag keeps a running count as elements are inserted.

Alan.
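
For reference, a minimal sketch of the IsEmpty variant being discussed. It reuses the placeholder names 'main_file' and 'filter_file' and the $0 key from the original script in this thread; those are assumptions to adapt, not fixed names.

```pig
main  = load 'main_file';
fltr  = load 'filter_file';
-- group both inputs on the key; a key missing from fltr gets an empty fltr bag
grpd  = cogroup main by $0, fltr by $0;
-- keep only the groups with no matching fltr tuples (the NOT IN case)
fltrd = filter grpd by IsEmpty(fltr);
rslt  = foreach fltrd generate flatten(main);
```

Per the explanation above, this should cost about the same as the COUNT form, so the choice is mostly about readability.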

On Mar 12, 2009, at 9:13 AM, Tamir Kamara wrote:

> Thanks Alan.
>
> In your original reply you did: filter grpd by COUNT(fltr) == 0. I saw that
> there's another way of doing this with: filter grpd by IsEmpty(fltr).
> Would IsEmpty perform better, since I don't really need the exact count?
>
> Tamir
> On Thu, Mar 12, 2009 at 5:47 PM, Alan Gates <ga...@yahoo-inc.com>  
> wrote:
>
>> COGROUP is outer by default, so you need not add it.
>>
>> Alan.


Re: NOT IN ?

Posted by Tamir Kamara <ta...@gmail.com>.
Thanks Alan.

In your original reply you did: filter grpd by COUNT(fltr) == 0. I saw that
there's another way of doing this with: filter grpd by IsEmpty(fltr).
Would IsEmpty perform better, since I don't really need the exact count?

Tamir
On Thu, Mar 12, 2009 at 5:47 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> COGROUP is outer by default, so you need not add it.
>
> Alan.

Re: NOT IN ?

Posted by Alan Gates <ga...@yahoo-inc.com>.
COGROUP is outer by default, so you need not add the OUTER keyword.

Alan.
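
To make the default concrete, here is a small sketch. The relation names main and fltr and the $0 key are carried over from the script elsewhere in this thread; the aliases grpd_all and grpd_in are made up for illustration.

```pig
main = load 'main_file';
fltr = load 'filter_file';
-- default (outer) behaviour: every key from either input yields a group,
-- and the side with no match is an empty bag
grpd_all = cogroup main by $0, fltr by $0;
-- INNER on fltr drops groups whose fltr bag is empty, which keeps only keys
-- that ARE in the filter file (an IN-style filter, the opposite of NOT IN)
grpd_in = cogroup main by $0, fltr by $0 INNER;
```

For the NOT IN case the empty fltr bags are exactly the groups of interest, so the default form is the one to filter on.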

On Mar 12, 2009, at 12:21 AM, Tamir Kamara wrote:

> Thanks,
>
> I read in the manual that INNER ensures that only bags with at least one
> tuple are returned, but I need to return bags with zero tuples (key not in
> the fltr file). So it seems that INNER won't help me.
> There's an OUTER keyword but with no explanation of what it does. Will that
> work in this case?


Re: NOT IN ?

Posted by Tamir Kamara <ta...@gmail.com>.
Thanks,

I read in the manual that INNER ensures that only bags with at least one
tuple are returned, but I need to return bags with zero tuples (key not in
the fltr file). So it seems that INNER won't help me.
There's an OUTER keyword but with no explanation of what it does. Will that
work in this case?

On Wed, Mar 11, 2009 at 5:59 PM, Mridul Muralidharan
<mr...@yahoo-inc.com>wrote:

>
> Use of INNER should remove the need for filter ?
>
> - Mridul

Re: NOT IN ?

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Wouldn't using INNER remove the need for the filter?

- Mridul

Alan Gates wrote:
> I think this will do what you want.  It cogroups the two files, filters 
> out entries where the bag for the filter_file is not empty, and then 
> returns just entries from the main file.
> 
> Alan.
> 
> main = load 'main_file';
> fltr = load 'filter_file';
> -- cogroup on the key; replace $0 with whatever your key is
> grpd = cogroup main by $0, fltr by $0;
> fltrd = filter grpd by COUNT(fltr) == 0;
> rslt = foreach fltrd generate flatten(main);
> 
> 
> On Mar 11, 2009, at 6:49 AM, Tamir Kamara wrote:
> 
>> Hi,
>>
>> Is it possible to filter one file by a key not present in the other 
>> (similar
>> to NOT IN or LEFT JOIN & IS NULL that can be done in DB) ?
>>
>> Thanks,
>> Tamir
> 


Re: NOT IN ?

Posted by Alan Gates <ga...@yahoo-inc.com>.
I think this will do what you want.  It cogroups the two files,  
filters out entries where the bag for the filter_file is not empty,  
and then returns just entries from the main file.

Alan.

main = load 'main_file';
fltr = load 'filter_file';
-- cogroup on the key; replace $0 with whatever your key is
grpd = cogroup main by $0, fltr by $0;
-- keep only groups whose filter_file bag is empty
fltrd = filter grpd by COUNT(fltr) == 0;
rslt = foreach fltrd generate flatten(main);
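
The LEFT JOIN with IS NULL formulation from the question can also be written directly, assuming a Pig release that supports outer joins (that is an assumption; it was not proposed in this thread). As far as I know, outer joins need a declared schema on the side that gets null-filled, so the sketch below invents the field names key and val purely for illustration.

```pig
-- hypothetical schemas: adjust field names and types to your data
main    = load 'main_file'   as (key:chararray, val:chararray);
fltr    = load 'filter_file' as (key:chararray);
-- keep every main row; fltr fields are null where no key matched
joined  = join main by key left outer, fltr by key;
-- rows with no match in fltr are the NOT IN result
nomatch = filter joined by fltr::key is null;
rslt    = foreach nomatch generate main::key, main::val;
```

The cogroup script works without schemas, which may make it the simpler choice when the inputs are loaded positionally.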


On Mar 11, 2009, at 6:49 AM, Tamir Kamara wrote:

> Hi,
>
> Is it possible to filter one file by a key not present in the other  
> (similar
> to NOT IN or LEFT JOIN & IS NULL that can be done in DB) ?
>
> Thanks,
> Tamir