Posted to common-user@hadoop.apache.org by Vikas Jadhav <vi...@gmail.com> on 2013/04/09 07:15:50 UTC

Re: How can I record some position of context in Reduce()?

Hi,

I am also working on joins using MapReduce.
I think that instead of finding the position of the table in
RawKeyValueIterator, we can modify the context.write method to always write
the table name or id as the key. Then we don't need to find the position; we
can get the key and value from "reducerContext".

Before calling reducer.run(reducerContext) in ReduceTask.java, we can add a
join method to the Reducer class (Reducer.java) and call
reducer.join(reduceContext).


I just wonder how you are going to support a NON EQUI join.

I am also having the same problem: how to do the join if the datasets can't
fit into memory.


For now I am cloning using the following code:


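// Hadoop reuses the key and value objects between iterations, so copy them.
KEYIN key = context.getCurrentKey();
KEYIN outKey;
try {
    outKey = (KEYIN) key.getClass().newInstance();
} catch (Exception e) {
    // Fail fast: copying into a null target would only fail later.
    throw new RuntimeException("Cannot instantiate key class", e);
}
ReflectionUtils.copy(context.getConfiguration(), key, outKey);

Iterable<VALUEIN> values = context.getValues();
ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
for (VALUEIN value : values) {
    VALUEIN outValue;
    try {
        outValue = (VALUEIN) value.getClass().newInstance();
    } catch (Exception e) {
        throw new RuntimeException("Cannot instantiate value class", e);
    }
    ReflectionUtils.copy(context.getConfiguration(), value, outValue);
    myValues.add(outValue);  // keep the copy for later use
}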
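
For the case where the datasets do not fit in memory, one option that avoids
caching every value is a block nested-loop join: buffer only a fixed number of
rows from one side, re-read the other side once per buffered block, and repeat.
The sketch below is plain Java with no Hadoop dependency. The names
BlockNestedLoopJoin, join, and blockSize are hypothetical, and the Supplier is
an assumed stand-in for any input that can be re-read, such as a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Supplier;

public class BlockNestedLoopJoin {
    /**
     * Joins two inputs while holding at most blockSize left rows in memory.
     * The right input is re-opened once per left block, so it must be
     * re-readable (here a Supplier of fresh iterators stands in for a file).
     */
    public static <L, R> List<String> join(Iterator<L> left,
                                           Supplier<Iterator<R>> rightSource,
                                           int blockSize,
                                           BiPredicate<L, R> condition) {
        List<String> out = new ArrayList<>();
        List<L> block = new ArrayList<>(blockSize);
        while (left.hasNext()) {
            // Fill the in-memory block with the next chunk of left rows.
            block.clear();
            while (left.hasNext() && block.size() < blockSize) {
                block.add(left.next());
            }
            // Stream the whole right input against the current block.
            Iterator<R> right = rightSource.get();
            while (right.hasNext()) {
                R r = right.next();
                for (L l : block) {
                    if (condition.test(l, r)) {
                        out.add(l + "," + r);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> t1 = Arrays.asList(1, 5, 9);
        List<Integer> t2 = Arrays.asList(4, 6);
        // Join condition t1.attr < t2.attr with only two left rows buffered.
        List<String> rows = BlockNestedLoopJoin.<Integer, Integer>join(
                t1.iterator(), t2::iterator, 2, (a, b) -> a < b);
        System.out.println(rows);
    }
}
```

The trade-off is extra reads of the second input in exchange for bounded
memory; a larger blockSize means fewer passes.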


If you have found any other solution, please feel free to share.

Thank You.







On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:

> In reduce() we have:
>
> key1 values1
> key2 values2
> ...
> keyn valuesn
>
> So what I want to do is join all the values, like this SQL:
>
> select * from values1,values2...valuesn;
>
> If there is not enough memory to cache the values, how can the join be completed?
> My idea is to clone the reduce context, but that may not be easy.
>
> Any help will be appreciated.
>
>
> 2013/3/13 Roth Effy <ef...@gmail.com>
>
>> I want an n:n join as a Cartesian product, but DataJoinReducerBase looks
>> like it only supports equi-joins.
>> I want a non-equi join, but I have no idea how to do it yet.
>>
>>
>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>
>>> Do you want an n:n join or a 1:n join?
>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>
>>>> I want to join the data of two tables in the reducer, so I need to find the
>>>> start of the table.
>>>> Someone said DataJoinReducerBase can help me. Is that right?
>>>>
>>>>
>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>
>>>>> You cannot use RecordReader in a Reducer.
>>>>>
>>>>> What do you mean by getting the record position? I cannot
>>>>> understand; can you give a simple example?
>>>>>
>>>>>
>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com> wrote:
>>>>>
>>>>>> Sorry, I still can't understand how to use RecordReader in
>>>>>> reduce(), because the input is a RawKeyValueIterator in the class
>>>>>> ReduceContext, so I'm confused.
>>>>>> Anyway, thank you.
>>>>>>
>>>>>>
>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>
>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> I want to join the k-v pairs in reduce(), but how do I get the record
>>>>>>>> position?
>>>>>>>> What I have thought of so far is saving the context state, but the
>>>>>>>> Context class doesn't implement a clone or copy-constructor method.
>>>>>>>>
>>>>>>>> Any help will be appreciated.
>>>>>>>> Thank you very much.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>


-- 
Thanks and Regards,
Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Michel Segel <mi...@hotmail.com>.

Theta joins don't do well in any system.

Are both tables large?
If not, it's a map-side join and the reducers will be just ordinary reducers.
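
As a rough illustration of the map-side idea when one table is small: load
the small table into memory once, then stream the large table past it. The
sketch below is plain Java, not Hadoop code. The class BroadcastThetaJoin and
its methods load and joinLessThan are hypothetical names, and keeping the small
table in a sorted TreeMap is an assumption made here so that "large.attr <
small.attr" becomes a range lookup rather than a full scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BroadcastThetaJoin {
    // Small table, indexed by join attribute and kept sorted by that attribute.
    private final TreeMap<Integer, List<String>> small = new TreeMap<>();

    /** Plays the role of a mapper's setup step: load one small-table row. */
    public void load(int attr, String row) {
        small.computeIfAbsent(attr, k -> new ArrayList<>()).add(row);
    }

    /** Plays the role of map(): join one large-table row where attr < small.attr. */
    public List<String> joinLessThan(int attr, String row) {
        List<String> out = new ArrayList<>();
        // tailMap(attr, false) holds every small-table entry with key > attr.
        for (Map.Entry<Integer, List<String>> e
                : small.tailMap(attr, false).entrySet()) {
            for (String other : e.getValue()) {
                out.add(row + "|" + other);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BroadcastThetaJoin j = new BroadcastThetaJoin();
        j.load(10, "s10");
        j.load(20, "s20");
        System.out.println(j.joinLessThan(15, "big15")); // matches s20 only
        System.out.println(j.joinLessThan(25, "big25")); // no match
    }
}
```

Because each large-table row is joined independently, no shuffle of the large
table is needed in this arrangement.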


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 11, 2013, at 12:18 AM, Vikas Jadhav <vi...@gmail.com> wrote:

> I will express it in SQL form:
>  
> select * from table1, table2 where table1.attr < table2.attr
>  
> This is also called a theta join, where theta can be <, >, <=, >=, or !=.
>  
> 
> 
> On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel <mi...@hotmail.com> wrote:
>> Not sure what is meant by a non-equi join.
>> 
>> Are you saying something like for every row in X, join it to all of the rows in Y where Y.a < something?
>> 
>> Is that what you are suggesting?
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <vi...@gmail.com> wrote:
>> 
>>> How are you going to support a NON EQUI join using MapReduce?
>>> As per my understanding, the only way to do this is to bring all the
>>> data to one reducer; then the reducer will know the lesser/greater
>>> values correctly.
>>> Correct me if I am wrong.
>>> Thank You.
>>>  
>>>   Regards,
>>>   Vikas
>>>  
>>> 
>>> 
>>> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com> wrote:
>>>> Can you show an example of your join?
>>>> All joins are an equality in that the key has to match.
>>>> Whether it's a one-to-one, one-to-many, or many-to-many remains to be seen.
>>>> 
>>>> 
>>>> Sent from a remote device. Please excuse any typos...
>>>> 
>>>> Mike Segel
>>>> 
>>>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>>>> 
>>>>> Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive.
>>>>> 
>>>>> 
>>>>> 2013/4/9 Michael Segel <mi...@hotmail.com>
>>>>>> Hi,
>>>>>> 
>>>>>> Your cross join is supported in both Pig and Hive (cross and theta joins).
>>>>>> 
>>>>>> So there must be code to do this. 
>>>>>> 
>>>>>> Essentially in the reducer you would have your key and then the set of rows that match the key. You would then perform the cross product on the key's result set and output them to the collector as separate rows. 
>>>>>> 
>>>>>> I'm not sure why you would need the reduce context. 
>>>>>> 
>>>>>> But then again, I'm still on my first cup of coffee. ;-)
>>>>>> 
>>>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> 
>>> Thanx and Regards
>>>  Vikas Jadhav
> 
> 
> 
> -- 
> 
> 
>   Regards,
>    Vikas


Re: How can I record some position of context in Reduce()?

Posted by Vikas Jadhav <vi...@gmail.com>.

I will express it in SQL form:

select * from table1, table2 where table1.attr < table2.attr

This is also called a theta join, where theta can be <, >, <=, >=, or !=.
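
To make the definition concrete, here is a small plain-Java sketch of a theta
join evaluated as a naive nested loop. The names ThetaJoinDemo, Theta, and
join are hypothetical and chosen only for this illustration, and integer
attributes are an assumption. Every pair of rows is compared, which is why
hash-partitioning on a key (which only brings equal keys together) does not
help with these conditions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThetaJoinDemo {
    /** The comparison operators a theta join may use, on integer attributes. */
    public enum Theta {
        LT, GT, LE, GE, NE;

        public boolean test(int a, int b) {
            switch (this) {
                case LT: return a < b;
                case GT: return a > b;
                case LE: return a <= b;
                case GE: return a >= b;
                default: return a != b; // NE
            }
        }
    }

    /** Naive nested-loop evaluation: compares every (t1, t2) pair. */
    public static List<int[]> join(List<Integer> t1, List<Integer> t2, Theta theta) {
        List<int[]> out = new ArrayList<>();
        for (int a : t1) {
            for (int b : t2) {
                if (theta.test(a, b)) {
                    out.add(new int[] {a, b});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> table1 = Arrays.asList(1, 2, 3);
        List<Integer> table2 = Arrays.asList(2, 3);
        // select * from table1, table2 where table1.attr < table2.attr
        for (int[] pair : join(table1, table2, Theta.LT)) {
            System.out.println(pair[0] + " < " + pair[1]);
        }
    }
}
```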



On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel <mi...@hotmail.com> wrote:

> Not sure what is meant by a non-equi join.
>
> Are you saying something like for every row in X, join it to all of the
> rows in Y where Y.a < something?
>
> Is that what you are suggesting?
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <vi...@gmail.com>
> wrote:
>
> How are you going to support a non-equi join using MapReduce?
> As I understand it, the only way to do this is to bring all the data
> to one reducer, so that the reducer can compare lesser/greater values
> correctly.
> Correct me if I am wrong.
> Thank you.
>
> *  Regards,*
> *  Vikas *
>
>
>
> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com>wrote:
>
>> Can you show an example of your join?
>> All joins are an equality in that the key has to match.
>> Whether it's one-to-one, one-to-many, or many-to-many remains to be
>> seen.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>>
>> Only equality joins, outer joins, and left semi joins are supported in
>> Hive. Hive does not support join conditions that are not equality
>> conditions as it is very difficult to express such conditions as a
>> map/reduce job. Also, more than two tables can be joined in Hive.
>>
>>
>> 2013/4/9 Michael Segel <mi...@hotmail.com>
>>
>>> Hi,
>>>
>>> Your cross join is supported in both pig and hive. (Cross, and Theta
>>> joins)
>>>
>>> So there must be code to do this.
>>>
>>> Essentially in the reducer you would have your key and then the set of
>>> rows that match the key. You would then perform the cross product on the
>>> key's result set and output them to the collector as separate rows.
>>>
>>> I'm not sure why you would need the reduce context.
>>>
>>> But then again, I'm still on my first cup of coffee. ;-)
>>>
>>>
>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>> I am also working on joins using MapReduce.
>>> I think that, instead of finding the position of a table in the
>>> RawKeyValueIterator, we can modify the context.write method to always
>>> write the table name or id as the key. Then we don't need to find the
>>> position; we can get the key and value from "reducerContext".
>>>
>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>> a join method to the Reducer class (Reducer.java) and call
>>> reducer.join(reduceContext).
>>>
>>>
>>> I just wonder how you are going to support a non-equi join.
>>>
>>> I also have the same problem: how to do the join if the datasets can't
>>> fit into memory.
>>>
>>>
>>> For now I am cloning using the following code:
>>>
>>>
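>>> import java.util.ArrayList;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.util.ReflectionUtils;
>>>
>>> // Inside the reducer. Hadoop reuses the key and value objects while
>>> // iterating, so each one is copied before it is kept.
>>> // ReflectionUtils.copy can throw IOException.
>>> Configuration conf = context.getConfiguration();
>>>
>>> KEYIN key = context.getCurrentKey();
>>> KEYIN outKey = (KEYIN) ReflectionUtils.newInstance(key.getClass(), conf);
>>> outKey = ReflectionUtils.copy(conf, key, outKey);
>>>
>>> Iterable<VALUEIN> values = context.getValues();
>>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>> for (VALUEIN value : values) {
>>>     VALUEIN outValue = (VALUEIN) ReflectionUtils.newInstance(value.getClass(), conf);
>>>     outValue = ReflectionUtils.copy(conf, value, outValue);
>>>     myValues.add(outValue);  // store the copy; the original loop discarded it
>>> }
>>>
>>>
>>> If you have found any other solution, please feel free to share.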
>>>
>>> Thank You.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>>
>>>> In reduce() we have:
>>>>
>>>> key1 values1
>>>> key2 values2
>>>> ...
>>>> keyn valuesn
>>>>
>>>> So what I want to do is join all the values, as in this SQL:
>>>>
>>>> select * from values1,values2...valuesn;
>>>>
>>>> If there is not enough memory to cache the values, how can the join
>>>> operation be completed?
>>>> My idea is to clone the ReduceContext, but that may not be easy.
>>>>
>>>> Any help will be appreciated.
>>>>
>>>>
>>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>>
>>>>> I want an n:n join, i.e. a Cartesian product, but DataJoinReducerBase
>>>>> looks like it only supports equi-joins.
>>>>> I want a non-equi join, but I have no idea how to do it yet.
>>>>>
>>>>>
>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>
>>>>>> Do you want an n:n join or a 1:n join?
>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>>
>>>>>>> I want to join the data of two tables in the reducer, so I need to
>>>>>>> find the start of the table.
>>>>>>> Someone said that DataJoinReducerBase can help me. Is that right?
>>>>>>>
>>>>>>>
>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>
>>>>>>>> You cannot use a RecordReader in a Reducer.
>>>>>>>>
>>>>>>>> What do you mean by getting the record position? I don't
>>>>>>>> understand; can you give a simple example?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Sorry, I still don't understand how to use a RecordReader in
>>>>>>>>> reduce(), because the input is a RawKeyValueIterator in the
>>>>>>>>> ReduceContext class, so I'm confused.
>>>>>>>>> Anyway, thank you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>>
>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, everyone,
>>>>>>>>>>> I want to join the k-v pairs in reduce(), but how do I get the
>>>>>>>>>>> record position?
>>>>>>>>>>> What I have thought of so far is saving the context's state, but
>>>>>>>>>>> the Context class doesn't provide a clone method.
>>>>>>>>>>>
>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *
>>> *
>>> *
>>>
>>> Thanx and Regards*
>>> * Vikas Jadhav*
>>>
>>>
>>>
>>
>
>
> --
> *
> *
> *
>
> Thanx and Regards*
> * Vikas Jadhav*
>
>


-- 
*
*
*

  Regards,*
*   Vikas *

Re: How can I record some position of context in Reduce()?

Posted by Michel Segel <mi...@hotmail.com>.

Not sure what is meant by a non equi join.

Are you saying something like for every row in X, join it to all of the rows in Y where Y.a < something?

Is that what you are suggesting?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <vi...@gmail.com> wrote:

> How are you going to support a non-equi join using MapReduce?
> As I understand it, the only way to do this is to bring all the data
> to one reducer, so that the reducer can compare lesser/greater values
> correctly.
> Correct me if I am wrong.
> Thank you.
>  
>   Regards,
>   Vikas
>  
> 
> 
> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com> wrote:
>> Can you show an example of your join?
>> All joins are an equality in that the key has to match.
>> Whether it's one-to-one, one-to-many, or many-to-many remains to be seen.
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>> 
>>> Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive.
>>> 
>>> 
>>> 2013/4/9 Michael Segel <mi...@hotmail.com>
>>>> Hi,
>>>> 
>>>> Your cross join is supported in both pig and hive. (Cross, and Theta joins) 
>>>> 
>>>> So there must be code to do this. 
>>>> 
>>>> Essentially in the reducer you would have your key and then the set of rows that match the key. You would then perform the cross product on the key's result set and output them to the collector as separate rows. 
>>>> 
>>>> I'm not sure why you would need the reduce context. 
>>>> 
>>>> But then again, I'm still on my first cup of coffee. ;-)
>>>> 
>>>> 
>>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> I am also working on joins using MapReduce.
>>>>> I think that, instead of finding the position of a table in the RawKeyValueIterator,
>>>>> we can modify the context.write method to always write the table name or id as the key.
>>>>> Then we don't need to find the position; we can get the key and value from "reducerContext".
>>>>>
>>>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>>>> a join method to the Reducer class (Reducer.java) and call reducer.join(reduceContext).
>>>>>
>>>>>
>>>>> I just wonder how you are going to support a non-equi join.
>>>>>
>>>>> I also have the same problem: how to do the join if the datasets can't fit into memory.
>>>>>
>>>>>
>>>>> For now I am cloning using the following code:
>>>>>  
>>>>>  
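>>>>> import java.util.ArrayList;
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.util.ReflectionUtils;
>>>>>
>>>>> // Inside the reducer. Hadoop reuses the key and value objects while
>>>>> // iterating, so each one is copied before it is kept.
>>>>> // ReflectionUtils.copy can throw IOException.
>>>>> Configuration conf = context.getConfiguration();
>>>>>
>>>>> KEYIN key = context.getCurrentKey();
>>>>> KEYIN outKey = (KEYIN) ReflectionUtils.newInstance(key.getClass(), conf);
>>>>> outKey = ReflectionUtils.copy(conf, key, outKey);
>>>>>
>>>>> Iterable<VALUEIN> values = context.getValues();
>>>>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>>>> for (VALUEIN value : values) {
>>>>>     VALUEIN outValue = (VALUEIN) ReflectionUtils.newInstance(value.getClass(), conf);
>>>>>     outValue = ReflectionUtils.copy(conf, value, outValue);
>>>>>     myValues.add(outValue);  // store the copy; the original loop discarded it
>>>>> }
>>>>>
>>>>>
>>>>> If you have found any other solution, please feel free to share.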
>>>>>  
>>>>> Thank You.
>>>>>  
>>>>>        
>>>>>  
>>>>>  
>>>>> 
>>>>> 
>>>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>> In reduce() we have:
>>>>>> 
>>>>>> key1 values1
>>>>>> key2 values2
>>>>>> ...
>>>>>> keyn valuesn
>>>>>> 
>>>>>> So what I want to do is join all the values, as in this SQL:
>>>>>> 
>>>>>> select * from values1,values2...valuesn;
>>>>>> 
>>>>>> If there is not enough memory to cache the values, how can the join operation be completed?
>>>>>> My idea is to clone the ReduceContext, but that may not be easy.
>>>>>> 
>>>>>> Any help will be appreciated.
>>>>>> 
>>>>>> 
>>>>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>>>>> I want an n:n join, i.e. a Cartesian product, but DataJoinReducerBase looks like it only supports equi-joins.
>>>>>>> I want a non-equi join, but I have no idea how to do it yet.
>>>>>>> 
>>>>>>> 
>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>> Do you want an n:n join or a 1:n join?
>>>>>>>> 
>>>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>>>>> I want to join the data of two tables in the reducer, so I need to find the start of the table.
>>>>>>>>> Someone said that DataJoinReducerBase can help me. Is that right?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>>>> You cannot use a RecordReader in a Reducer.
>>>>>>>>>>
>>>>>>>>>> What do you mean by getting the record position? I don't understand; can you give a simple example?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>>>> Sorry, I still don't understand how to use a RecordReader in reduce(), because the input is a RawKeyValueIterator in the ReduceContext class, so I'm confused.
>>>>>>>>>>> Anyway, thank you.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>>>>>> Hi, everyone,
>>>>>>>>>>>>> I want to join the k-v pairs in reduce(), but how do I get the record position?
>>>>>>>>>>>>> What I have thought of so far is saving the context's state, but the Context class doesn't provide a clone method.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>>>> Thank you very much.
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> 
>>>>> Thanx and Regards
>>>>>  Vikas Jadhav
> 
> 
> 
> -- 
> 
> 
> Thanx and Regards
>  Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Michel Segel <mi...@hotmail.com>.

Not sure what is meant by a non equi join.

Are you saying something like for every row in X, join it to all of the rows in Y where Y.a < something?

Is that what you are suggesting?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 10, 2013, at 9:11 AM, Vikas Jadhav <vi...@gmail.com> wrote:

> How are you going to support a non-equi join using MapReduce?
> As I understand it, the only way to do this is to bring all the data
> to one reducer, so that the reducer can compare lesser/greater values
> correctly.
> Correct me if I am wrong.
> Thank you.
>  
>   Regards,
>   Vikas
>  
> 
> 
> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com> wrote:
>> Can you show an example of your join?
>> All joins are an equality in that the key has to match.
>> Whether it's one-to-one, one-to-many, or many-to-many remains to be seen.
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>> 
>>> Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive.
>>> 
>>> 
>>> 2013/4/9 Michael Segel <mi...@hotmail.com>
>>>> Hi,
>>>> 
>>>> Your cross join is supported in both pig and hive. (Cross, and Theta joins) 
>>>> 
>>>> So there must be code to do this. 
>>>> 
>>>> Essentially in the reducer you would have your key and then the set of rows that match the key. You would then perform the cross product on the key's result set and output them to the collector as separate rows. 
>>>> 
>>>> I'm not sure why you would need the reduce context. 
>>>> 
>>>> But then again, I'm still on my first cup of coffee. ;-)
>>>> 
>>>> 
>>>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com> wrote:
>>>> 
>>>>> Hi
>>>>> I am also working on joins using MapReduce.
>>>>> I think that instead of finding the position of the table in the RawKeyValueIterator,
>>>>> we can modify the context.write method to always write the table name or id as the key;
>>>>> then we don't need to find the position, and we can get the key and value from "reducerContext".
>>>>>  
>>>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>>>> a join method to the Reducer class in Reducer.java and call reducer.join(reduceContext).
>>>>>  
>>>>>  
>>>>> I just wonder how you are going to support a NON EQUI join.
>>>>>  
>>>>> I also have the same problem: how to do the join if the datasets can't fit into memory.
>>>>>  
>>>>>  
>>>>> For now I am cloning using the following code:
>>>>>  
>>>>>  
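>>>>> // Copy the current key, because Hadoop reuses the same key object between records.
>>>>> KEYIN key = context.getCurrentKey();
>>>>> KEYIN outKey = null;
>>>>> try {
>>>>>     outKey = (KEYIN) key.getClass().newInstance();
>>>>> } catch (Exception e) {
>>>>>     throw new RuntimeException("cannot instantiate key class", e);
>>>>> }
>>>>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>>>> 
>>>>> // Copy each value into a list; the value object is also reused by the iterator.
>>>>> Iterable<VALUEIN> values = context.getValues();
>>>>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>>>> for (VALUEIN value : values) {
>>>>>     VALUEIN outValue = null;
>>>>>     try {
>>>>>         outValue = (VALUEIN) value.getClass().newInstance();
>>>>>     } catch (Exception e) {
>>>>>         throw new RuntimeException("cannot instantiate value class", e);
>>>>>     }
>>>>>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>>>>     myValues.add(outValue); // keep the copy so it can be read again later
>>>>> }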
>>>>>  
>>>>>  
>>>>> if you have found any other solution please feel free to share
>>>>>  
>>>>> Thank You.
>>>>>  
>>>>>        
>>>>>  
>>>>>  
>>>>> 
>>>>> 
>>>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>> In reduce() we have:
>>>>>> 
>>>>>> key1 values1
>>>>>> key2 values2
>>>>>> ...
>>>>>> keyn valuesn
>>>>>> 
>>>>>> So what I want to do is join all the values, like this SQL:
>>>>>> 
>>>>>> select * from values1,values2...valuesn;
>>>>>> 
>>>>>> If there is not enough memory to cache the values, how can I complete the join operation?
>>>>>> My idea is to clone the reduce context, but that may not be easy.
>>>>>> 
>>>>>> Any help will be appreciated.
>>>>>> 
>>>>>> 
>>>>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks like only support equal join.
>>>>>>> I want a non-equal join,but I have no idea now.
>>>>>>> 
>>>>>>> 
>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>> you want a n:n join or 1:n join?
>>>>>>>> 
>>>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>>>>> I want to join two table data in reducer.So I need to find the start of the table.
>>>>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>>>>  
>>>>>>>>>> what's the mean of you want get the record position? I cannot understand, can you give a simple example?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>>>> sorry，I still can't understand how to use recordreader in the reduce(),because the input is a RawKeyValueIterator in the class reducecontext.so,I'm confused.
>>>>>>>>>>> anyway,thank you.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>>>>>> Hi,everyone,
>>>>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the record position?
>>>>>>>>>>>>> Now,what I thought is to save the context status,but class Context doesn't implement a clone construct method.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>>>> Thank you very much.
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> 
>>>>> Thanx and Regards
>>>>>  Vikas Jadhav
> 
> 
> 
> -- 
> 
> 
> Thanx and Regards
>  Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Vikas Jadhav <vi...@gmail.com>.

How are you going to support a NON EQUI join using MapReduce?
As far as I understand, the only way to do this is
to bring all the data to one reducer, so that the reducer can compare
lesser/greater values correctly.
Correct me if I am wrong.
Thank You.
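
One possible alternative to a single reducer, sketched here in plain Java rather than with Hadoop classes: if one side (Y below) is small enough to be copied to every reducer, for example through the distributed cache, then X can be split across reducers and each reducer compares only its slice of X against all of Y. The class and method names are hypothetical, and the round-robin split merely stands in for whatever partitioning a real job would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReplicatedJoinSketch {
    // Split xs into `parts` buckets, round-robin; each bucket stands in for one reducer.
    static List<List<Integer>> partition(List<Integer> xs, int parts) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < xs.size(); i++) {
            buckets.get(i % parts).add(xs.get(i));
        }
        return buckets;
    }

    // Work of one simulated reducer: compare its slice of X against all of Y.
    static List<int[]> joinSlice(List<Integer> xSlice, List<Integer> allY) {
        List<int[]> out = new ArrayList<>();
        for (int x : xSlice) {
            for (int y : allY) {
                if (y < x) {
                    out.add(new int[] {x, y});
                }
            }
        }
        return out;
    }

    // Union of the per-reducer outputs.
    static List<int[]> join(List<Integer> xs, List<Integer> ys, int parts) {
        List<int[]> out = new ArrayList<>();
        for (List<Integer> slice : partition(xs, parts)) {
            out.addAll(joinSlice(slice, ys));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(5, 10, 1, 8);
        List<Integer> ys = Arrays.asList(3, 7, 12);
        System.out.println(join(xs, ys, 1).size() + " rows with 1 reducer");
        System.out.println(join(xs, ys, 3).size() + " rows with 3 reducers");
    }
}
```

Every (x, y) combination is still examined exactly once, so the number of simulated reducers does not change the result; the catch is that each of them needs its own full copy of Y.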

Regards,
Vikas



On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com> wrote:

> Can you show an example of your join?
> All joins are an equality in that the key has to match.
> Whether it's a one to one, one to many, or many to many remains to be seen.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>
> Only equality joins, outer joins, and left semi joins are supported in
> Hive. Hive does not support join conditions that are not equality
> conditions as it is very difficult to express such conditions as a
> map/reduce job. Also, more than two tables can be joined in Hive.
>
>
> 2013/4/9 Michael Segel <mi...@hotmail.com>
>
>> Hi,
>>
>> Your cross join is supported in both pig and hive. (Cross, and Theta
>> joins)
>>
>> So there must be code to do this.
>>
>> Essentially in the reducer you would have your key and then the set of
>> rows that match the key. You would then perform the cross product on the
>> key's result set and output them to the collector as separate rows.
>>
>> I'm not sure why you would need the reduce context.
>>
>> But then again, I'm still on my first cup of coffee. ;-)
>>
>>
>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
>> wrote:
>>
>> Hi
>> I am also working on joins using MapReduce.
>> I think that instead of finding the position of the table in the
>> RawKeyValueIterator, we can modify the context.write method to always
>> write the table name or id as the key; then we don't need to find the
>> position, and we can get the key and value from "reducerContext".
>>
>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>> a join method to the Reducer class in Reducer.java and call
>> reducer.join(reduceContext).
>>
>>
>> I just wonder how you are going to support a NON EQUI join.
>>
>> I also have the same problem: how to do the join if the datasets can't
>> fit into memory.
>>
>>
>> For now I am cloning using the following code:
>>
>>
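>> // Copy the current key, because Hadoop reuses the same key object between records.
>> KEYIN key = context.getCurrentKey();
>> KEYIN outKey = null;
>> try {
>>     outKey = (KEYIN) key.getClass().newInstance();
>> } catch (Exception e) {
>>     throw new RuntimeException("cannot instantiate key class", e);
>> }
>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>
>> // Copy each value into a list; the value object is also reused by the iterator.
>> Iterable<VALUEIN> values = context.getValues();
>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>> for (VALUEIN value : values) {
>>     VALUEIN outValue = null;
>>     try {
>>         outValue = (VALUEIN) value.getClass().newInstance();
>>     } catch (Exception e) {
>>         throw new RuntimeException("cannot instantiate value class", e);
>>     }
>>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>     myValues.add(outValue); // keep the copy so it can be read again later
>> }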
>>
>>
>> if you have found any other solution please feel free to share
>>
>> Thank You.
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>
>>> In reduce() we have:
>>>
>>> key1 values1
>>> key2 values2
>>> ...
>>> keyn valuesn
>>>
>>> So what I want to do is join all the values, like this SQL:
>>>
>>> select * from values1,values2...valuesn;
>>>
>>> If there is not enough memory to cache the values, how can I complete
>>> the join operation?
>>> My idea is to clone the reduce context, but that may not be easy.
>>>
>>> Any help will be appreciated.
>>>
>>>
>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>
>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>>> like only support equal join.
>>>> I want a non-equal join,but I have no idea now.
>>>>
>>>>
>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>
>>>>> you want a n:n join or 1:n join?
>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>
>>>>>> I want to join two table data in reducer.So I need to find the start
>>>>>> of the table.
>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>
>>>>>>
>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>
>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>
>>>>>>> what's the mean of you want get the record position? I cannot
>>>>>>> understand, can you give a simple example?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>
>>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>>> reducecontext.so,I'm confused.
>>>>>>>> anyway,thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>
>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hi,everyone,
>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the
>>>>>>>>>> record position?
>>>>>>>>>> Now,what I thought is to save the context status,but class
>>>>>>>>>> Context doesn't implement a clone construct method.
>>>>>>>>>>
>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>> Thank you very much.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> *
>> *
>> *
>>
>> Thanx and Regards*
>> * Vikas Jadhav*
>>
>>
>>
>


-- 
Thanx and Regards
Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Vikas Jadhav <vi...@gmail.com>.

How are you going to support NON EQUI Join using MapReduce ?
As per my understanding there is only one way to do this is
to bring all data to one reducer then reducer will know lesser/greater
values correctly.
Correct me if I am wrong.
Thank You.

*  Regards,*
*  Vikas *



On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com>wrote:

> Can you show an example of your join?
> All joins are an equality in that the key has to match.
> Whether its a one to one , one to many, or many to many remains to be seen.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>
> Only equality joins, outer joins, and left semi joins are supported in
> Hive. Hive does not support join conditions that are not equality
> conditions as it is very difficult to express such conditions as a
> map/reduce job. Also, more than two tables can be joined in Hive.
>
>
> 2013/4/9 Michael Segel <mi...@hotmail.com>
>
>> Hi,
>>
>> Your cross join is supported in both pig and hive. (Cross, and Theta
>> joins)
>>
>> So there must be code to do this.
>>
>> Essentially in the reducer you would have your key and then the set of
>> rows that match the key. You would then perform the cross product on the
>> key's result set and output them to the collector as separate rows.
>>
>> I'm not sure why you would need the reduce context.
>>
>> But then again, I'm still on my first cup of coffee. ;-)
>>
>>
>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
>> wrote:
>>
>> Hi
>> I am also woring on join using MapReduce
>> i think instead of finding postion of table in RawKeyValuIterator.
>> what we can do modify context.write method to alway write key as table
>> name or id
>> then we dont need to find postion we can get Key and Value from
>> "reducerContext"
>>
>> befor calling reducer.run(reducerContext) in ReduceTask.java we can  add
>> method join in Reducer.java Reducer class and give call to
>> reducer.join(reduceContext)
>>
>>
>> I just wonder how r going to support NON EQUI join.
>>
>> I am also having same problem how to do join if datasets cant fit in to
>> memory.
>>
>>
>> for now i am cloning using following code :
>>
>>
>> KEYIN key = context.getCurrentKey() ;
>> KEYIN outKey = null;
>> try {
>>     outKey = (KEYIN)key.getClass().newInstance();
>>    }
>> catch(Exception e)
>>  {}
>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>
>>  Iterable<VALUEIN> values = context.getValues();
>>  ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>  for(VALUEIN value: values) {
>>    VALUEIN outValue = null;
>>     try {
>>          outValue = (VALUEIN)value.getClass().newInstance();
>>    }
>>    catch(Exception e)    {}
>>    ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>  }
>>
>>
>> if you have found any other solution please feel free to share
>>
>> Thank You.
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>
>>> In reduce() we have:
>>>
>>> key1 values1
>>> key2 values2
>>> ...
>>> keyn valuesn
>>>
>>> so,what i want to do is join all values like a SQL:
>>>
>>> select * from values1,values2...valuesn;
>>>
>>> if memory is not enough to cache values,how to complete the join
>>> operation?
>>> my idea is clone the reducecontext,but it maybe not easy.
>>>
>>> Any help will be appreciated.
>>>
>>>
>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>
>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>>> like only support equal join.
>>>> I want a non-equal join,but I have no idea now.
>>>>
>>>>
>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>
>>>>> you want a n:n join or 1:n join?
>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>
>>>>>> I want to join two table data in reducer.So I need to find the start
>>>>>> of the table.
>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>
>>>>>>
>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>
>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>
>>>>>>> what's the mean of you want get the record position? I cannot
>>>>>>> understand, can you give a simple example?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>
>>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>>> reducecontext.so,I'm confused.
>>>>>>>> anyway,thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>
>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hi,everyone,
>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the
>>>>>>>>>> record position?
>>>>>>>>>> Now,what I thought is to save the context status,but class
>>>>>>>>>> Context doesn't implement a clone construct method.
>>>>>>>>>>
>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>> Thank you very much.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> *
>> *
>> *
>>
>> Thanx and Regards*
>> * Vikas Jadhav*
>>
>>
>>
>


-- 
*
*
*

Thanx and Regards*
* Vikas Jadhav*

Re: How can I record some position of context in Reduce()?

Posted by Vikas Jadhav <vi...@gmail.com>.

How are you going to support NON EQUI Join using MapReduce ?
As per my understanding there is only one way to do this is
to bring all data to one reducer then reducer will know lesser/greater
values correctly.
Correct me if I am wrong.
Thank You.

*  Regards,*
*  Vikas *



On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com>wrote:

> Can you show an example of your join?
> All joins are an equality in that the key has to match.
> Whether its a one to one , one to many, or many to many remains to be seen.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>
> Only equality joins, outer joins, and left semi joins are supported in
> Hive. Hive does not support join conditions that are not equality
> conditions as it is very difficult to express such conditions as a
> map/reduce job. Also, more than two tables can be joined in Hive.
>
>
> 2013/4/9 Michael Segel <mi...@hotmail.com>
>
>> Hi,
>>
>> Your cross join is supported in both pig and hive. (Cross, and Theta
>> joins)
>>
>> So there must be code to do this.
>>
>> Essentially in the reducer you would have your key and then the set of
>> rows that match the key. You would then perform the cross product on the
>> key's result set and output them to the collector as separate rows.
>>
>> I'm not sure why you would need the reduce context.
>>
>> But then again, I'm still on my first cup of coffee. ;-)
>>
>>
>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
>> wrote:
>>
>> Hi
>> I am also woring on join using MapReduce
>> i think instead of finding postion of table in RawKeyValuIterator.
>> what we can do modify context.write method to alway write key as table
>> name or id
>> then we dont need to find postion we can get Key and Value from
>> "reducerContext"
>>
>> befor calling reducer.run(reducerContext) in ReduceTask.java we can  add
>> method join in Reducer.java Reducer class and give call to
>> reducer.join(reduceContext)
>>
>>
>> I just wonder how r going to support NON EQUI join.
>>
>> I am also having same problem how to do join if datasets cant fit in to
>> memory.
>>
>>
>> for now i am cloning using following code :
>>
>>
>> KEYIN key = context.getCurrentKey() ;
>> KEYIN outKey = null;
>> try {
>>     outKey = (KEYIN)key.getClass().newInstance();
>>    }
>> catch(Exception e)
>>  {}
>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>
>>  Iterable<VALUEIN> values = context.getValues();
>>  ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>  for(VALUEIN value: values) {
>>    VALUEIN outValue = null;
>>     try {
>>          outValue = (VALUEIN)value.getClass().newInstance();
>>    }
>>    catch(Exception e)    {}
>>    ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>  }
>>
>>
>> if you have found any other solution please feel free to share
>>
>> Thank You.
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>
>>> In reduce() we have:
>>>
>>> key1 values1
>>> key2 values2
>>> ...
>>> keyn valuesn
>>>
>>> so,what i want to do is join all values like a SQL:
>>>
>>> select * from values1,values2...valuesn;
>>>
>>> if memory is not enough to cache values,how to complete the join
>>> operation?
>>> my idea is clone the reducecontext,but it maybe not easy.
>>>
>>> Any help will be appreciated.
>>>
>>>
>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>
>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>>> like only support equal join.
>>>> I want a non-equal join,but I have no idea now.
>>>>
>>>>
>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>
>>>>> you want a n:n join or 1:n join?
>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>
>>>>>> I want to join two table data in reducer.So I need to find the start
>>>>>> of the table.
>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>
>>>>>>
>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>
>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>
>>>>>>> what's the mean of you want get the record position? I cannot
>>>>>>> understand, can you give a simple example?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>
>>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>>> reducecontext.so,I'm confused.
>>>>>>>> anyway,thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>
>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hi,everyone,
>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the
>>>>>>>>>> record position?
>>>>>>>>>> Now,what I thought is to save the context status,but class
>>>>>>>>>> Context doesn't implement a clone construct method.
>>>>>>>>>>
>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>> Thank you very much.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> Thanx and Regards
>> Vikas Jadhav
>>
>>
>>
>


--
Thanx and Regards
Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Vikas Jadhav <vi...@gmail.com>.

How are you going to support a non-equi join using MapReduce?
As I understand it, the only way to do this is to bring all the data to
one reducer, so that the reducer can compare lesser/greater values
correctly.
Correct me if I am wrong.
Thank you.

Regards,
Vikas
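
To make the single-reducer idea concrete, here is a minimal plain-Java sketch (no Hadoop classes) of what the reduce side could do once every record has been routed to the same key. The Tagged class, the "L"/"R" table tags, and the join method are illustrative names only, not part of any Hadoop API, and the sketch assumes both sides fit in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Plain-Java sketch of a non-equi (theta) join inside one reduce group. */
final class ThetaJoinSketch {

    /** A record tagged with its source table; a mapper would emit this tag (assumption). */
    static final class Tagged {
        final String table;
        final int value;

        Tagged(String table, int value) {
            this.table = table;
            this.value = value;
        }
    }

    /**
     * Pairs every "L" value with every "R" value that satisfies the condition.
     * Assumes all records reached the same reducer and both sides fit in memory.
     */
    static List<int[]> join(Iterable<Tagged> group, BiPredicate<Integer, Integer> condition) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (Tagged t : group) {
            // Store the primitive value, not the record, since a framework may reuse objects.
            if ("L".equals(t.table)) {
                left.add(t.value);
            } else {
                right.add(t.value);
            }
        }
        List<int[]> out = new ArrayList<>();
        for (int l : left) {
            for (int r : right) {
                if (condition.test(l, r)) {
                    out.add(new int[] {l, r});
                }
            }
        }
        return out;
    }
}
```

Calling join(group, (l, r) -> l < r) would pair each L value with every larger R value. A single reducer is the simplest way to get correct comparisons, but it does not scale; it is shown here only to illustrate the idea above.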



On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel <mi...@hotmail.com> wrote:

> Can you show an example of your join?
> All joins are an equality in that the key has to match.
> Whether it's one-to-one, one-to-many, or many-to-many remains to be seen.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:
>
> Only equality joins, outer joins, and left semi joins are supported in
> Hive. Hive does not support join conditions that are not equality
> conditions as it is very difficult to express such conditions as a
> map/reduce job. Also, more than two tables can be joined in Hive.
>
>
> 2013/4/9 Michael Segel <mi...@hotmail.com>
>
>> Hi,
>>
>> Your cross join is supported in both pig and hive. (Cross, and Theta
>> joins)
>>
>> So there must be code to do this.
>>
>> Essentially in the reducer you would have your key and then the set of
>> rows that match the key. You would then perform the cross product on the
>> key's result set and output them to the collector as separate rows.
>>
>> I'm not sure why you would need the reduce context.
>>
>> But then again, I'm still on my first cup of coffee. ;-)
>>
>>
>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
>> wrote:
>>
>> Hi,
>> I am also working on a join using MapReduce.
>> I think that instead of finding the position of the table in the
>> RawKeyValueIterator, we can modify the context.write method to always
>> write the table name or id as the key.
>> Then we don't need to find the position; we can get the key and value
>> from "reducerContext".
>>
>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>> a join method to the Reducer class in Reducer.java and call
>> reducer.join(reduceContext).
>>
>>
>> I just wonder how you are going to support a non-equi join.
>>
>> I am also having the same problem: how to do the join if the datasets
>> can't fit into memory.
>>
>>
>> For now I am cloning using the following code:
>>
>>
>> KEYIN key = context.getCurrentKey();
>> KEYIN outKey;
>> try {
>>     // Hadoop reuses the key object, so copy it into a fresh instance.
>>     outKey = (KEYIN) key.getClass().newInstance();
>> } catch (Exception e) {
>>     throw new RuntimeException("cannot instantiate key class", e);
>> }
>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>
>> Iterable<VALUEIN> values = context.getValues();
>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>> for (VALUEIN value : values) {
>>     VALUEIN outValue;
>>     try {
>>         outValue = (VALUEIN) value.getClass().newInstance();
>>     } catch (Exception e) {
>>         throw new RuntimeException("cannot instantiate value class", e);
>>     }
>>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>     myValues.add(outValue); // keep the copy for later use
>> }
>>
>>
>> If you have found any other solution, please feel free to share it.
>>
>> Thank You.
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>
>>> In reduce() we have:
>>>
>>> key1 values1
>>> key2 values2
>>> ...
>>> keyn valuesn
>>>
>>> so,what i want to do is join all values like a SQL:
>>>
>>> select * from values1,values2...valuesn;
>>>
>>> if memory is not enough to cache values,how to complete the join
>>> operation?
>>> my idea is clone the reducecontext,but it maybe not easy.
>>>
>>> Any help will be appreciated.
>>>
>>>
>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>
>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>>> like only support equal join.
>>>> I want a non-equal join,but I have no idea now.
>>>>
>>>>
>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>
>>>>> you want a n:n join or 1:n join?
>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>
>>>>>> I want to join two table data in reducer.So I need to find the start
>>>>>> of the table.
>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>
>>>>>>
>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>
>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>
>>>>>>> what's the mean of you want get the record position? I cannot
>>>>>>> understand, can you give a simple example?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>
>>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>>> reducecontext.so,I'm confused.
>>>>>>>> anyway,thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>
>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hi,everyone,
>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the
>>>>>>>>>> record position?
>>>>>>>>>> Now,what I thought is to save the context status,but class
>>>>>>>>>> Context doesn't implement a clone construct method.
>>>>>>>>>>
>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>> Thank you very much.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> Thanx and Regards
>> Vikas Jadhav
>>
>>
>>
>


--
Thanx and Regards
Vikas Jadhav

Re: How can I record some position of context in Reduce()?

Posted by Michel Segel <mi...@hotmail.com>.

Can you show an example of your join?
All joins are an equality in that the key has to match.
Whether it's one-to-one, one-to-many, or many-to-many remains to be seen.


Sent from a remote device. Please excuse any typos...

Mike Segel
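
For the equality case described here, the usual reduce-side pattern can be sketched in plain Java without any Hadoop classes: rows are grouped by join key (which the shuffle would do), and each group is split by source table and cross-multiplied. The Row class, the "A"/"B" table tags, and the shuffle and reduce helpers below are illustrative stand-ins, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of a reduce-side equi-join: per-key cross product of two tables. */
final class ReduceSideJoinSketch {

    /** One input row: join key, source table tag ("A" or "B"), and payload. */
    static final class Row {
        final String key;
        final String table;
        final String payload;

        Row(String key, String table, String payload) {
            this.key = key;
            this.table = table;
            this.payload = payload;
        }
    }

    /** Stand-in for the shuffle: groups rows by join key, sorted by key. */
    static Map<String, List<Row>> shuffle(List<Row> rows) {
        Map<String, List<Row>> groups = new TreeMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    /** Stand-in for reduce(): cross product of the "A" and "B" rows sharing one key. */
    static List<String> reduce(String key, List<Row> group) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (Row r : group) {
            if ("A".equals(r.table)) {
                a.add(r.payload);
            } else {
                b.add(r.payload);
            }
        }
        List<String> out = new ArrayList<>();
        for (String x : a) {
            for (String y : b) {
                out.add(key + "\t" + x + "\t" + y);
            }
        }
        return out;
    }
}
```

Whether the result is one-to-one, one-to-many, or many-to-many falls out of how many "A" and "B" rows each key has; a key present in only one table produces no output in this inner-join sketch.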

On Apr 9, 2013, at 10:35 AM, Effyroth Gu <ef...@gmail.com> wrote:

> Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive.
> 
> 
> 2013/4/9 Michael Segel <mi...@hotmail.com>
>> Hi,
>> 
>> Your cross join is supported in both pig and hive. (Cross, and Theta joins) 
>> 
>> So there must be code to do this. 
>> 
>> Essentially in the reducer you would have your key and then the set of rows that match the key. You would then perform the cross product on the key's result set and output them to the collector as separate rows. 
>> 
>> I'm not sure why you would need the reduce context. 
>> 
>> But then again, I'm still on my first cup of coffee. ;-)
>> 
>> 
>> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com> wrote:
>> 
>>> Hi,
>>> I am also working on a join using MapReduce.
>>> I think that instead of finding the position of the table in the RawKeyValueIterator,
>>> we can modify the context.write method to always write the table name or id as the key.
>>> Then we don't need to find the position; we can get the key and value from "reducerContext".
>>>
>>> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
>>> a join method to the Reducer class in Reducer.java and call reducer.join(reduceContext).
>>>
>>>
>>> I just wonder how you are going to support a non-equi join.
>>>
>>> I am also having the same problem: how to do the join if the datasets can't fit into memory.
>>>
>>>
>>> For now I am cloning using the following code:
>>>
>>>
>>> KEYIN key = context.getCurrentKey();
>>> KEYIN outKey;
>>> try {
>>>     // Hadoop reuses the key object, so copy it into a fresh instance.
>>>     outKey = (KEYIN) key.getClass().newInstance();
>>> } catch (Exception e) {
>>>     throw new RuntimeException("cannot instantiate key class", e);
>>> }
>>> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>>>
>>> Iterable<VALUEIN> values = context.getValues();
>>> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
>>> for (VALUEIN value : values) {
>>>     VALUEIN outValue;
>>>     try {
>>>         outValue = (VALUEIN) value.getClass().newInstance();
>>>     } catch (Exception e) {
>>>         throw new RuntimeException("cannot instantiate value class", e);
>>>     }
>>>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>>>     myValues.add(outValue); // keep the copy for later use
>>> }
>>>
>>>
>>> If you have found any other solution, please feel free to share it.
>>>  
>>> Thank You.
>>>  
>>>        
>>>  
>>>  
>>> 
>>> 
>>> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>>>> In reduce() we have:
>>>> 
>>>> key1 values1
>>>> key2 values2
>>>> ...
>>>> keyn valuesn
>>>> 
>>>> so,what i want to do is join all values like a SQL:
>>>> 
>>>> select * from values1,values2...valuesn;
>>>> 
>>>> if memory is not enough to cache values,how to complete the join operation?
>>>> my idea is clone the reducecontext,but it maybe not easy.
>>>> 
>>>> Any help will be appreciated.
>>>> 
>>>> 
>>>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks like only support equal join.
>>>>> I want a non-equal join,but I have no idea now.
>>>>> 
>>>>> 
>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>> you want a n:n join or 1:n join?
>>>>>> 
>>>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>>>> I want to join two table data in reducer.So I need to find the start of the table.
>>>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>>> 
>>>>>>> 
>>>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>>>> you cannot use RecordReader in Reducer.
>>>>>>>>  
>>>>>>>> what's the mean of you want get the record position? I cannot understand, can you give a simple example?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>> sorry，I still can't understand how to use recordreader in the reduce(),because the input is a RawKeyValueIterator in the class reducecontext.so,I'm confused.
>>>>>>>>> anyway,thank you.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com> wrote:
>>>>>>>>>>> Hi,everyone,
>>>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the record position?
>>>>>>>>>>> Now,what I thought is to save the context status,but class Context doesn't implement a clone construct method.
>>>>>>>>>>> 
>>>>>>>>>>> Any help will be appreciated.
>>>>>>>>>>> Thank you very much.
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> 
>>> Thanx and Regards
>>>  Vikas Jadhav
>

Re: How can I record some position of context in Reduce()?

Posted by Effyroth Gu <ef...@gmail.com>.

Only equality joins, outer joins, and left semi joins are supported in
Hive. Hive does not support join conditions that are not equality
conditions as it is very difficult to express such conditions as a
map/reduce job. Also, more than two tables can be joined in Hive.
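
When a join condition cannot be turned into a shuffle key, one fallback is a filtered cross product, and the memory question raised earlier in this thread can be handled by spilling one side to local disk. The following plain-Java sketch is only an illustration under stated assumptions: records are single-line strings, the side to be spilled arrives first (for instance via a sort on a table tag), and the class and method names are made up for this example rather than taken from Hadoop or Hive.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

/** Sketch: nested-loop join that spills the inner side to a temp file instead of memory. */
final class SpillingCrossJoinSketch {

    /**
     * Writes the inner side to disk once, then re-reads it for every outer record.
     * Both inputs are consumed in a single pass, like a reducer's value iterator.
     */
    static void join(Iterator<String> inner, Iterator<String> outer,
                     BiPredicate<String, String> condition, Consumer<String> emit) {
        Path spill = null;
        try {
            spill = Files.createTempFile("join-spill", ".txt");
            try (BufferedWriter w = Files.newBufferedWriter(spill, StandardCharsets.UTF_8)) {
                while (inner.hasNext()) {
                    w.write(inner.next());
                    w.newLine();
                }
            }
            while (outer.hasNext()) {
                String o = outer.next();
                try (BufferedReader r = Files.newBufferedReader(spill, StandardCharsets.UTF_8)) {
                    String i;
                    while ((i = r.readLine()) != null) {
                        if (condition.test(o, i)) {
                            emit.accept(o + "," + i); // emit immediately; nothing is buffered
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spill != null) {
                try {
                    Files.deleteIfExists(spill); // best-effort cleanup of the spill file
                } catch (IOException ignored) {
                    // a leftover temp file is acceptable in this sketch
                }
            }
        }
    }
}
```

The cost is one full read of the spill file per outer record, so this trades memory for disk I/O; it only shows that the join can complete without caching either side in memory.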


2013/4/9 Michael Segel <mi...@hotmail.com>

> Hi,
>
> Your cross join is supported in both pig and hive. (Cross, and Theta
> joins)
>
> So there must be code to do this.
>
> Essentially in the reducer you would have your key and then the set of
> rows that match the key. You would then perform the cross product on the
> key's result set and output them to the collector as separate rows.
>
> I'm not sure why you would need the reduce context.
>
> But then again, I'm still on my first cup of coffee. ;-)
>
>
> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
> wrote:
>
> Hi,
> I am also working on a join using MapReduce.
> I think that instead of finding the position of the table in the
> RawKeyValueIterator, we can modify the context.write method to always
> write the table name or id as the key.
> Then we don't need to find the position; we can get the key and value
> from "reducerContext".
>
> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
> a join method to the Reducer class in Reducer.java and call
> reducer.join(reduceContext).
>
>
> I just wonder how you are going to support a non-equi join.
>
> I am also having the same problem: how to do the join if the datasets
> can't fit into memory.
>
>
> For now I am cloning using the following code:
>
>
> KEYIN key = context.getCurrentKey();
> KEYIN outKey;
> try {
>     // Hadoop reuses the key object, so copy it into a fresh instance.
>     outKey = (KEYIN) key.getClass().newInstance();
> } catch (Exception e) {
>     throw new RuntimeException("cannot instantiate key class", e);
> }
> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>
> Iterable<VALUEIN> values = context.getValues();
> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
> for (VALUEIN value : values) {
>     VALUEIN outValue;
>     try {
>         outValue = (VALUEIN) value.getClass().newInstance();
>     } catch (Exception e) {
>         throw new RuntimeException("cannot instantiate value class", e);
>     }
>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>     myValues.add(outValue); // keep the copy for later use
> }
>
>
> If you have found any other solution, please feel free to share it.
>
> Thank You.
>
>
>
>
>
>
> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>
>> In reduce() we have:
>>
>> key1 values1
>> key2 values2
>> ...
>> keyn valuesn
>>
>> so,what i want to do is join all values like a SQL:
>>
>> select * from values1,values2...valuesn;
>>
>> if memory is not enough to cache values,how to complete the join
>> operation?
>> my idea is clone the reducecontext,but it maybe not easy.
>>
>> Any help will be appreciated.
>>
>>
>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>
>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>> like only support equal join.
>>> I want a non-equal join,but I have no idea now.
>>>
>>>
>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>
>>>> you want a n:n join or 1:n join?
>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>
>>>>> I want to join two table data in reducer.So I need to find the start
>>>>> of the table.
>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>
>>>>>
>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>
>>>>>> you cannot use RecordReader in Reducer.
>>>>>>
>>>>>> what's the mean of you want get the record position? I cannot
>>>>>> understand, can you give a simple example?
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>
>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>> reducecontext.so,I'm confused.
>>>>>>> anyway,thank you.
>>>>>>>
>>>>>>>
>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>
>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Hi,everyone,
>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the record
>>>>>>>>> position?
>>>>>>>>> Now,what I thought is to save the context status,but class Context
>>>>>>>>> doesn't implement a clone construct method.
>>>>>>>>>
>>>>>>>>> Any help will be appreciated.
>>>>>>>>> Thank you very much.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
>
> --
> Thanx and Regards
> Vikas Jadhav
>
>
>

Re: How can I record some position of context in Reduce()?

Posted by Effyroth Gu <ef...@gmail.com>.

Only equality joins, outer joins, and left semi joins are supported in
Hive. Hive does not support join conditions that are not equality
conditions as it is very difficult to express such conditions as a
map/reduce job. Also, more than two tables can be joined in Hive.


2013/4/9 Michael Segel <mi...@hotmail.com>

> Hi,
>
> Your cross join is supported in both pig and hive. (Cross, and Theta
> joins)
>
> So there must be code to do this.
>
> Essentially in the reducer you would have your key and then the set of
> rows that match the key. You would then perform the cross product on the
> key's result set and output them to the collector as separate rows.
>
> I'm not sure why you would need the reduce context.
>
> But then again, I'm still on my first cup of coffee. ;-)
>
>
> On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com>
> wrote:
>
> Hi,
> I am also working on a join using MapReduce.
> I think that instead of finding the position of a table in the
> RawKeyValueIterator, we can modify the context.write method to always
> write the table name or id as the key.
> Then we don't need to find the position; we can get the key and value
> from "reducerContext".
>
> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
> a join method to the Reducer class in Reducer.java and call
> reducer.join(reduceContext).
>
>
> I just wonder how you are going to support a non-equi join.
>
> I also have the same problem: how to do the join if the datasets can't
> fit into memory.
>
>
> For now I am cloning using the following code:
>
>
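> // Hadoop reuses the key and value objects between iterations, so copy them.
> KEYIN key = context.getCurrentKey();
> KEYIN outKey;
> try {
>     outKey = (KEYIN) key.getClass().newInstance();
> } catch (Exception e) {
>     throw new RuntimeException("Cannot instantiate key class", e);
> }
> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
>
> Iterable<VALUEIN> values = context.getValues();
> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
> for (VALUEIN value : values) {
>     VALUEIN outValue;
>     try {
>         outValue = (VALUEIN) value.getClass().newInstance();
>     } catch (Exception e) {
>         throw new RuntimeException("Cannot instantiate value class", e);
>     }
>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>     myValues.add(outValue);  // store the copy; the original loop discarded it
> }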
>
>
> if you have found any other solution please feel free to share
>
> Thank You.
>
>
>
>
>
>
> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
>
>> In reduce() we have:
>>
>> key1 values1
>> key2 values2
>> ...
>> keyn valuesn
>>
>> so,what i want to do is join all values like a SQL:
>>
>> select * from values1,values2...valuesn;
>>
>> if memory is not enough to cache values,how to complete the join
>> operation?
>> my idea is clone the reducecontext,but it maybe not easy.
>>
>> Any help will be appreciated.
>>
>>
>> 2013/3/13 Roth Effy <ef...@gmail.com>
>>
>>> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks
>>> like only support equal join.
>>> I want a non-equal join,but I have no idea now.
>>>
>>>
>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>
>>>> you want a n:n join or 1:n join?
>>>> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
>>>>
>>>>> I want to join two table data in reducer.So I need to find the start
>>>>> of the table.
>>>>> someone said the DataJoinReducerBase can help me,isn't it?
>>>>>
>>>>>
>>>>> 2013/3/13 Azuryy Yu <az...@gmail.com>
>>>>>
>>>>>> you cannot use RecordReader in Reducer.
>>>>>>
>>>>>> what's the mean of you want get the record position? I cannot
>>>>>> understand, can you give a simple example?
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>
>>>>>>> sorry，I still can't understand how to use recordreader in the
>>>>>>> reduce(),because the input is a RawKeyValueIterator in the class
>>>>>>> reducecontext.so,I'm confused.
>>>>>>> anyway,thank you.
>>>>>>>
>>>>>>>
>>>>>>> 2013/3/12 samir das mohapatra <sa...@gmail.com>
>>>>>>>
>>>>>>>> Through the RecordReader and FileStatus you can get it.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Hi,everyone,
>>>>>>>>> I want to join the k-v pairs in Reduce(),but how to get the record
>>>>>>>>> position?
>>>>>>>>> Now,what I thought is to save the context status,but class Context
>>>>>>>>> doesn't implement a clone construct method.
>>>>>>>>>
>>>>>>>>> Any help will be appreciated.
>>>>>>>>> Thank you very much.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
>
> --
> *
> *
> *
>
> Thanx and Regards*
> * Vikas Jadhav*
>
>
>

Re: How can I record some position of context in Reduce()?

Posted by Michael Segel <mi...@hotmail.com>.

Hi,

Your cross join is supported in both pig and hive. (Cross, and Theta joins) 

So there must be code to do this. 

Essentially in the reducer you would have your key and then the set of rows that match the key. You would then perform the cross product on the key's result set and output them to the collector as separate rows. 
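
The cross product described above can be sketched as follows. This is plain Java rather than a real Hadoop Reducer, so that it stays self-contained; the class CrossJoinSketch, the Tagged wrapper and the table tag are hypothetical names, and the sketch assumes the mapper has tagged each value with its source table. It buffers both sides in memory, so it does not by itself solve the case where one key's rows are too large to hold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the body of reduce(key, values, context).
class CrossJoinSketch {

    // A row tagged with the table it came from (tag assumed to be set by the mapper).
    static final class Tagged {
        final String table;
        final String row;

        Tagged(String table, String row) {
            this.table = table;
            this.row = row;
        }
    }

    // Splits the values of one key by source table, then emits every
    // left/right combination as a separate output row.
    static List<String> crossProduct(Iterable<Tagged> values, String leftTable) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            if (t.table.equals(leftTable)) {
                left.add(t.row);
            } else {
                right.add(t.row);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "," + r);  // in a real reducer this would be context.write(...)
            }
        }
        return out;
    }
}
```

In an actual job the inner loop would write to the context instead of collecting into a list, so only the input rows, not the output, need to be held in memory.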

I'm not sure why you would need the reduce context. 

But then again, I'm still on my first cup of coffee. ;-)


On Apr 9, 2013, at 12:15 AM, Vikas Jadhav <vi...@gmail.com> wrote:

> Hi,
> I am also working on a join using MapReduce.
> I think that instead of finding the position of a table in the RawKeyValueIterator,
> we can modify the context.write method to always write the table name or id as the key.
> Then we don't need to find the position; we can get the key and value from "reducerContext".
>  
> Before calling reducer.run(reducerContext) in ReduceTask.java, we can add
> a join method to the Reducer class in Reducer.java and call reducer.join(reduceContext).
>  
>  
> I just wonder how you are going to support a non-equi join.
>  
> I also have the same problem: how to do the join if the datasets can't fit into memory.
>  
>  
> For now I am cloning using the following code:
>  
>  
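> // Hadoop reuses the key and value objects between iterations, so copy them.
> KEYIN key = context.getCurrentKey();
> KEYIN outKey;
> try {
>     outKey = (KEYIN) key.getClass().newInstance();
> } catch (Exception e) {
>     throw new RuntimeException("Cannot instantiate key class", e);
> }
> ReflectionUtils.copy(context.getConfiguration(), key, outKey);
> 
> Iterable<VALUEIN> values = context.getValues();
> ArrayList<VALUEIN> myValues = new ArrayList<VALUEIN>();
> for (VALUEIN value : values) {
>     VALUEIN outValue;
>     try {
>         outValue = (VALUEIN) value.getClass().newInstance();
>     } catch (Exception e) {
>         throw new RuntimeException("Cannot instantiate value class", e);
>     }
>     ReflectionUtils.copy(context.getConfiguration(), value, outValue);
>     myValues.add(outValue);  // store the copy; the original loop discarded it
> }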
>  
>  
> if you have found any other solution please feel free to share
>  
> Thank You.
>  
>        
>  
>  
> 
> 
> On Thu, Mar 14, 2013 at 1:53 PM, Roth Effy <ef...@gmail.com> wrote:
> In reduce() we have:
> 
> key1 values1
> key2 values2
> ...
> keyn valuesn
> 
> so,what i want to do is join all values like a SQL:
> 
> select * from values1,values2...valuesn;
> 
> if memory is not enough to cache values,how to complete the join operation?
> my idea is clone the reducecontext,but it maybe not easy.
> 
> Any help will be appreciated.
> 
> 
> 2013/3/13 Roth Effy <ef...@gmail.com>
> I want a n:n join as Cartesian product,but the DataJoinReducerBase looks like only support equal join.
> I want a non-equal join,but I have no idea now.
> 
> 
> 2013/3/13 Azuryy Yu <az...@gmail.com>
> you want a n:n join or 1:n join?
> 
> On Mar 13, 2013 10:51 AM, "Roth Effy" <ef...@gmail.com> wrote:
> I want to join two table data in reducer.So I need to find the start of the table.
> someone said the DataJoinReducerBase can help me,isn't it?
> 
> 
> 2013/3/13 Azuryy Yu <az...@gmail.com>
> you cannot use RecordReader in Reducer.
>  
> what's the mean of you want get the record position? I cannot understand, can you give a simple example?
> 
> 
> On Wed, Mar 13, 2013 at 9:56 AM, Roth Effy <ef...@gmail.com> wrote:
> sorry，I still can't understand how to use recordreader in the reduce(),because the input is a RawKeyValueIterator in the class reducecontext.so,I'm confused.
> anyway,thank you.
> 
> 
> 2013/3/12 samir das mohapatra <sa...@gmail.com>
> Through the RecordReader and FileStatus you can get it.
> 
> 
> On Tue, Mar 12, 2013 at 4:08 PM, Roth Effy <ef...@gmail.com> wrote:
> Hi,everyone,
> I want to join the k-v pairs in Reduce(),but how to get the record position?
> Now,what I thought is to save the context status,but class Context doesn't implement a clone construct method.
> 
> Any help will be appreciated.
> Thank you very much.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> 
> Thanx and Regards
>  Vikas Jadhav
