Posted to user@mahout.apache.org by Reinis Vicups <ma...@orbit-x.de> on 2014/04/08 10:08:39 UTC

Best practice for partial cartesian product

Hi,

this is not directly a Mahout question, but I figured that you guys can most 
likely answer it.

Actually I have two questions:

1. This: {(1,2); (1,3); (2,3)} is not a full Cartesian product, right? It 
is missing (1,1); (2,2); (3,3); (2,1);.... My question is: what is it 
called? Partial Cartesian? Asymmetric Cartesian?

2. If I try to build the product I described above in a reducer, what 
would be the best practice? My current code looks like this:

     @Override
     public void reduce(final VarLongWritable key, final Iterable<VarLongWritable> values, final Context context) {

         final VarLongWritable[] valueArray = Iterables.toArray(values, VarLongWritable.class);

         for (int i = 0; i < valueArray.length; i++) {
             for (int j = i + 1; j < valueArray.length; j++) {
                 context.write(new PairWritable(valueArray[i].get(), valueArray[j].get()), customerPreferenceWritable);
             }
         }
     }

I don't feel quite right about this solution, since I make a copy of the 
values in "valueArray" and believe that it will cost me 
OutOfMemoryErrors with larger data sets.

thanks and br
reinis

Re: Best practice for partial cartesian product

Posted by Sebastian Schelter <ss...@googlemail.com>.
Have a look at the sampleDown method in RowSimilarityJob:

https://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java?view=markup
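
A rough sketch of the downsampling idea, for readers who do not want to dig through the Mahout source: it caps the number of values kept per key with reservoir sampling, so memory is bounded by the cap and the number of emitted pairs by cap * (cap - 1) / 2. This is not a copy of sampleDown and may well differ from what RowSimilarityJob actually does. PairDownsampler, sample, pairs and maxValues are hypothetical names, and the code is plain Java so that it runs without Hadoop. It stores the primitive long of each value rather than the Writable itself, since Hadoop reuses the value object while iterating in a reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of Mahout or Hadoop. */
final class PairDownsampler {

    /**
     * Returns a uniform random sample of at most maxValues items from the
     * stream (reservoir sampling), using O(maxValues) memory.
     * Assumes maxValues >= 0 and fewer than Integer.MAX_VALUE items per key.
     */
    static long[] sample(Iterator<Long> values, int maxValues, Random random) {
        long[] reservoir = new long[maxValues];
        int seen = 0;
        while (values.hasNext()) {
            long value = values.next();
            if (seen < maxValues) {
                // Fill the reservoir with the first maxValues items.
                reservoir[seen] = value;
            } else {
                // Replace a random slot with probability maxValues / (seen + 1).
                int slot = random.nextInt(seen + 1);
                if (slot < maxValues) {
                    reservoir[slot] = value;
                }
            }
            seen++;
        }
        return Arrays.copyOf(reservoir, Math.min(seen, maxValues));
    }

    /** Builds each unordered pair of sampled values once (index i before index j). */
    static List<long[]> pairs(long[] sample) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < sample.length; i++) {
            for (int j = i + 1; j < sample.length; j++) {
                // In a real reducer this would be a context.write(...) call.
                out.add(new long[] {sample[i], sample[j]});
            }
        }
        return out;
    }
}
```

In a reducer one would feed sample() the result of get() on each incoming value and write each pair to the context instead of collecting them in a list. The trade-off is that pairs involving dropped values are lost, so anything counted downstream becomes an estimate.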

On 04/08/2014 10:33 AM, Reinis Vicups wrote:
> Sebastian, thank you very much for your response.
>
> Could you or anyone point me to the Mahout classes where this is being
> solved?
>
> thank you guys
> reinis


Re: Best practice for partial cartesian product

Posted by Reinis Vicups <ma...@orbit-x.de>.
Sebastian, thank you very much for your response.

Could you or anyone point me to the Mahout classes where this is being 
solved?

thank you guys
reinis

On 08.04.2014 10:27, Sebastian Schelter wrote:
> I don't know a good name for that. The problem is that a quadratic 
> number of pairs needs to be emitted here. In our collaborative 
> filtering code, we solve this through downsampling.
>
> --sebastian


Re: Best practice for partial cartesian product

Posted by Sebastian Schelter <ss...@apache.org>.
I don't know a good name for that. The problem is that a quadratic 
number of pairs needs to be emitted here. In our collaborative filtering 
code, we solve this through downsampling.

--sebastian
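
To put a number on "quadratic": the nested loop in the question emits each unordered pair of distinct values once, which is the set of 2-element combinations of the values, so n values produce n * (n - 1) / 2 pairs. A tiny sketch to get a feel for the growth (PairCount and unorderedPairs are made-up names for illustration):

```java
/** Hypothetical helper for illustration only. */
final class PairCount {

    /** Number of unordered pairs of distinct items among n items. */
    static long unorderedPairs(long n) {
        // n * (n - 1) fits in a long for any realistic per-key value count.
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (long n : new long[] {3, 1_000, 100_000}) {
            System.out.println(n + " values -> " + unorderedPairs(n) + " pairs");
        }
    }
}
```

For the three values in the question this gives 3 pairs, matching {(1,2); (1,3); (2,3)}. For 100,000 values under a single key it is already 4,999,950,000 pairs, which is why capping the values per key before pairing matters more than how the array copy is made.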
