Posted to user@mahout.apache.org by Reinis Vicups <ma...@orbit-x.de> on 2014/04/08 10:08:39 UTC
Best practice for partial cartesian product
Hi,
this is not directly a Mahout question, but I figured that you guys can
most likely answer it.
Actually I have two questions:
1. This: {(1,2); (1,3); (2,3)} is not a full Cartesian product, right? It
is missing (1,1); (2,2); (3,3); (2,1);.... My question is: what is it
called? Partial Cartesian? Asymmetric Cartesian?
2. If I try to build the product I described above in a reducer, what
would be the best practice? My current code looks like this:
@Override
public void reduce(final VarLongWritable key, final Iterable<VarLongWritable> values,
    final Context context) throws IOException, InterruptedException {
  // Buffer all values for this key so they can be paired up below.
  final VarLongWritable[] valueArray = Iterables.toArray(values, VarLongWritable.class);
  // Emit each unordered pair (i, j) with i < j exactly once.
  for (int i = 0; i < valueArray.length; i++) {
    for (int j = i + 1; j < valueArray.length; j++) {
      context.write(new PairWritable(valueArray[i].get(), valueArray[j].get()),
          customerPreferenceWritable);
    }
  }
}
I am not comfortable with this solution, since I copy all the values
into "valueArray" and I believe that this will cause OutOfMemoryErrors
with larger data sets.
thanks and br
reinis
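The pairs asked about here are the 2-element combinations of the values (every (a, b) taken in iteration order, each unordered pair once), so n values produce n(n-1)/2 pairs. The following is a minimal sketch of the same loop that buffers primitive longs instead of Writable objects. It is plain Java with no Hadoop dependency: the class PairSketch, the PairSink callback (standing in for context.write) and the Iterable<Long> input (standing in for Iterable<VarLongWritable>) are all hypothetical names used for illustration. Copying the primitive values is also the safer choice inside a reducer, because Hadoop typically reuses a single Writable instance while iterating over the values, so an array of references to those objects may not hold what one expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Mahout or Hadoop. */
class PairSketch {

    /** Callback standing in for context.write(new PairWritable(a, b), ...). */
    interface PairSink {
        void emit(long first, long second);
    }

    /** Copies the values into a primitive array, doubling the buffer as needed. */
    static long[] toPrimitiveArray(Iterable<Long> values) {
        long[] buffer = new long[16];
        int size = 0;
        for (long value : values) {
            if (size == buffer.length) {
                buffer = Arrays.copyOf(buffer, size * 2);
            }
            buffer[size++] = value;
        }
        return Arrays.copyOf(buffer, size);
    }

    /** Emits every pair (values[i], values[j]) with i < j and returns the pair count. */
    static long emitPairs(long[] values, PairSink sink) {
        long emitted = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                sink.emit(values[i], values[j]);
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        long count = emitPairs(toPrimitiveArray(Arrays.asList(1L, 2L, 3L)),
                (a, b) -> seen.add("(" + a + "," + b + ")"));
        System.out.println(seen + " count=" + count); // [(1,2), (1,3), (2,3)] count=3
    }
}
```

This still needs memory proportional to the number of values per key, and the output remains quadratic; it only removes the per-object overhead of the buffered Writables.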
Re: Best practice for partial cartesian product
Posted by Sebastian Schelter <ss...@googlemail.com>.
Have a look at the sampleDown method in RowSimilarityJob:
https://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java?view=markup
On 04/08/2014 10:33 AM, Reinis Vicups wrote:
> Sebastian, thank you very much for your response.
>
> Could you or anyone else point me to the Mahout classes where this is
> being solved?
>
> thank you guys
> reinis
Re: Best practice for partial cartesian product
Posted by Reinis Vicups <ma...@orbit-x.de>.
Sebastian, thank you very much for your response.
Could you or anyone else point me to the Mahout classes where this is
being solved?
thank you guys
reinis
On 08.04.2014 10:27, Sebastian Schelter wrote:
> I don't know a good name for that. The problem is that a quadratic
> number of pairs needs to be emitted here. In our collaborative
> filtering code, we solve this through downsampling.
>
> --sebastian
Re: Best practice for partial cartesian product
Posted by Sebastian Schelter <ss...@apache.org>.
I don't know a good name for that. The problem is that a quadratic
number of pairs needs to be emitted here. In our collaborative filtering
code, we solve this through downsampling.
--sebastian
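To make the downsampling idea concrete, here is a minimal sketch of one generic way to cap the number of values per key before pairing them: single-pass reservoir sampling. This is not Mahout's sampleDown implementation, which may work differently. The class DownsampleSketch, the method sample and the parameter maxSize are hypothetical names chosen for illustration, and Iterable<Long> again stands in for the reducer's values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; generic reservoir sampling, not Mahout's sampleDown. */
class DownsampleSketch {

    /**
     * Keeps a uniform random sample of at most maxSize values in one pass,
     * so the buffer never grows beyond maxSize entries.
     */
    static long[] sample(Iterable<Long> values, int maxSize, Random random) {
        long[] reservoir = new long[maxSize];
        int seen = 0; // number of values consumed so far (assumed to fit in an int)
        for (long value : values) {
            if (seen < maxSize) {
                // Fill phase: the first maxSize values are always kept.
                reservoir[seen] = value;
            } else {
                // Replace phase: keep the new value with probability maxSize / (seen + 1).
                int slot = random.nextInt(seen + 1);
                if (slot < maxSize) {
                    reservoir[slot] = value;
                }
            }
            seen++;
        }
        return Arrays.copyOf(reservoir, Math.min(seen, maxSize));
    }

    public static void main(String[] args) {
        List<Long> values = new ArrayList<>();
        for (long v = 1; v <= 1000; v++) {
            values.add(v);
        }
        long[] kept = sample(values, 10, new Random(42));
        long maxPairs = (long) kept.length * (kept.length - 1) / 2;
        System.out.println("kept=" + kept.length + " maxPairs=" + maxPairs); // kept=10 maxPairs=45
    }
}
```

With at most maxSize values kept per key, both the buffer and the number of emitted pairs (at most maxSize(maxSize-1)/2) stay bounded, at the cost of dropping some co-occurrences for very large groups.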
On 04/08/2014 10:08 AM, Reinis Vicups wrote:
> Hi,
>
> this is not mahout question directly, but I figured that you guys most
> likely can answer it.
>
> Actually I have two questions:
>
> 1. This: {(1,2); (1,3); (2,3)} is not full cartesian product, right? It
> is missing (1,1); (2,2); (3,3); (2,1);.... My question is - how is it
> called? Partial cartesian? Asymetric cartesian?
>
> 2. If I try to build the product I described above in reducer, what
> would be the best practice? My current code look like this:
>
> @Override
> public void reduce(final VarLongWritable key, final
> Iterable<VarLongWritable> values, final Context context) {
>
> final VarLongWritable[] valueArray = Iterables.toArray(values,
> VarLongWritable.class);
>
> for (int i = 0; i < valueArray.length; i++) {
> for (int j = i + 1; j < valueArray.length; j++) {
> context.write(new PairWritable(valueArray[i].get(),
> valueArray[j].get()), customerPreferenceWritable);
> }
> }
> }
>
> I don't feel quite right with this solution since I make a copy of
> values in "valueArray" and believe that it will cost me
> OoutOfMemoryExceptions with larger data sets.
>
> thanks and br
> reinis