Posted to user@mahout.apache.org by Claudia Grieco <gr...@crmpa.unisa.it> on 2012/03/05 09:53:46 UTC

Using recommenders with String identifiers

Hi guys,

I'd like to use Mahout to implement a recommender, but I'm running into a
problem:

Mahout represents item and user IDs as long integers, while my data comes
from an external database that uses strings to identify items and users.

Any suggestions on how I can solve this?

Thanks a lot

Claudia


Re: R: Using recommenders with String identifiers

Posted by Sean Owen <sr...@gmail.com>.
In this case, the code in question is the non-distributed code rather
than Hadoop. But yes, I agree it may make an even bigger difference
on Hadoop. All of the Hadoop stuff uses integer keys.

On Fri, Mar 9, 2012 at 2:10 AM, Paritosh Ranjan <pr...@xebia.com> wrote:
> Are these identifiers used as keys for mappers somewhere?
> If yes, then the sorting phase of map reduce will be much faster with long,
> as the key comparison time will be less ( long comparison will take less
> time than String comparison, due to lesser number of bytes  ) as well as
> more records can be kept in memory while sorting ( because the size is less
> ).
> I was once processing 1 billion records and just changing the keys from
> String to Long increased the performance by 20%.
>
> Ignore if this is not the case.
>

Re: R: Using recommenders with String identifiers

Posted by Paritosh Ranjan <pr...@xebia.com>.
Are these identifiers used as keys for mappers somewhere?
If so, the sort phase of MapReduce will be much faster with longs:
comparing two longs takes less time than comparing two Strings, because
fewer bytes are involved, and more records fit in memory while sorting,
because each key is smaller.
I once processed 1 billion records, and just changing the keys from
String to Long improved performance by 20%.

Ignore this if it doesn't apply.
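To make the "fewer bytes per key" point concrete, here is a small Java sketch that serializes one key of each kind with plain java.io.DataOutputStream. It is an illustration only: Hadoop's own key types (for example LongWritable and Text) use their own encodings, so exact byte counts differ, but the contrast between a fixed-width long and a variable-width string is the same. The class and method names are hypothetical, and the sketch does not measure comparison speed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class: how many bytes a key occupies when serialized
// with plain java.io (not Hadoop's Writable classes).
final class KeySizeDemo {
  // A long key: DataOutputStream.writeLong always writes exactly 8 bytes.
  static int longKeySize(long key) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeLong(key);
    }
    return buffer.size();
  }

  // A String key: writeUTF writes a 2-byte length prefix followed by the
  // modified UTF-8 encoding (1 byte per ASCII character, more otherwise).
  static int stringKeySize(String key) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(key);
    }
    return buffer.size();
  }
}
```

For a 36-character UUID string this gives 38 bytes versus 8 for a long, and the sort phase has to compare the string keys character by character instead of with a single numeric comparison.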

On 08-03-2012 19:23, Manuel Blechschmidt wrote:
> Hallo Claudia,
> the reason why longs are use is pure efficiency. When you have a lot of things and a lot of users and you are using Strings as identifiers you will need a lot of memory just for saving them. Further processes like equals or hash codes will take longer.
>
> So a long has 4 bytes (64 bits) a UUID string (e.g. 936DA01F-9ABD-4D9D-80C7-02AF85C822A8) encoded as utf-16 has 72 bytes that means that UUID would consume more then18x the memory that longs are taking.
>
> /Manuel


Re: R: R: R: Using recommenders with String identifiers

Posted by Daniel Glauser <da...@gmail.com>.
I have some custom Clojure code that maps strings to longs for my
particular data set, stores the values in a set, and writes them to a file.
I'll try to post some code in the next couple of weeks.

Daniel
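Daniel's Clojure code was not posted in the thread, so the following is only a Java sketch of the general idea of assigning your own dense long IDs: each distinct string gets the next integer (0, 1, 2, ...). Unlike hashing, this can never produce a collision, but the IDs depend on insertion order, so the dictionary has to be kept or persisted to stay reproducible. The class name SequentialIdDictionary is hypothetical and not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary (not Mahout API) that hands out consecutive
// long IDs to distinct String IDs and supports reverse lookup.
final class SequentialIdDictionary {
  private final Map<String, Long> stringToLong = new HashMap<>();
  // Index i holds the String that was assigned long ID i.
  private final List<String> longToString = new ArrayList<>();

  // Returns the existing ID for this string, or assigns the next free one.
  long idFor(String stringId) {
    Long existing = stringToLong.get(stringId);
    if (existing != null) {
      return existing;
    }
    long id = longToString.size();
    stringToLong.put(stringId, id);
    longToString.add(stringId);
    return id;
  }

  // Reverse lookup; throws IndexOutOfBoundsException for unassigned IDs.
  String stringFor(long id) {
    return longToString.get((int) id);
  }

  int size() {
    return longToString.size();
  }
}
```

The same dictionary must be used for every run over a given data set; otherwise the same string could end up with different long IDs.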
 On Mar 8, 2012 9:21 AM, "Claudia Grieco" <gr...@crmpa.unisa.it> wrote:

> Thanks guys for the help
> Claudia

R: R: R: Using recommenders with String identifiers

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
Thanks guys for the help
Claudia

-----Original Message-----
From: Manuel Blechschmidt [mailto:Manuel.Blechschmidt@gmx.de] 
Sent: Thursday, March 8, 2012 16:15
To: user@mahout.apache.org
Subject: Re: R: R: Using recommenders with String identifiers

Hi Claudia,
actually a kind of. With the IDMigrator it depends how you store them. You
can store them in memory, in a database or in a file.

Further if you would use strings these strings would get copied multiple
times and therefore would use multiple times the amount of there memory.

So you could supply a recommender implementation which is doing the String
Long mapping transparently for the user and put in on github. Currently
there is a lack of easy to understand examples. I tried to help a little bit
with my facebook-recommender-demo.

/Manuel




Re: R: R: Using recommenders with String identifiers

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi Claudia,
actually, kind of. With the IDMigrator it depends on how you store the mappings: you can keep them in memory, in a database, or in a file.
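As an illustration of the "in a file" option, here is a minimal Java sketch that writes a long-to-String mapping as one tab-separated line per entry and reads it back, so the mapping does not have to be rebuilt in memory from scratch on every run. The class name IdMappingFile and the file format are assumptions for this example, not Mahout's storage format, and it assumes string IDs contain no tab or newline characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Mahout API): persists a long -> String ID mapping
// as "longId<TAB>stringId" lines in a UTF-8 text file.
final class IdMappingFile {
  static void write(Path file, Map<Long, String> mapping) throws IOException {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<Long, String> entry : mapping.entrySet()) {
      lines.add(entry.getKey() + "\t" + entry.getValue());
    }
    Files.write(file, lines, StandardCharsets.UTF_8);
  }

  static Map<Long, String> read(Path file) throws IOException {
    Map<Long, String> mapping = new HashMap<>();
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      if (line.isEmpty()) {
        continue; // tolerate blank lines
      }
      int tab = line.indexOf('\t');
      if (tab < 0) {
        throw new IOException("Malformed mapping line: " + line);
      }
      mapping.put(Long.parseLong(line.substring(0, tab)), line.substring(tab + 1));
    }
    return mapping;
  }
}
```

A database table with a long column and a string column would serve the same purpose when the mapping is too large for a flat file.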

Furthermore, if you used strings, they would get copied multiple times and would therefore take up several times their own memory.

So you could supply a recommender implementation that does the String/long mapping transparently for the user and put it on GitHub. Currently there is a lack of easy-to-understand examples; I tried to help a little with my facebook-recommender-demo.

/Manuel
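A rough Java sketch of such a transparent wrapper follows: callers pass and receive String IDs, and the translation to and from long IDs happens inside. To keep it compilable without Mahout on the classpath, the underlying long-based recommender is modeled as a plain java.util.function.BiFunction (user ID, how many) to a list of item IDs instead of Mahout's own recommender interface; the class StringIdRecommender and the shared ID maps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical wrapper (not Mahout API) that hides the String <-> long
// translation from callers of a long-based recommender.
final class StringIdRecommender {
  private final Map<String, Long> stringToLong;
  private final Map<Long, String> longToString;
  // Stand-in for the real recommender: (userId, howMany) -> item IDs.
  private final BiFunction<Long, Integer, List<Long>> longRecommender;

  StringIdRecommender(Map<String, Long> stringToLong,
                      Map<Long, String> longToString,
                      BiFunction<Long, Integer, List<Long>> longRecommender) {
    this.stringToLong = stringToLong;
    this.longToString = longToString;
    this.longRecommender = longRecommender;
  }

  // Translates the user ID in, delegates, and translates item IDs back out.
  List<String> recommend(String userId, int howMany) {
    Long longUserId = stringToLong.get(userId);
    if (longUserId == null) {
      throw new IllegalArgumentException("Unknown user ID: " + userId);
    }
    List<String> items = new ArrayList<>();
    for (Long longItemId : longRecommender.apply(longUserId, howMany)) {
      items.add(longToString.get(longItemId));
    }
    return items;
  }
}
```

In a real application the two maps would come from whatever ID mapping is in use (in memory, a file, or a database), and the function would delegate to the actual Mahout recommender.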

On 08.03.2012, at 15:52, Claudia Grieco wrote:

> I understand, but with IDMigrator I still need the memory to store the
> long-string mappings, isn't it?

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


R: R: Using recommenders with String identifiers

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
I understand, but with IDMigrator I still need memory to store the
long-to-string mappings, don't I?

-----Original Message-----
From: Sebastian Schelter [mailto:ssc@apache.org] 
Sent: Thursday, March 8, 2012 15:27
To: user@mahout.apache.org
Subject: Re: R: Using recommenders with String identifiers

Here's some details on the memory usage of Strings in Java:

http://www.javamex.com/tutorials/memory/string_memory_usage.shtml



Re: R: Using recommenders with String identifiers

Posted by Sebastian Schelter <ss...@apache.org>.
Here are some details on the memory usage of Strings in Java:

http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

On 08.03.2012 14:53, Manuel Blechschmidt wrote:
> Hallo Claudia,
> the reason why longs are use is pure efficiency. When you have a lot of things and a lot of users and you are using Strings as identifiers you will need a lot of memory just for saving them. Further processes like equals or hash codes will take longer.
> 
> So a long has 4 bytes (64 bits) a UUID string (e.g. 936DA01F-9ABD-4D9D-80C7-02AF85C822A8) encoded as utf-16 has 72 bytes that means that UUID would consume more then18x the memory that longs are taking.
> 
> /Manuel


Re: R: Using recommenders with String identifiers

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Yes, sorry. You are correct.

So basically it would consume at least 9x the memory, rather than 18x.

/Manuel
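The corrected arithmetic can be checked with a few lines of Java. The sketch counts only the raw character data of the example UUID in UTF-16 (2 bytes per char) against a primitive long (8 bytes); it deliberately ignores object headers, the backing array header, and padding, so real String objects use more. Note also that JVMs with compact strings (Java 9 and later) may store Latin-1-only strings at one byte per character, so the 72-byte figure is the UTF-16 assumption from the thread. IdSizeEstimate is a hypothetical name.

```java
// Hypothetical back-of-the-envelope helper: raw UTF-16 character bytes of a
// String versus the size of a primitive long (object overhead ignored).
final class IdSizeEstimate {
  static int utf16Bytes(String s) {
    return s.length() * Character.BYTES; // 2 bytes per UTF-16 code unit
  }

  static int longBytes() {
    return Long.BYTES; // 8 bytes (64 bits)
  }

  public static void main(String[] args) {
    String uuid = "936DA01F-9ABD-4D9D-80C7-02AF85C822A8"; // 36 characters
    int stringBytes = utf16Bytes(uuid);
    System.out.println("UTF-16 chars: " + stringBytes + " bytes"); // 72
    System.out.println("long: " + longBytes() + " bytes"); // 8
    System.out.println("ratio: " + stringBytes / longBytes() + "x"); // 9
  }
}
```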

On 08.03.2012, at 15:01, Sebastian Schelter wrote:

> A long has 8 bytes :)


Re: R: Using recommenders with String identifiers

Posted by Sebastian Schelter <ss...@apache.org>.
A long has 8 bytes :)

On 08.03.2012 14:53, Manuel Blechschmidt wrote:
> Hallo Claudia,
> the reason why longs are use is pure efficiency. When you have a lot of things and a lot of users and you are using Strings as identifiers you will need a lot of memory just for saving them. Further processes like equals or hash codes will take longer.
> 
> So a long has 4 bytes (64 bits) a UUID string (e.g. 936DA01F-9ABD-4D9D-80C7-02AF85C822A8) encoded as utf-16 has 72 bytes that means that UUID would consume more then18x the memory that longs are taking.
> 
> /Manuel


Re: R: Using recommenders with String identifiers

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hello Claudia,
the reason longs are used is pure efficiency. When you have a lot of items and a lot of users and you use Strings as identifiers, you need a lot of memory just to store them. Operations like equals or hashCode also take longer.

So a long has 4 bytes (64 bits), while a UUID string (e.g. 936DA01F-9ABD-4D9D-80C7-02AF85C822A8) encoded as UTF-16 has 72 bytes, which means the UUID would consume more than 18x the memory of a long.

/Manuel


On 08.03.2012, at 14:27, Claudia Grieco wrote:

> Do you think it's worth the work to change the internal code of Mahout in
> order to use string identifiers?
> Thanks 
> Claudia


Re: Using recommenders with String identifiers

Posted by Sean Owen <sr...@gmail.com>.
No. It used to work this way, but was removed just because you get
much better memory and performance using longs. It would be a lot of
surgery to undo this.

The best answer is to use longs. If you must use strings, IDMigrator
does the trick quite well.

On Thu, Mar 8, 2012 at 1:27 PM, Claudia Grieco <gr...@crmpa.unisa.it> wrote:
> Do you think it's worth the work to change the internal code of Mahout in
> order to use string identifiers?
> Thanks
> Claudia
>

R: Using recommenders with String identifiers

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
Do you think it's worth the work to change the internal code of Mahout in
order to use string identifiers?
Thanks 
Claudia

-----Original Message-----
From: Manuel Blechschmidt [mailto:Manuel.Blechschmidt@gmx.de] 
Sent: Monday, March 5, 2012 11:28
To: user@mahout.apache.org
Subject: Re: Using recommenders with String identifiers

Hi Claudia,
you have to use an IDMigrator.

The following projects shows you an example:
https://github.com/ManuelB/facebook-recommender-demo

https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java

Good luck
    Manuel




Re: Using recommenders with String identifiers

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hi Claudia,
you have to use an IDMigrator.

The following project shows an example:
https://github.com/ManuelB/facebook-recommender-demo

https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java

Good luck
    Manuel
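Mahout's IDMigrator interface (with toLongID(String) and toStringID(long)) is the supported way to do this. The standalone Java sketch below only illustrates one common technique behind such a mapping, so it can run without Mahout: derive a long from the first 8 bytes of an MD5 digest of the string, and remember the reverse mapping in memory. HashingIdMapper is a hypothetical class, not Mahout code, and its collision check is an extra safety measure of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Mahout API): String -> long by hashing, with an
// in-memory long -> String map for translating recommendations back.
final class HashingIdMapper {
  private final Map<Long, String> longToString = new HashMap<>();

  // Packs the first 8 bytes of the MD5 digest into a long (big-endian).
  static long hash(String stringId) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
      long value = 0L;
      for (int i = 0; i < 8; i++) {
        value = (value << 8) | (digest[i] & 0xFFL);
      }
      return value;
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  // Forward mapping; records the reverse entry and fails loudly on collision.
  long toLongId(String stringId) {
    long id = hash(stringId);
    String previous = longToString.putIfAbsent(id, stringId);
    if (previous != null && !previous.equals(stringId)) {
      throw new IllegalStateException("Hash collision: " + previous + " vs " + stringId);
    }
    return id;
  }

  // Reverse mapping; returns null if this long ID was never produced here.
  String toStringId(long longId) {
    return longToString.get(longId);
  }
}
```

The forward direction needs no stored state, since the same string always hashes to the same long; only the reverse direction needs the map, which is the memory cost discussed later in the thread.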

On 05.03.2012, at 09:53, Claudia Grieco wrote:

> Hi guys,
> 
> I'd like to use mahout to implement a recommender but I'm encountering a
> problem:
> 
> Ids of items and users are represented in Mahout as long integers, while my
> data comes from an external database that uses strings to identify items and
> users.
> 
> Any suggestion as to how I can fix this problem?
> 
> Thanks a lot
> 
> Claudia
> 
