Posted to user@mahout.apache.org by Kris Jack <mr...@gmail.com> on 2010/07/09 12:04:21 UTC

Data Types for Item/Element Ids

Hi everyone,

In Mahout, I get the impression that most of the data structures require
item/element ids to be longs or ints.  As Mahout is designed to scale up to
huge data sets, would it be useful to also allow strings as ids, such as
universally unique identifiers?  Any comments?

Regards,
Kris

Re: Data Types for Item/Element Ids

Posted by Sean Owen <sr...@gmail.com>.
No, it doesn't do that. Maybe you could hack the code to do so, but I
think you'll just find it slows things down significantly to carry
strings through the pipeline this way. I think it's more efficient to
translate at the start and translate back at the end.
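
A minimal sketch of "translate at the start and translate back at the
end", assuming a simple in-memory dictionary. StringIdDictionary and its
methods are hypothetical names for illustration and are not part of
Mahout; the comma-separated line layout is likewise only an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: assigns dense long IDs to string IDs.
final class StringIdDictionary {
    private final Map<String, Long> byStringId = new HashMap<>();
    private final List<String> byLongId = new ArrayList<>();

    // Returns the existing long ID for a string, or assigns the next free one.
    long encode(String stringId) {
        Long existing = byStringId.get(stringId);
        if (existing != null) {
            return existing;
        }
        long assigned = byLongId.size();
        byStringId.put(stringId, assigned);
        byLongId.add(stringId);
        return assigned;
    }

    // Translates a long ID back to the original string.
    String decode(long longId) {
        if (longId < 0 || longId >= byLongId.size()) {
            throw new IllegalArgumentException("Unknown ID: " + longId);
        }
        return byLongId.get((int) longId);
    }

    public static void main(String[] args) {
        StringIdDictionary users = new StringIdDictionary();
        StringIdDictionary items = new StringIdDictionary();
        String[][] raw = {
            {"alice", "book-42", "5.0"},
            {"bob", "book-42", "3.0"},
            {"alice", "dvd-7", "4.0"},
        };
        // Translate at the start: string IDs become numeric IDs before the job runs.
        for (String[] row : raw) {
            System.out.println(users.encode(row[0]) + "," + items.encode(row[1]) + "," + row[2]);
        }
        // Translate back at the end: numeric IDs in the output become strings again.
        System.out.println(users.decode(1) + " -> " + items.decode(1));
    }
}
```

Only the two translation steps touch strings here; everything in between
sees longs. For data that does not fit in memory, the dictionary itself
would have to live somewhere else, which this sketch does not address.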

On Fri, Jul 9, 2010 at 11:42 AM, Kris Jack <mr...@gmail.com> wrote:
> In terms of labelling dimensions with strings, is it currently possible to
> run the org.apache.mahout.cf.taste.hadoop.item.RecommenderJob and to label
> vectors with string ids so that they will be accepted in the input and
> produced in the output?
>
> Thanks,
> Kris
>
>
>
> 2010/7/9 Sean Owen <sr...@gmail.com>
>
>> Look for IDMigrator and subclasses. It's a band-aid solution; for a
>> large-scale solution you want to use integer IDs.
>>
>> 2010/7/9 Matthias Böhmer <ma...@m-boehmer.de>:
>> > Very good question! Could you please point me to the support of the
>> > library where I can find a mapper from strings to longs for item and
>> > user IDs? Thank you!
>> >
>> > Matthias
>> >
>>
>

Re: Data Types for Item/Element Ids

Posted by Kris Jack <mr...@gmail.com>.
In terms of labelling dimensions with strings, is it currently possible to
run the org.apache.mahout.cf.taste.hadoop.item.RecommenderJob and to label
vectors with string ids so that they will be accepted in the input and
produced in the output?

Thanks,
Kris



2010/7/9 Sean Owen <sr...@gmail.com>

> Look for IDMigrator and subclasses. It's a band-aid solution; for a
> large-scale solution you want to use integer IDs.
>
> 2010/7/9 Matthias Böhmer <ma...@m-boehmer.de>:
> > Very good question! Could you please point me to the support of the
> > library where I can find a mapper from strings to longs for item and
> > user IDs? Thank you!
> >
> > Matthias
> >
>

Re: Data Types for Item/Element Ids

Posted by Sean Owen <sr...@gmail.com>.
Look for IDMigrator and subclasses. It's a band-aid solution; for a
large-scale solution you want to use integer IDs.
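
For readers who cannot pre-assign integer IDs, here is a sketch of one
generic way to map strings such as UUIDs to longs: hash the string and
remember the reverse mapping. HashingIdMapper and its methods are
hypothetical names and are not Mahout's IDMigrator API; check the
library's own classes for how it actually does this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout's IDMigrator API.
final class HashingIdMapper {
    private final Map<Long, String> reverse = new HashMap<>();

    // Derives a long from the first 8 bytes of the MD5 digest of the string.
    static long toLongId(String stringId) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(stringId.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Remembers the mapping so the long can be translated back later.
    long store(String stringId) {
        long id = toLongId(stringId);
        String previous = reverse.putIfAbsent(id, stringId);
        if (previous != null && !previous.equals(stringId)) {
            throw new IllegalStateException("Hash collision: " + previous + " vs " + stringId);
        }
        return id;
    }

    // Returns the original string, or null if this long was never stored.
    String toStringId(long longId) {
        return reverse.get(longId);
    }

    public static void main(String[] args) {
        HashingIdMapper mapper = new HashingIdMapper();
        String uuid = "123e4567-e89b-12d3-a456-426614174000";
        long id = mapper.store(uuid);
        System.out.println(uuid + " <-> " + id + " <-> " + mapper.toStringId(id));
    }
}
```

A 64-bit hash cannot represent every 128-bit UUID uniquely, so collisions
are possible in principle, and the reverse map grows with the number of
IDs. That is consistent with treating this kind of mapping as a stopgap
rather than a large-scale solution.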

2010/7/9 Matthias Böhmer <ma...@m-boehmer.de>:
> Very good question! Could you please point me to the support of the
> library where I can find a mapper from strings to longs for item and
> user IDs? Thank you!
>
> Matthias
>

Re: Data Types for Item/Element Ids

Posted by Matthias Böhmer <ma...@m-boehmer.de>.
Very good question! Could you please point me to the support of the
library where I can find a mapper from strings to longs for item and
user IDs? Thank you!

Matthias


2010/7/9 Sean Owen <sr...@gmail.com>:
> I believe you are referring to recommenders specifically. It used to
> allow string IDs but it's a lot of overhead. Most applications already
> use integer identifiers for these entities -- and those that don't can
> use the support in the library for building a mapping from strings to
> ints.
>
> Vectors naturally have integer dimensions of course, but can in some
> cases label the dimensions with strings.
>
> The answer depends on what you are referring to, but I think it's going
> to be integers for the most part, since the caller can map anything to
> integers without much trouble.
>
> On Fri, Jul 9, 2010 at 11:04 AM, Kris Jack <mr...@gmail.com> wrote:
>> Hi everyone,
>>
>> In Mahout, I get the impression that most of the data structures require
>> item/element ids to be longs or ints.  As Mahout is designed to scale up to
>> huge data sets, would it be useful to also allow strings as ids, such as
>> universally unique identifiers?  Any comments?
>>
>> Regards,
>> Kris
>>
>




Re: Data Types for Item/Element Ids

Posted by Sean Owen <sr...@gmail.com>.
I believe you are referring to recommenders specifically. That code used
to allow string IDs, but it's a lot of overhead. Most applications already
use integer identifiers for these entities -- and those that don't can
use the support in the library for building a mapping from strings to
ints.

Vectors naturally have integer dimensions of course, but can in some
cases label the dimensions with strings.

The answer depends on what you are referring to, but I think it's going
to be integers for the most part, since the caller can map anything to
integers without much trouble.

On Fri, Jul 9, 2010 at 11:04 AM, Kris Jack <mr...@gmail.com> wrote:
> Hi everyone,
>
> In Mahout, I get the impression that most of the data structures require
> item/element ids to be longs or ints.  As Mahout is designed to scale up to
> huge data sets, would it be useful to also allow strings as ids, such as
> universally unique identifiers?  Any comments?
>
> Regards,
> Kris
>