Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2012/10/12 10:59:03 UTC

[jira] [Resolved] (MAHOUT-1055) Change id fields to use LongWritable instead of IntWritable

     [ https://issues.apache.org/jira/browse/MAHOUT-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-1055.
-------------------------------

    Resolution: Won't Fix
    
> Change id fields to use LongWritable instead of IntWritable
> -----------------------------------------------------------
>
>                 Key: MAHOUT-1055
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1055
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Markus Paaso
>
> Why is IntWritable used as the id field type in Mahout CVB (org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)?
> Does Long really have a significant impact on performance?
> Long is much more usable as an id type, and int causes compatibility issues like the one below.
> In the method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(), LongWritable is correctly used as the id field type.
> I suggest changing every IntWritable id to LongWritable.
> A SequenceFile produced by the command 'mahout lucene.vector' cannot be handled by the command 'mahout cvb' because of this id type incompatibility.
> See http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy

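Suppose one tried to bridge the two commands by rewriting the keys, narrowing each long id written by 'mahout lucene.vector' to the int id that 'mahout cvb' expects. The Java sketch below is a rough illustration of why that conversion is lossy. It is hypothetical: the class IdNarrowing and its methods are not part of Mahout, and plain long/int values stand in for Hadoop's LongWritable/IntWritable so that it runs without Hadoop on the classpath. Any id outside the int range cannot be represented, so the sketch fails loudly instead of silently truncating.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Mahout API): narrows long document ids to int ids. */
final class IdNarrowing {

    private IdNarrowing() {}

    /** Returns the id as an int, or throws ArithmeticException if it does not fit. */
    static int toIntId(long id) {
        // Math.toIntExact refuses to truncate, unlike a plain (int) cast.
        return Math.toIntExact(id);
    }

    /** Narrows every id; a single out-of-range id aborts the whole conversion. */
    static int[] toIntIds(long[] ids) {
        int[] out = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = toIntId(ids[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ids that fit in an int survive unchanged: prints [0, 42, 2147483647]
        System.out.println(Arrays.toString(toIntIds(new long[] {0L, 42L, Integer.MAX_VALUE})));
        try {
            // One past Integer.MAX_VALUE has no int representation.
            toIntId(Integer.MAX_VALUE + 1L);
        } catch (ArithmeticException e) {
            System.out.println("id does not fit in an int: " + e.getMessage());
        }
    }
}
```

A plain (int) cast would quietly wrap large ids into unrelated (possibly negative) values. That is why the direction of the proposed change, from IntWritable to LongWritable, avoids the problem, while converting in the other direction can only push it elsewhere.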
--
This message is automatically generated by JIRA.