Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2009/08/04 08:49:15 UTC

[jira] Resolved: (MAHOUT-154) Reduce memory usage with smarter data structures

     [ https://issues.apache.org/jira/browse/MAHOUT-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-154.
------------------------------

    Resolution: Fixed

Committed with MAHOUT-151

> Reduce memory usage with smarter data structures
> ------------------------------------------------
>
>                 Key: MAHOUT-154
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-154
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.2
>
>         Attachments: MAHOUT-151-154.patch
>
>
> Memory usage remains an issue. This issue tracks two changes with API implications that could reduce memory requirements:
> - Use float, not double, for preference values. A 4-byte float almost certainly has enough precision to represent user preferences, which are typically values like "3.0" or "4.5". Using float instead of an 8-byte double saves 4 bytes per preference value, which is significant when loading tens of millions of preferences into memory.
> - Preference[] is an inefficient way to store preferences, since it entails a great deal of Preference object overhead: 48 bytes are needed per preference, of which 36 are overhead (!). An abstraction like PreferenceArray, which can use parallel arrays internally, can cut at least 12 of those 36 bytes of overhead, and more if crazier data structures are used.
> So far these changes have reduced memory requirements by about 20% in my particular test case, which is significant.
> I am tracking this as a separate issue since, like MAHOUT-151, it will entail API changes.
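
The two changes described above can be illustrated with a minimal sketch. This is not the code from the attached patch and not Mahout's actual PreferenceArray API: the class name ParallelPreferenceArray, its methods, and the use of long item IDs are all assumptions made for illustration. It only shows the general idea of storing one user's preferences in parallel primitive arrays, with float values, instead of one object per preference.

```java
/**
 * Hypothetical sketch of a parallel-array preference store for one user.
 * Names and the long ID type are illustrative assumptions, not Mahout's API.
 */
final class ParallelPreferenceArray {

    private final long userID;     // shared by all entries, stored once
    private final long[] itemIDs;  // itemIDs[i] pairs with values[i]
    private final float[] values;  // float (4 bytes) rather than double (8 bytes)

    ParallelPreferenceArray(long userID, int size) {
        this.userID = userID;
        this.itemIDs = new long[size];
        this.values = new float[size];
    }

    int length() {
        return itemIDs.length;
    }

    long getUserID() {
        return userID;
    }

    long getItemID(int i) {
        return itemIDs[i];
    }

    float getValue(int i) {
        return values[i];
    }

    void set(int i, long itemID, float value) {
        itemIDs[i] = itemID;
        values[i] = value;
    }

    /** Bytes of array element data only; array headers and JVM padding are ignored. */
    long payloadBytes() {
        return (long) length() * (Long.BYTES + Float.BYTES);
    }

    public static void main(String[] args) {
        ParallelPreferenceArray prefs = new ParallelPreferenceArray(42L, 2);
        prefs.set(0, 101L, 3.0f);
        prefs.set(1, 102L, 4.5f);
        for (int i = 0; i < prefs.length(); i++) {
            System.out.println(prefs.getUserID() + " -> item "
                + prefs.getItemID(i) + " = " + prefs.getValue(i));
        }
        System.out.println("payload bytes: " + prefs.payloadBytes());
    }
}
```

Under these assumptions no per-preference object is allocated, so object headers and references are paid once per array rather than once per preference. The actual savings depend on the JVM and on the real data structures in the patch.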
