Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2009/08/03 12:03:14 UTC

[jira] Updated: (MAHOUT-154) Reduce memory usage with smarter data structures

     [ https://issues.apache.org/jira/browse/MAHOUT-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-154:
-----------------------------

    Attachment: MAHOUT-151-154.patch

Current patch for MAHOUT-154 and MAHOUT-151. It's big, but it actually deletes over 3,000 lines while adding only about 2,750. In my test case, at least, this patch seems to reduce memory requirements by 40%. YMMV. There is still more to do, but I want to get this big patch in sooner rather than later so I can continue with further changes.

> Reduce memory usage with smarter data structures
> ------------------------------------------------
>
>                 Key: MAHOUT-154
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-154
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.2
>
>         Attachments: MAHOUT-151-154.patch
>
>
> Memory usage remains an issue. This issue tracks two changes with API implications that could reduce memory requirements:
> - use float, not double, for preference values. It is very unlikely that a float (4 bytes) lacks the precision to represent user preferences accurately, since they are typically values like "3.0" or "4.5". Using float instead of an 8-byte double saves 4 bytes per preference value, which is significant when loading tens of millions of prefs into memory.
> - Preference[] is an inefficient way to store prefs, since it entails a great deal of Preference object overhead: 48 bytes are needed per pref, of which 36 are overhead (!). Using an abstraction like PreferenceArray, which can use parallel arrays internally, can cut at least 12 of those 36 bytes of overhead, and more if crazier data structures are used.
> So far these changes have reduced memory requirements by about 20% in my particular test case, which is significant.
> I am tracking this as an issue since, like MAHOUT-151, it will entail API changes.
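
The two ideas in the description (float values and parallel arrays instead of Preference[]) can be sketched together in Java. This is a minimal, hypothetical illustration and not Mahout's actual PreferenceArray implementation. The class name ParallelPreferenceArray, its methods, and the use of long item IDs are all assumptions. In this sketch, one user's preferences live in a primitive long[] of item IDs and a parallel float[] of values. Each preference therefore costs 12 bytes of array payload (8 + 4), plus the two arrays' fixed headers, rather than a full Preference object per entry.

```java
/**
 * Hypothetical sketch, not Mahout's actual PreferenceArray: one user's
 * preferences held in parallel primitive arrays instead of Preference objects.
 * Assumes item IDs are longs and preference values are floats.
 */
public final class ParallelPreferenceArray {
  private final long userID;    // shared by every entry, so stored only once
  private final long[] itemIDs; // itemIDs[i] is paired with values[i]
  private final float[] values; // 4 bytes per value instead of 8 for double

  public ParallelPreferenceArray(long userID, int size) {
    this.userID = userID;
    this.itemIDs = new long[size];
    this.values = new float[size];
  }

  public int length() {
    return itemIDs.length;
  }

  public long getUserID() {
    return userID;
  }

  public long getItemID(int i) {
    return itemIDs[i];
  }

  public float getValue(int i) {
    return values[i];
  }

  /** Stores the i-th preference by writing into both parallel arrays. */
  public void set(int i, long itemID, float value) {
    itemIDs[i] = itemID;
    values[i] = value;
  }

  public static void main(String[] args) {
    ParallelPreferenceArray prefs = new ParallelPreferenceArray(42L, 2);
    prefs.set(0, 100L, 3.0f);
    prefs.set(1, 200L, 4.5f);
    for (int i = 0; i < prefs.length(); i++) {
      // Prints "42,100,3.0" then "42,200,4.5"
      System.out.println(prefs.getUserID() + "," + prefs.getItemID(i) + "," + prefs.getValue(i));
    }

    // Half-step ratings from 0.5 to 5.0 survive the double -> float round trip exactly.
    for (double r = 0.5; r <= 5.0; r += 0.5) {
      if ((double) (float) r != r) {
        throw new AssertionError("float cannot represent " + r);
      }
    }
  }
}
```

The user ID is shared by every entry, so it is stored once per array rather than once per preference. Half-step ratings such as 3.0 or 4.5 are exactly representable as float. Saving 4 bytes per value works out to about 40 MB for 10 million preferences from the float change alone.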

-- 
This message is automatically generated by JIRA.