
Posted to dev@crunch.apache.org by "Steven Ruppert (JIRA)" <ji...@apache.org> on 2016/04/18 21:30:25 UTC

[jira] [Created] (CRUNCH-603) Cache constituent Writables inside TupleWritable `readFields` call

Steven Ruppert created CRUNCH-603:
-------------------------------------

             Summary: Cache constituent Writables inside TupleWritable `readFields` call
                 Key: CRUNCH-603
                 URL: https://issues.apache.org/jira/browse/CRUNCH-603
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.13.0
            Reporter: Steven Ruppert
            Assignee: Josh Wills
            Priority: Minor


Currently, `TupleWritable.readFields` deserializes each field in the tuple by creating a new Writable of that field's type through `TupleWritable.getWritable`, which uses reflection (`WritableFactories.newInstance`). Repeating this for every field on every call burns up an unfortunate amount of CPU time.

I've got a patch for this that caches the Writables so they can be reused (just as the TupleWritable itself is reused throughout Hadoop). It appears to work, at least for our cases. I think it will break if you ever have heterogeneous tuple types, but that seems like a bad idea, if not already proscribed in the documentation somewhere.
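
To make the idea concrete, here is a minimal, self-contained sketch of per-slot caching. It is not the patch itself and does not use the Hadoop or Crunch classes: `FieldReader`, `IntField`, `LongField`, `CachingTuple`, the type-code table, and the wire format (field count, then a one-byte type code before each field) are all hypothetical stand-ins chosen for illustration. The sketch also re-creates a cached instance when the incoming type for a slot changes, which is one possible way to tolerate heterogeneous tuples, at the cost of a class comparison per field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the read side of org.apache.hadoop.io.Writable.
interface FieldReader {
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field types, standing in for concrete Writable classes.
class IntField implements FieldReader {
    int value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

class LongField implements FieldReader {
    long value;
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
}

// Sketch of a tuple that keeps one reusable instance per slot.
class CachingTuple {
    // Assumed type-code table: code 0 -> IntField, code 1 -> LongField.
    private static final List<Class<? extends FieldReader>> TYPES =
        Arrays.asList(IntField.class, LongField.class);

    private FieldReader[] cache = new FieldReader[0];
    int allocations = 0; // counts reflective instantiations, for demonstration only

    void readFields(DataInput in) throws IOException {
        int size = in.readInt();
        if (cache.length != size) {
            cache = Arrays.copyOf(cache, size); // keeps existing instances where possible
        }
        for (int i = 0; i < size; i++) {
            Class<? extends FieldReader> type = TYPES.get(in.readByte());
            FieldReader field = cache[i];
            // Reuse the cached instance unless the slot is empty or its type changed.
            if (field == null || field.getClass() != type) {
                field = newInstance(type);
                cache[i] = field;
            }
            field.readFields(in);
        }
    }

    FieldReader get(int i) { return cache[i]; }

    private FieldReader newInstance(Class<? extends FieldReader> type) throws IOException {
        try {
            allocations++;
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("cannot instantiate " + type, e);
        }
    }
}

class CachingTupleDemo {
    // Encodes a two-field record in the assumed wire format.
    static byte[] record(int first, long second, boolean secondIsLong) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(2);      // field count
        out.writeByte(0);     // type code for IntField
        out.writeInt(first);
        if (secondIsLong) {
            out.writeByte(1); // type code for LongField
            out.writeLong(second);
        } else {
            out.writeByte(0);
            out.writeInt((int) second);
        }
        return bytes.toByteArray();
    }

    static DataInput input(byte[] data) {
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) throws IOException {
        CachingTuple tuple = new CachingTuple();
        for (int r = 0; r < 3; r++) {
            tuple.readFields(input(record(r, r + 1, false)));
        }
        // Two slots, so two instances were created across three records.
        System.out.println("allocations after 3 (int, int) records: " + tuple.allocations);
        tuple.readFields(input(record(7, 8L, true)));
        // The second slot changed type, so exactly one more instance was created.
        System.out.println("allocations after an (int, long) record: " + tuple.allocations);
    }
}
```

Without the cache, the same four reads would instantiate eight field objects reflectively; with it, the count stays at the number of slots plus one per type change. A caveat of any such reuse is that a caller holding a reference to a field from a previous read sees it overwritten by the next read, the same contract that already applies to a reused tuple object.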


