You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by rx...@apache.org on 2014/04/11 00:00:09 UTC
git commit: Update tuning.md

Repository: spark
Updated Branches:
  refs/heads/master 7b52b6631 -> f04666252


Update tuning.md

http://stackoverflow.com/questions/9699071/what-is-the-javas-internal-represention-for-string-modified-utf-8-utf-16

Author: Andrew Ash <an...@andrewash.com>

Closes #384 from ash211/patch-2 and squashes the following commits:

da1b0be [Andrew Ash] Update tuning.md


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f0466625
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f0466625
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f0466625

Branch: refs/heads/master
Commit: f0466625200842f3cc486e9aa1caa417586be533
Parents: 7b52b66
Author: Andrew Ash <an...@andrewash.com>
Authored: Thu Apr 10 14:59:58 2014 -0700
Committer: Reynold Xin <rx...@apache.org>
Committed: Thu Apr 10 14:59:58 2014 -0700

----------------------------------------------------------------------
 docs/tuning.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/f0466625/docs/tuning.md
----------------------------------------------------------------------
diff --git a/docs/tuning.md b/docs/tuning.md
index 093df31..cc069f0 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -90,9 +90,10 @@ than the "raw" data inside their fields. This is due to several reasons:
 * Each distinct Java object has an "object header", which is about 16 bytes and contains information
   such as a pointer to its class. For an object with very little data in it (say one `Int` field), this
   can be bigger than the data.
-* Java Strings have about 40 bytes of overhead over the raw string data (since they store it in an
+* Java `String`s have about 40 bytes of overhead over the raw string data (since they store it in an
   array of `Char`s and keep extra data such as the length), and store each character
-  as *two* bytes due to Unicode. Thus a 10-character string can easily consume 60 bytes.
+  as *two* bytes due to `String`'s internal usage of UTF-16 encoding. Thus a 10-character string can
+  easily consume 60 bytes.
 * Common collection classes, such as `HashMap` and `LinkedList`, use linked data structures, where
   there is a "wrapper" object for each entry (e.g. `Map.Entry`). This object not only has a header,
   but also pointers (typically 8 bytes each) to the next object in the list.