Posted to dev@spark.apache.org by madhu phatak <ph...@gmail.com> on 2014/11/19 11:57:32 UTC

Help needed to publish SizeEstimator as separate library

Hi,
 As I was going through the Spark source code, SizeEstimator
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala>
caught my eye. It is a very useful tool for estimating the size of objects on
the JVM, which helps in use cases like a memory-bounded cache.
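
To make the use case concrete, here is a minimal sketch of a memory-bounded
cache in Java. It is only an illustration: the class name BoundedCache and its
methods are hypothetical, and the estimator is passed in as a plain function
so that no particular library API is assumed. In practice that function would
be backed by a size estimator such as the one discussed here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical memory-bounded cache: once the estimated total size of the
// cached values exceeds maxBytes, the oldest entries are evicted first.
// Assumes values are non-null and are not mutated while cached, so the
// estimator returns the same size at insertion and at eviction.
final class BoundedCache<K, V> {
    private final long maxBytes;
    private final ToLongFunction<V> estimator; // stand-in for a size estimator
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(); // insertion order
    private long usedBytes = 0;

    BoundedCache(long maxBytes, ToLongFunction<V> estimator) {
        this.maxBytes = maxBytes;
        this.estimator = estimator;
    }

    void put(K key, V value) {
        // Replacing a key: give back the space of the previous value first.
        V old = entries.remove(key);
        if (old != null) {
            usedBytes -= estimator.applyAsLong(old);
        }
        entries.put(key, value);
        usedBytes += estimator.applyAsLong(value);

        // Evict from the oldest entry onwards until we are within budget.
        Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            usedBytes -= estimator.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    V get(K key) {
        return entries.get(key);
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

For example, new BoundedCache<String, String>(10, s -> s.length()) treats the
string length as a toy size estimate. A value larger than the whole budget is
evicted immediately, which keeps the cache within its bound.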

It would be useful to have this as a separate library that can be used in
other projects too. There was a discussion
<https://spark-project.atlassian.net/browse/SPARK-383> long ago, but I
don't see any updates on it.

I have extracted the code and packaged it as a separate project on GitHub
<https://github.com/phatak-dev/java-sizeof>. I have simplified the code to
remove the dependencies on google-guava and OpenHashSet, which leads to a
small compromise in accuracy for big arrays. At the same time, it greatly
simplifies the code base and the dependency graph. I want to publish it to
Maven Central so it can be added as a dependency.

Though I have published the code under my own package "com.madhu" and kept the
license information, I am not sure whether this is the right way to do it. It
would be great if someone could guide me on package naming and attribution.

-- 
Regards,
Madhukara Phatak
http://www.madhukaraphatak.com