Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2014/02/24 08:47:44 UTC

Cutting a 0.7 release

Hi all,

I plan on cutting a release next week. If you have any opinions, please feel free to comment here.

Sent from my iPhone

Re: Cutting a 0.7 release

Posted by "Edward J. Yoon" <ed...@apache.org>.
That's a huge diagram :-) Do you plan to work on HAMA-505, or create a new one?

On Tue, Feb 25, 2014 at 1:33 PM, Chia-Hung Lin <cl...@googlemail.com> wrote:
> Just let you know I may refactor based on the following diagram.
>
> http://people.apache.org/~chl501/diagram1.png
>
> That sketches the basic flow required for ft. I am currently evaluate
> related parts, so it's subjected to change.
>
>
>
>
>
>
> On 24 February 2014 20:52, Edward J. Yoon <ed...@apache.org> wrote:
>> 0.6.4 or 0.7.0, Both are OK to me.
>>
>> Just FYI,
>>
>> The memory efficiency has been significantly (almost x2-3) improved by
>> runtime message serialization and compression. See
>> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
>> (I'll attach more benchmarks and comparisons with other systems result
>> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
>> SemiClustering, Graph's Combiners HAMA-857.
>>
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>>
>> Once we release this one, we finally might want to focus on below issues:
>>
>> * Fault tolerant job processing (checkpoint recovery)
>> * Support GPUs and InfiniBand
>>
>> Then, I think we can release version 1.0.
>>
>> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>> <to...@gmail.com> wrote:
>>> Would you cut 0.7 or 0.6.4 ?
>>> I'd go with 0.6.4 as I think the next minor version change should be due to
>>> significant feature additions / changes and / or stability / scalability
>>> improvements.
>>>
>>> Regards,
>>> Tommaso
>>>
>>>
>>> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
>>>
>>>> Hi all,
>>>>
>>>> I plan on cutting a release next week. If you have some opinions, Pls feel
>>>> free to comment here.
>>>>
>>>> Sent from my iPhone
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer, Inc.



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: Cutting a 0.7 release

Posted by Chia-Hung Lin <cl...@googlemail.com>.
Just to let you know, I may refactor based on the following diagram.

http://people.apache.org/~chl501/diagram1.png

That sketches the basic flow required for fault tolerance (ft). I am currently
evaluating the related parts, so it's subject to change.

On 24 February 2014 20:52, Edward J. Yoon <ed...@apache.org> wrote:
> 0.6.4 or 0.7.0, Both are OK to me.
>
> Just FYI,
>
> The memory efficiency has been significantly (almost x2-3) improved by
> runtime message serialization and compression. See
> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
> (I'll attach more benchmarks and comparisons with other systems result
> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
> SemiClustering, Graph's Combiners HAMA-857.
>
> According to my personal evaluations, current system is fairly
> respectable. As I mentioned before, I believe we should stick to
> in-memory style since the today's machines can be equipped with up to
> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
> must-have.
>
> Once we release this one, we finally might want to focus on below issues:
>
> * Fault tolerant job processing (checkpoint recovery)
> * Support GPUs and InfiniBand
>
> Then, I think we can release version 1.0.
>
> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
> <to...@gmail.com> wrote:
>> Would you cut 0.7 or 0.6.4 ?
>> I'd go with 0.6.4 as I think the next minor version change should be due to
>> significant feature additions / changes and / or stability / scalability
>> improvements.
>>
>> Regards,
>> Tommaso
>>
>>
>> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
>>
>>> Hi all,
>>>
>>> I plan on cutting a release next week. If you have some opinions, Pls feel
>>> free to comment here.
>>>
>>> Sent from my iPhone
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer, Inc.

Re: Cutting a 0.7 release

Posted by Chia-Hung Lin <cl...@googlemail.com>.
Programmers can't control Java memory the way they can with malloc/free in C,
and with type boxing/unboxing, etc., it does not seem easy to evaluate memory
usage. So it would be good to stick to an Erlang-style fail-fast approach. Or
we can have a program that loads data and measures the actual memory usage.
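
A minimal sketch of such a measuring program, assuming plain JDK calls only.
The class name, `loadData`, and the synthetic adjacency arrays are hypothetical
stand-ins, not Hama code. `System.gc()` is only a hint to the JVM, so the
reported delta is a rough estimate rather than an exact figure.

```java
import java.util.ArrayList;
import java.util.List;

final class MemoryProbeSketch {
    /** Best-effort estimate of used heap: request a GC, then total minus free. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a hint only; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Hypothetical loader: builds synthetic adjacency lists in memory. */
    static List<long[]> loadData(int vertices, int edgesPerVertex) {
        List<long[]> data = new ArrayList<>(vertices);
        for (int i = 0; i < vertices; i++) {
            long[] edges = new long[edgesPerVertex];
            for (int j = 0; j < edgesPerVertex; j++) {
                edges[j] = (i + j) % vertices; // arbitrary neighbour ids
            }
            data.add(edges);
        }
        return data;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        List<long[]> data = loadData(100_000, 10);
        long after = usedHeapBytes();
        // Keep 'data' reachable until after the second measurement.
        System.out.println("Loaded " + data.size() + " vertices, approx. heap delta: "
                + (after - before) + " bytes");
    }
}
```

Running it with different heap settings (for example `-Xmx`) would give a rough
idea of how much input a single task can hold before failing fast.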


On 24 February 2014 22:32, Tommaso Teofili <to...@gmail.com> wrote:
> 2014-02-24 13:52 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
>
>> 0.6.4 or 0.7.0, Both are OK to me.
>>
>> Just FYI,
>>
>> The memory efficiency has been significantly (almost x2-3) improved by
>> runtime message serialization and compression. See
>>
>> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
>> (I'll attach more benchmarks and comparisons with other systems result
>> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
>> SemiClustering, Graph's Combiners HAMA-857.
>>
>
> sure, all the above things look good to me.
>
>
>>
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>>
>
> right, the only thing that I think we need to address before 0.7.0 is
> related to the OutOfMemory errors (especially when dealing with large
> graphs); for example IMHO even if the memory is not enough to store all the
> graph vertices assigned to a certain peer, a scalable system should never
> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my
> opinion.
>
>
>>
>> Once we release this one, we finally might want to focus on below issues:
>>
>> * Fault tolerant job processing (checkpoint recovery)
>>
>
> +1
>
>
>> * Support GPUs and InfiniBand
>>
>
> +1 for the former, not sure about the latter.
>
>
>>
>> Then, I think we can release version 1.0.
>>
>
> My 2 cents,
> Tommaso
>
>
>>
>> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
>> <to...@gmail.com> wrote:
>> > Would you cut 0.7 or 0.6.4 ?
>> > I'd go with 0.6.4 as I think the next minor version change should be due
>> to
>> > significant feature additions / changes and / or stability / scalability
>> > improvements.
>> >
>> > Regards,
>> > Tommaso
>> >
>> >
>> > 2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
>> >
>> >> Hi all,
>> >>
>> >> I plan on cutting a release next week. If you have some opinions, Pls
>> feel
>> >> free to comment here.
>> >>
>> >> Sent from my iPhone
>>
>>
>>
>> --
>> Edward J. Yoon (@eddieyoon)
>> Chief Executive Officer
>> DataSayer, Inc.
>>

Re: Cutting a 0.7 release

Posted by "Edward J. Yoon" <ed...@apache.org>.
1) The MapReduce model uses file-based communication, so each mapper
can run separately. For example, to run an MR job on 1 GB of input
data, 5 mappers will be scheduled. Even if there are only 2 task
slots (a single machine), the MR job is slow but works: 2 running map
tasks, 3 pending map tasks.

However, unlike MapReduce, BSP uses network-based communication. This
means that all BSP tasks must run at once. And the number of BSP
tasks is determined by the number of blocks of input. So, you CANNOT
run 1 GB of input data on a single machine. It's not a memory issue.
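
The difference can be sketched as follows. This is an illustration of the
argument above under simplified assumptions, not Hama or Hadoop scheduler
code. The class and method names are hypothetical. The numbers (5 tasks,
2 slots) are the ones from the example.

```java
final class SlotSchedulingSketch {
    /** MapReduce: tasks may run in successive waves, so count the waves needed. */
    static int mapReduceWaves(int tasks, int slots) {
        requirePositive(tasks, slots);
        return (tasks + slots - 1) / slots; // ceiling division
    }

    /** BSP: every task must hold a slot at the same time, or the job cannot start. */
    static boolean bspCanStart(int tasks, int slots) {
        requirePositive(tasks, slots);
        return tasks <= slots;
    }

    private static void requirePositive(int tasks, int slots) {
        if (tasks <= 0 || slots <= 0) {
            throw new IllegalArgumentException("tasks and slots must be positive");
        }
    }

    public static void main(String[] args) {
        int tasks = 5; // mappers / BSP tasks from the example
        int slots = 2; // task slots on the single machine
        System.out.println("MapReduce waves: " + mapReduceWaves(tasks, slots)); // MapReduce waves: 3
        System.out.println("BSP can start: " + bspCanStart(tasks, slots));     // BSP can start: false
    }
}
```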

> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my

I hope so too, but I think you are talking about iterative MapReduce.

2) The typical HDFS block size is 64 to 256 MB. If we can assume that
the split size equals the block size, I feel that the current system is
sufficient.

I don't think we have to spend time implementing something disk-based.
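
As a rough worked example of that assumption: if the split size equals the
block size, the number of tasks is the input size divided by the block size,
rounded up, and no task reads more than one block of input. The class name is
hypothetical, and 1 GB is taken to be 1024 MB here.

```java
final class SplitCountSketch {
    static final long MB = 1024L * 1024L;

    /** Number of splits when split size == block size (ceiling division). */
    static long splitCount(long inputBytes, long blockBytes) {
        if (inputBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("input must be >= 0 and block size > 0");
        }
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long input = 1024 * MB; // 1 GB of input, assumed to be 1024 MB
        for (long blockMb : new long[] {64, 128, 256}) {
            System.out.println(blockMb + " MB blocks -> "
                    + splitCount(input, blockMb * MB) + " tasks");
        }
        // 64 MB blocks -> 16 tasks
        // 128 MB blocks -> 8 tasks
        // 256 MB blocks -> 4 tasks
    }
}
```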

WDYT?

On Tue, Feb 25, 2014 at 12:19 AM, Anastasis Andronidis
<an...@hotmail.com> wrote:
> On 24 Feb 2014, at 3:32 PM, Tommaso Teofili <to...@gmail.com> wrote:
>
>>>
>>> According to my personal evaluations, current system is fairly
>>> respectable. As I mentioned before, I believe we should stick to
>>> in-memory style since the today's machines can be equipped with up to
>>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>>> must-have.
>>>
>>
>> right, the only thing that I think we need to address before 0.7.0 is
>> related to the OutOfMemory errors (especially when dealing with large
>> graphs); for example IMHO even if the memory is not enough to store all the
>> graph vertices assigned to a certain peer, a scalable system should never
>> throw OOM exceptions, instead it may eventually process items slower (with
>> caches / queues) but never throw an exception for that but that's just my
>> opinion.
>>
>
> I like and agree with this.
>
> Cheers,
> Anastasis
>



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: Cutting a 0.7 release

Posted by Anastasis Andronidis <an...@hotmail.com>.
On 24 Feb 2014, at 3:32 PM, Tommaso Teofili <to...@gmail.com> wrote:

>> 
>> According to my personal evaluations, current system is fairly
>> respectable. As I mentioned before, I believe we should stick to
>> in-memory style since the today's machines can be equipped with up to
>> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
>> must-have.
>> 
> 
> right, the only thing that I think we need to address before 0.7.0 is
> related to the OutOfMemory errors (especially when dealing with large
> graphs); for example IMHO even if the memory is not enough to store all the
> graph vertices assigned to a certain peer, a scalable system should never
> throw OOM exceptions, instead it may eventually process items slower (with
> caches / queues) but never throw an exception for that but that's just my
> opinion.
> 

I like and agree with this.

Cheers,
Anastasis


Re: Cutting a 0.7 release

Posted by Tommaso Teofili <to...@gmail.com>.
2014-02-24 13:52 GMT+01:00 Edward J. Yoon <ed...@apache.org>:

> 0.6.4 or 0.7.0, Both are OK to me.
>
> Just FYI,
>
> The memory efficiency has been significantly (almost x2-3) improved by
> runtime message serialization and compression. See
>
> https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
> (I'll attach more benchmarks and comparisons with other systems result
> soon). And, we've fixed many bugs. e.g., K-Means, NeuralNetwork,
> SemiClustering, Graph's Combiners HAMA-857.
>

Sure, all of the above looks good to me.


>
> According to my personal evaluations, current system is fairly
> respectable. As I mentioned before, I believe we should stick to
> in-memory style since the today's machines can be equipped with up to
> 128 GB. Disk (or disk hybrid) based queue is a optional, not a
> must-have.
>

Right, the only thing that I think we need to address before 0.7.0 is
related to the OutOfMemory errors (especially when dealing with large
graphs). For example, IMHO, even if there is not enough memory to store all
the graph vertices assigned to a certain peer, a scalable system should never
throw OOM exceptions. Instead, it may process items more slowly (with
caches / queues), but it should never throw an exception for that. But that's
just my opinion.
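
One way to picture the "slower but never OOM" idea is a queue that keeps a
bounded number of messages in memory and spills the rest to a temporary file.
The sketch below is only an illustration with hypothetical names, not Hama's
queue implementation. It assumes that all messages are added before the first
one is read (as between two supersteps) and that messages are short strings.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;

final class SpillingQueueSketch implements AutoCloseable {
    private final int memoryLimit;                 // max messages held in memory
    private final ArrayDeque<String> inMemory = new ArrayDeque<>();
    private final File spillFile;
    private DataOutputStream spillOut;
    private DataInputStream spillIn;
    private long spilled;                          // messages still on disk
    private boolean reading;

    SpillingQueueSketch(int memoryLimit) throws IOException {
        if (memoryLimit < 0) throw new IllegalArgumentException("memoryLimit must be >= 0");
        this.memoryLimit = memoryLimit;
        this.spillFile = File.createTempFile("spill", ".bin");
    }

    /** Keeps the message in memory while there is room, otherwise appends it to disk. */
    void add(String message) throws IOException {
        if (reading) throw new IllegalStateException("add() after poll() is not supported here");
        if (inMemory.size() < memoryLimit) {
            inMemory.add(message);
            return;
        }
        if (spillOut == null) {
            spillOut = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(spillFile)));
        }
        spillOut.writeUTF(message);
        spilled++;
    }

    /** Returns messages in insertion order: memory first, then disk; null when empty. */
    String poll() throws IOException {
        reading = true;
        if (!inMemory.isEmpty()) return inMemory.poll();
        if (spilled == 0) return null;
        if (spillIn == null) {
            spillOut.close(); // flush buffered writes before reading them back
            spillIn = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)));
        }
        spilled--;
        return spillIn.readUTF();
    }

    long spilledCount() { return spilled; }

    @Override
    public void close() throws IOException {
        if (spillOut != null) spillOut.close();
        if (spillIn != null) spillIn.close();
        spillFile.delete();
    }

    public static void main(String[] args) throws IOException {
        try (SpillingQueueSketch queue = new SpillingQueueSketch(2)) {
            for (int i = 0; i < 5; i++) queue.add("m" + i);
            System.out.println("spilled to disk: " + queue.spilledCount()); // spilled to disk: 3
            for (String m = queue.poll(); m != null; m = queue.poll()) {
                System.out.println(m);
            }
        }
    }
}
```

The trade-off is the one described above: reads from the spill file are slower
than reads from memory, but the heap use of the queue stays bounded by
`memoryLimit`.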


>
> Once we release this one, we finally might want to focus on below issues:
>
> * Fault tolerant job processing (checkpoint recovery)
>

+1


> * Support GPUs and InfiniBand
>

+1 for the former, not sure about the latter.


>
> Then, I think we can release version 1.0.
>

My 2 cents,
Tommaso


>
> On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
> <to...@gmail.com> wrote:
> > Would you cut 0.7 or 0.6.4 ?
> > I'd go with 0.6.4 as I think the next minor version change should be due
> to
> > significant feature additions / changes and / or stability / scalability
> > improvements.
> >
> > Regards,
> > Tommaso
> >
> >
> > 2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
> >
> >> Hi all,
> >>
> >> I plan on cutting a release next week. If you have some opinions, Pls
> feel
> >> free to comment here.
> >>
> >> Sent from my iPhone
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer, Inc.
>

Re: Cutting a 0.7 release

Posted by "Edward J. Yoon" <ed...@apache.org>.
0.6.4 or 0.7.0, both are OK with me.

Just FYI,

Memory efficiency has been significantly improved (almost 2-3x) by
runtime message serialization and compression. See
https://wiki.apache.org/hama/Benchmarks#PageRank_Performance_0.7.0-SNAPSHOT_vs_0.6.3
(I'll attach more benchmarks and comparisons with other systems
soon). We've also fixed many bugs, e.g., in K-Means, NeuralNetwork,
SemiClustering, and Graph's Combiners (HAMA-857).
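
To illustrate the general idea (not the actual Hama implementation, which is
not shown in this thread), messages can be kept as one serialized, compressed
byte array instead of as individual Java objects. The sketch below uses only
JDK classes. The class name and the (target id, value) message layout are
assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedMessageBufferSketch {
    /** Serializes (targetId, value) messages into one deflate-compressed byte array. */
    static byte[] pack(long[] targetIds, double[] values) throws IOException {
        if (targetIds.length != values.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the stream finishes the compression before the bytes are read.
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(targetIds.length);
            for (int i = 0; i < targetIds.length; i++) {
                out.writeLong(targetIds[i]);
                out.writeDouble(values[i]);
            }
        }
        return bytes.toByteArray();
    }

    /** Streams the messages back and sums their values without materializing objects. */
    static double sumValues(byte[] packed) throws IOException {
        double sum = 0;
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(packed)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                in.readLong(); // target id, unused in this sketch
                sum += in.readDouble();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        int n = 10_000;
        long[] ids = new long[n];
        double[] values = new double[n];
        java.util.Arrays.fill(ids, 7L);
        java.util.Arrays.fill(values, 0.15);
        byte[] packed = pack(ids, values);
        System.out.println("raw payload bytes: " + (16L * n) + ", packed bytes: " + packed.length);
        System.out.println("sum of values: " + sumValues(packed));
    }
}
```

How much is saved depends heavily on the data; highly repetitive messages like
the ones in `main` compress far better than typical graph messages would.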

According to my personal evaluations, the current system is fairly
respectable. As I mentioned before, I believe we should stick to the
in-memory style, since today's machines can be equipped with up to
128 GB of memory. A disk-based (or disk-hybrid) queue is optional, not a
must-have.

Once we release this one, we might finally want to focus on the issues below:

* Fault tolerant job processing (checkpoint recovery)
* Support GPUs and InfiniBand

Then, I think we can release version 1.0.

On Mon, Feb 24, 2014 at 8:44 PM, Tommaso Teofili
<to...@gmail.com> wrote:
> Would you cut 0.7 or 0.6.4 ?
> I'd go with 0.6.4 as I think the next minor version change should be due to
> significant feature additions / changes and / or stability / scalability
> improvements.
>
> Regards,
> Tommaso
>
>
> 2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:
>
>> Hi all,
>>
>> I plan on cutting a release next week. If you have some opinions, Pls feel
>> free to comment here.
>>
>> Sent from my iPhone



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: Cutting a 0.7 release

Posted by Tommaso Teofili <to...@gmail.com>.
Would you cut 0.7 or 0.6.4?
I'd go with 0.6.4 as I think the next minor version change should be due to
significant feature additions / changes and / or stability / scalability
improvements.

Regards,
Tommaso


2014-02-24 8:47 GMT+01:00 Edward J. Yoon <ed...@apache.org>:

> Hi all,
>
> I plan on cutting a release next week. If you have some opinions, Pls feel
> free to comment here.
>
> Sent from my iPhone