Posted to dev@harmony.apache.org by Mark Hindess <ma...@googlemail.com> on 2009/03/01 15:05:04 UTC

Re: Google Summer of Code 2009

In message <E1...@fhw-relay07.plus.net>, Mark Hindess writes:
>
> 
> In message <fc...@mail.gmail.com>,
> Sian January writes:
> >
> > Hi everyone,
> > 
> > Do we want to propose any projects for Google Summer of Code 2009?  It
> > was quite successful last year for Harmony, with two students
> > completing the programme, so definitely worth doing in my opinion.
> > 
> > http://code.google.com/soc/
> >
> > Thanks,
> > Sian
> 
> I have a couple of items on my todo list that might make interesting
> GSoC projects.  While comparing file descriptor usage between Harmony
> and the RI, I noticed that the RI typically reads jar files with an
> open/mmap/close sequence and then uses the mapped memory to access the
> file.  Harmony keeps the file open and uses seek/read to access it.
> There are a few issues here:
> 
>   * some applications that use lots of jar files will fail on Harmony
>     because they run out of file descriptors, even though they work
>     on the RI

I noticed, while looking at the strace output from the latest "trivial"
test case in the "Problems with NIO" thread, that on the RI the client
connect socket is always fd=4, whereas on DRLVM it is fd=110, so the
difference is quite significant.  This got me wondering what the
difference would be when running something like Eclipse with lots of
plugin jars.  Just loading a fairly trivial workspace on Sun and DRLVM
uses 586 and 674 file descriptors respectively.  So it looks like the RI
does not load every jar using the mmap trick, but DRLVM would still run
out of descriptors roughly 100 sooner than the RI.
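
To make the open/mmap/close idea concrete, here is a minimal Java sketch. It assumes NIO's FileChannel.map as a stand-in for the native mmap the RI uses, and the class and method names are hypothetical (this is not Harmony or RI code). The file is mapped and the channel is closed immediately, so no descriptor stays open, while the mapped buffer can still be read with absolute, seek-free accesses. The FileChannel documentation states that closing the channel does not invalidate an established mapping.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating open/mmap/close; not Harmony or RI code.
public class MapThenClose {

    /** Opens the file, maps it read-only, and closes the channel before returning. */
    static MappedByteBuffer mapAndClose(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN); // zip/jar header fields are little-endian
            return buf;
        } // the channel and its file descriptor are released here; the mapping stays valid
    }

    /** Reads a 4-byte little-endian field at an absolute offset: no seek() or read() calls. */
    static int u32At(MappedByteBuffer buf, int offset) {
        return buf.getInt(offset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapdemo", ".jar");
        tmp.toFile().deleteOnExit();
        // Only the local file header signature "PK\3\4", not a complete jar.
        Files.write(tmp, new byte[] {0x50, 0x4b, 0x03, 0x04});
        MappedByteBuffer buf = mapAndClose(tmp);
        System.out.printf("signature = 0x%08x%n", u32At(buf, 0)); // prints signature = 0x04034b50
    }
}
```

One caveat of doing this from Java: there is no public unmap call, and per the MappedByteBuffer documentation the mapping lives until the buffer is garbage-collected. Native code calling mmap/munmap directly, as the RI presumably does, does not have that limitation.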

-Mark

>   * code that uses memory access rather than seek/read will be a lot
>     simpler to read and maintain
> 
>   * what are the performance implications?
> 
> I'd quite like to investigate this but don't seem to be finding the time.
> 
> It might also be interesting to explore the possibility of exploiting
> parallelism (compare gzip/pigz).
> 
> It might also be worth seeing if there is any performance benefit to using
> the inflateBack API (compare gzip/gun; gun is in the examples directory of
> the zlib source).
> 
> If people think these ideas are concrete enough to explore then I'll add
> an item to the wiki.
> 
> Regards,
>  Mark.
>
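
On the gzip/pigz parallelism idea: unlike a single gzip stream, each DEFLATED entry in a jar is compressed independently, so several entries can be inflated concurrently. The following is a minimal Java sketch of that, assuming the entries' compressed bytes and uncompressed sizes have already been located (e.g. from the central directory). The class name, method names and thread count are hypothetical, and it shows concurrent decompression of separate entries, not pigz's approach of splitting one stream into blocks that are compressed in parallel.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of inflating independent jar entries in parallel; not Harmony code.
public class ParallelInflate {

    /** Raw-deflates data (no zlib header), the format zip/jar entries use. */
    static byte[] deflateRaw(byte[] data) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            d.setInput(data);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!d.finished()) {
                int n = d.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    /** Inflates one raw-deflated entry whose uncompressed size is known. */
    static byte[] inflateRaw(byte[] compressed, int size) throws DataFormatException {
        Inflater inf = new Inflater(true);
        try {
            // nowrap mode: the Inflater javadoc asks for one extra dummy input byte
            inf.setInput(Arrays.copyOf(compressed, compressed.length + 1));
            byte[] out = new byte[size];
            int off = 0;
            while (off < size) {
                int n = inf.inflate(out, off, size - off);
                if (n == 0 && (inf.finished() || inf.needsInput() || inf.needsDictionary())) {
                    throw new DataFormatException("entry shorter than expected");
                }
                off += n;
            }
            return out;
        } finally {
            inf.end();
        }
    }

    /** Inflates independent entries concurrently on a fixed-size thread pool. */
    static List<byte[]> inflateAll(List<byte[]> entries, List<Integer> sizes, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < entries.size(); i++) {
                byte[] entry = entries.get(i);
                int size = sizes.get(i);
                futures.add(pool.submit(() -> inflateRaw(entry, size)));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                results.add(f.get()); // results keep the original entry order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "entry ".repeat(1000).getBytes("UTF-8");
        List<byte[]> inflated = inflateAll(List.of(deflateRaw(data), deflateRaw(data)),
                List.of(data.length, data.length), 2);
        System.out.println(Arrays.equals(data, inflated.get(1))); // prints true
    }
}
```

Whether this helps class loading, where entries tend to be small and requested one at a time, is something a project would need to measure rather than assume.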

Re: Google Summer of Code 2009

Posted by Alexei Fedotov <al...@gmail.com>.

Mark,
I like the task about jars.

I have a hint for a student who wants to approach it. Harmony's jar
reading code has numerous limitations and assumptions (e.g. Harmony
limits the size of a jar file). It is important to keep most of these
limitations as they are, resisting the desire to eliminate them all at
once. Otherwise, instead of a performance gain, one may find that
popular applications slow down.
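
Following that advice, a switch to mapped access could keep the existing size check in front of the new code path. A minimal Java sketch: MAX_JAR_SIZE is a hypothetical placeholder (the thread does not say what Harmony's actual limit is), and rejecting an oversized jar with an IOException is an assumed behaviour. The separate Integer.MAX_VALUE check reflects the real limit on the size of a single FileChannel.map region.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: keeps an existing jar size limit when moving to mmap.
public class GuardedJarMapper {

    // Placeholder for whatever limit the current jar reader enforces (assumed value).
    static final long MAX_JAR_SIZE = 64L * 1024 * 1024;

    /** True if a jar of this size passes the existing limit and fits in one mapping. */
    static boolean withinLimits(long size) {
        // FileChannel.map rejects regions larger than Integer.MAX_VALUE bytes
        return size <= MAX_JAR_SIZE && size <= Integer.MAX_VALUE;
    }

    /** Maps the jar only if it passes the same size check as before, then closes the channel. */
    static MappedByteBuffer map(Path jar) throws IOException {
        try (FileChannel ch = FileChannel.open(jar, StandardOpenOption.READ)) {
            long size = ch.size();
            if (!withinLimits(size)) {
                throw new IOException("jar too large: " + size + " bytes");
            }
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```

Relaxing or removing the limit can then be a separate change with its own measurements.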

Thanks.


-- 
Best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/