Posted to server-dev@james.apache.org by Ioan Eugen Stan <st...@gmail.com> on 2011/08/01 16:23:26 UTC
HBase message implementation
Hello,
I'm having some issues putting things together when it comes to the
Message implementation, so I will explain right away.
The current code [1], [2], and [3] is based on the JPA implementation and
copies the message content into byte arrays, which does not seem to
scale well.
There is also the issue of getting the data back from HBase. I agree
with Eric and Norman that large content should be split, and I'm
thinking of providing an HBaseMessage implementation that reads
data from HBase on demand (ChunkedInputStream and
ChunkedOutputStream, as suggested by Norman [4]).
Do I have to copy the data when someone creates a new HBaseMessage, or,
as in the streaming alternative, can I keep a reference to the
SharedInputStream and move the bytes to HBase when the message is
saved to the mailbox?
As I see it, the HBaseMessage implementation is only used
for retrieving data from HBase. It should not store the message body
(in the constructor).
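The following is a minimal sketch of the read-only message described above. It assumes the body has already been stored as numbered chunks under one row key.

- `ChunkSource`, `LazyMessage`, and the in-memory table in `main` are hypothetical stand-ins for HBase and for the real HBaseMessage.
- This `ChunkedInputStream` is illustrative and is not the hector class linked in [4].

The constructor keeps only a reference, and each chunk is fetched only when the stream needs more bytes (Java 8+).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyMessageSketch {

    /** Hypothetical stand-in for HBase: returns chunk #index of a row, or null past the last chunk. */
    interface ChunkSource {
        byte[] readChunk(String rowKey, int index) throws IOException;
    }

    /** Pulls one chunk at a time from the source, only when the reader needs more bytes. */
    static class ChunkedInputStream extends InputStream {
        private final ChunkSource source;
        private final String rowKey;
        private byte[] current = new byte[0];
        private int pos = 0;
        private int nextChunk = 0;
        private boolean eof = false;

        ChunkedInputStream(ChunkSource source, String rowKey) {
            this.source = source;
            this.rowKey = rowKey;
        }

        // Loads the next non-empty chunk if the current one is used up; false at end of data.
        private boolean fill() throws IOException {
            while (!eof && pos >= current.length) {
                byte[] chunk = source.readChunk(rowKey, nextChunk++);
                if (chunk == null) {
                    eof = true;
                } else {
                    current = chunk;
                    pos = 0;
                }
            }
            return !eof;
        }

        @Override
        public int read() throws IOException {
            return fill() ? (current[pos++] & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (!fill()) return -1;
            int n = Math.min(len, current.length - pos);
            System.arraycopy(current, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Keeps only a reference to the stored row; the body is never copied in the constructor. */
    static class LazyMessage {
        private final ChunkSource source;
        private final String rowKey;

        LazyMessage(ChunkSource source, String rowKey) {
            this.source = source;
            this.rowKey = rowKey;
        }

        /** Each call returns a fresh stream that reads chunks from storage on demand. */
        InputStream getBodyContent() {
            return new ChunkedInputStream(source, rowKey);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory "table" holding one message split into two chunks.
        Map<String, List<byte[]>> table = new HashMap<>();
        table.put("mailbox1/42", Arrays.asList(
                "Subject: hi\r\n".getBytes(StandardCharsets.US_ASCII),
                "\r\nbody".getBytes(StandardCharsets.US_ASCII)));
        int[] fetches = {0};
        ChunkSource source = (row, i) -> {
            fetches[0]++;
            List<byte[]> chunks = table.get(row);
            return (chunks == null || i >= chunks.size()) ? null : chunks.get(i);
        };
        LazyMessage msg = new LazyMessage(source, "mailbox1/42");
        System.out.println(fetches[0]); // 0: nothing was read in the constructor
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = msg.getBodyContent()) {
            byte[] buf = new byte[4];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) out.write(buf, 0, n);
        }
        System.out.println(out.size()); // 19
    }
}
```

`getBodyContent()` returns a new stream on each call. The message can therefore be read more than once without the object ever holding the whole body.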
I would appreciate some clarification.
Thanks,
[1] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseMessage.java
[2] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/AbstractHBaseMessage.java
[3] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseStreamingMessage.java
[4] https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/io
--
Ioan Eugen Stan
http://ieugen.blogspot.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org
Re: HBase message implementation
Posted by Eric Charles <er...@apache.org>.
On 01/08/11 18:52, Norman Maurer wrote:
> Hi Eugen,
>
> comments inside..
>
> 2011/8/1 Ioan Eugen Stan<st...@gmail.com>:
>> The way I see things is that HBaseMessage implementation is only used
>> for retrieving data from HBase. It should not store the message body
>> (in the constructor).
>
> Exactly. The only special case is MessageMapper.copy(..), but
> maybe this can be handled more efficiently in HBase...
>
Loading the bytes into memory can be deferred further, from the
HBaseMessage constructor to an HBaseMessageMapper method, but in the end
the bytes will still occupy memory, because HBase has no native
streaming support.
To mitigate the effect, you can implement ChunkedInputStream/ChunkedOutputStream
classes that split the stream into smaller chunks, reducing peak memory usage.
The call to these chunk classes that turns the stream into bytes
is probably best placed in the HBaseMessageMapper. This avoids polluting the
HBaseMessage constructor with many byte arrays, which would also lose the
benefit of the chunks.
You will need to loop and instantiate one HBase Put per chunk in the
HBaseMessageMapper.
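The write side suggested above can be sketched as follows. The sketch assumes a hypothetical `ChunkSink` whose `putChunk` call stands in for building one HBase `Put` per chunk inside the HBaseMessageMapper. No real HBase client call is shown.

Only one chunk-sized buffer (plus the read buffer) is held at a time, as long as the sink itself does not accumulate the chunks. That bounded buffer is where the peak-memory saving comes from.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class ChunkedSaveSketch {

    /** Hypothetical sink: in a real mapper, each call would build and submit one HBase Put. */
    interface ChunkSink {
        void putChunk(String rowKey, int index, byte[] chunk) throws IOException;
    }

    /** Buffers at most chunkSize bytes and hands every full chunk to the sink. */
    static class ChunkedOutputStream extends OutputStream {
        private final ChunkSink sink;
        private final String rowKey;
        private final byte[] buffer;
        private int filled = 0;
        private int chunkIndex = 0;

        ChunkedOutputStream(ChunkSink sink, String rowKey, int chunkSize) {
            if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
            this.sink = sink;
            this.rowKey = rowKey;
            this.buffer = new byte[chunkSize];
        }

        @Override
        public void write(int b) throws IOException {
            buffer[filled++] = (byte) b;
            if (filled == buffer.length) flushChunk();
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            while (len > 0) {
                int n = Math.min(len, buffer.length - filled);
                System.arraycopy(b, off, buffer, filled, n);
                filled += n;
                off += n;
                len -= n;
                if (filled == buffer.length) flushChunk();
            }
        }

        // Copies the filled part out so the buffer can be reused for the next chunk.
        private void flushChunk() throws IOException {
            if (filled == 0) return;
            byte[] chunk = new byte[filled];
            System.arraycopy(buffer, 0, chunk, 0, filled);
            sink.putChunk(rowKey, chunkIndex++, chunk);
            filled = 0;
        }

        /** Emits the last, possibly shorter, chunk. */
        @Override
        public void close() throws IOException {
            flushChunk();
        }
    }

    /** Mapper-side save: streams the body through the chunker and returns the byte count. */
    static long save(InputStream body, ChunkSink sink, String rowKey, int chunkSize) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        try (ChunkedOutputStream out = new ChunkedOutputStream(sink, rowKey, chunkSize)) {
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> stored = new ArrayList<>();
        ChunkSink sink = (row, i, chunk) -> stored.add(chunk);
        save(new ByteArrayInputStream(new byte[10]), sink, "mailbox1/42", 4);
        System.out.println(stored.size()); // 3 chunks: 4 + 4 + 2 bytes
    }
}
```

The chunk size is a tuning choice. When the body length is an exact multiple of it, the sketch does not emit an empty trailing chunk.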
--
Eric Charles
http://about.echarles.net
Re: HBase message implementation
Posted by Norman Maurer <no...@googlemail.com>.
Hi Eugen,
comments inside..
2011/8/1 Ioan Eugen Stan <st...@gmail.com>:
> The way I see things is that HBaseMessage implementation is only used
> for retrieving data from HBase. It should not store the message body
> (in the constructor).
Exactly. The only special case is MessageMapper.copy(..), but
maybe this can be handled more efficiently in HBase...
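One way MessageMapper.copy(..) might avoid rebuilding the whole body in memory is to copy it chunk by chunk from the source row to the target row. This assumes bodies are stored as numbered chunks, as in the chunking approach discussed in this thread.

`ChunkStore` and `InMemoryStore` are hypothetical. A real version would map `getChunk`/`putChunk` onto HBase Get and Put calls.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ChunkCopySketch {

    /** Hypothetical storage abstraction standing in for HBase Get/Put on chunk cells. */
    interface ChunkStore {
        /** Returns chunk #index of a row, or null past the last chunk. */
        byte[] getChunk(String rowKey, int index) throws IOException;

        void putChunk(String rowKey, int index, byte[] chunk) throws IOException;
    }

    /** Copies a stored body chunk by chunk, so at most one chunk is held in memory. */
    static int copy(ChunkStore store, String fromRow, String toRow) throws IOException {
        int index = 0;
        byte[] chunk;
        while ((chunk = store.getChunk(fromRow, index)) != null) {
            store.putChunk(toRow, index, chunk);
            index++;
        }
        return index; // number of chunks copied
    }

    /** In-memory store keyed by "row#index", for demonstration only. */
    static class InMemoryStore implements ChunkStore {
        final Map<String, byte[]> cells = new HashMap<>();

        @Override
        public byte[] getChunk(String rowKey, int index) {
            return cells.get(rowKey + "#" + index);
        }

        @Override
        public void putChunk(String rowKey, int index, byte[] chunk) {
            cells.put(rowKey + "#" + index, chunk);
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryStore store = new InMemoryStore();
        store.putChunk("inbox/1", 0, new byte[] {1, 2});
        store.putChunk("inbox/1", 1, new byte[] {3});
        int copied = copy(store, "inbox/1", "archive/7");
        System.out.println(copied); // 2
        System.out.println(Arrays.toString(store.getChunk("archive/7", 0))); // [1, 2]
    }
}
```

Whether HBase offers a cheaper server-side way to do this is left open here, as it is in the thread.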
Bye,
Norman