Posted to server-dev@james.apache.org by Ioan Eugen Stan <st...@gmail.com> on 2011/08/01 16:23:26 UTC

HBase message implementation

Hello,

I'm having some trouble putting things together when it comes to the
Message implementation, so let me explain.
The current code [1], [2], and [3] is based on the JPA implementation,
and it copies the message content into byte arrays.
This does not seem to scale well.

There is also the issue of getting the data back from HBase. I agree
with Eric and Norman that large content should be split, and I'm
thinking of providing an HBaseMessage implementation that reads
data from HBase on demand (ChunkedInputStream and
ChunkedOutputStream, as suggested by Norman [4]).
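
To make the read-on-demand idea concrete, here is a minimal sketch, not
the actual mailbox code. ChunkSource is a hypothetical interface that
stands in for whatever fetches one chunk of the body from HBase; its
name and the chunk numbering are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical chunk provider; a real one would fetch each chunk from HBase. */
interface ChunkSource {
    /** Returns the chunk with the given index, or null when no more chunks exist. */
    byte[] readChunk(int index) throws IOException;
}

/** Reads a message body chunk by chunk, keeping at most one chunk in memory. */
class ChunkedInputStream extends InputStream {
    private final ChunkSource source;
    private byte[] current = new byte[0];
    private int pos = 0;
    private int nextIndex = 0;
    private boolean exhausted = false;

    ChunkedInputStream(ChunkSource source) {
        // Nothing is fetched here; the first chunk is loaded on the first read.
        this.source = source;
    }

    /** Loads chunks until one with unread data is available; false at end of data. */
    private boolean fill() throws IOException {
        while (!exhausted && pos >= current.length) {
            byte[] next = source.readChunk(nextIndex++);
            if (next == null) {
                exhausted = true;
            } else {
                current = next; // empty chunks are skipped by the loop
                pos = 0;
            }
        }
        return pos < current.length;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        // Serve the request from the current chunk only; callers loop as usual.
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

With something like this, an HBaseMessage could hand out a
ChunkedInputStream for the body and never hold the full content itself.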

Do I have to copy the data when someone creates a new HBaseMessage, or,
as suggested by the streaming alternative, can I keep a reference to the
SharedInputStream and move the bytes to HBase only when I save the
message to the mailbox?

The way I see it, the HBaseMessage implementation is only used
for retrieving data from HBase. It should not store the message body
(in the constructor).

Please, I need some clarification.

Thanks,

[1] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseMessage.java
[2] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/AbstractHBaseMessage.java
[3] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseStreamingMessage.java
[4] https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/io

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: HBase message implementation

Posted by Eric Charles <er...@apache.org>.
On 01/08/11 18:52, Norman Maurer wrote:
> Hi Eugen,
>
> comments inside..
>
> 2011/8/1 Ioan Eugen Stan<st...@gmail.com>:
>> Hello,
>>
>> I'm having some issues putting things together when it comes to
>> Message implementation. I will explain things right away.
>> The current code [1], [2], and [3] is based on JPA implementation and
>> it copies the message content into byte arrays.
>> This does not seem to scale too well.
>>
>> There is also the issue of getting the info back from HBase. I agree
>> with Eric and Norman that large content should be split and I'm
>> thinking of providing a HBaseMessage implementation that should read
>> data from HBase as demanded (ChunkedInputStream and
>> ChunkedOutputStream as suggested by Norman [4] ).
>>
>> Do I have to copy the data when someone creates a new HBaseMessage, or,
>> as suggested by the streaming alternative, can I keep a reference to the
>> SharedInputStream and move the bytes to HBase only when I save the
>> message to the mailbox?
>>
>> The way I see things is that HBaseMessage implementation is only used
>> for retrieving data from HBase. It should not store the message body
>> (in the constructor).
>
> Exactly.. The only special case is the MessageMapper.copy(..) but
> maybe this can be handled in a more efficient way in HBase...
>

Loading the bytes into memory can be deferred from the HBaseMessage
constructor to an HBaseMessageMapper method, but in the end the bytes
will still occupy memory, because there is no native streaming support
on the HBase side.

To mitigate the effect, you can implement ChunkedInput/OutputStream
classes that split the stream into smaller chunks, which reduces peak
memory usage.

The call to these chunk classes that turns the stream into bytes is
probably best placed in the HBaseMessageMapper. That avoids polluting
the HBaseMessage constructor with many byte arrays, which would also
lose the benefit of chunking.

You will need to loop and instantiate one HBase Put per chunk in the 
HBaseMessageMapper.
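
A rough sketch of that loop, under stated assumptions: ChunkStore is a
hypothetical stand-in for the HBase table (a real putChunk would build
and send one Put for the chunk), and ChunkedBodyWriter and the chunk
index scheme are made-up names for illustration, not existing James or
HBase API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical stand-in for the HBase table; one call corresponds to one Put. */
interface ChunkStore {
    void putChunk(int index, byte[] data) throws IOException;
}

final class ChunkedBodyWriter {
    private ChunkedBodyWriter() {
    }

    /**
     * Copies the body to the store in pieces of at most chunkSize bytes,
     * so only one chunk-sized buffer is held at a time.
     * Returns the number of chunks written.
     */
    static int writeBody(InputStream body, ChunkStore store, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int index = 0;
        while (true) {
            // Fill the buffer completely unless the stream ends first.
            int filled = 0;
            while (filled < chunkSize) {
                int n = body.read(buffer, filled, chunkSize - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                return index; // nothing left to write
            }
            // Copy so the store never sees the reused buffer.
            store.putChunk(index++, Arrays.copyOf(buffer, filled));
            if (filled < chunkSize) {
                return index; // a short chunk means the stream has ended
            }
        }
    }
}
```

The mapper would call writeBody with the message's input stream when
the message is saved, so the constructor never touches the bytes.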


>>
>> Please, I need some clarification.
>>
>> Thanks,
>>
>> [1] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseMessage.java
>> [2] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/AbstractHBaseMessage.java
>> [3] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseStreamingMessage.java
>> [4] https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/io
>>
>> --
>> Ioan Eugen Stan
>> http://ieugen.blogspot.com/
>
>
> Bye,
> Norman


-- 
Eric Charles
http://about.echarles.net



Re: HBase message implementation

Posted by Norman Maurer <no...@googlemail.com>.
Hi Eugen,

comments inside..

2011/8/1 Ioan Eugen Stan <st...@gmail.com>:
> Hello,
>
> I'm having some issues putting things together when it comes to
> Message implementation. I will explain things right away.
> The current code [1], [2], and [3] is based on JPA implementation and
> it copies the message content into byte arrays.
> This does not seem to scale too well.
>
> There is also the issue of getting the info back from HBase. I agree
> with Eric and Norman that large content should be split and I'm
> thinking of providing a HBaseMessage implementation that should read
> data from HBase as demanded (ChunkedInputStream and
> ChunkedOutputStream as suggested by Norman [4] ).
>
> Do I have to copy the data when someone creates a new HBaseMessage, or,
> as suggested by the streaming alternative, can I keep a reference to the
> SharedInputStream and move the bytes to HBase only when I save the
> message to the mailbox?
>
> The way I see things is that HBaseMessage implementation is only used
> for retrieving data from HBase. It should not store the message body
> (in the constructor).

Exactly.. The only special case is the MessageMapper.copy(..) but
maybe this can be handled in a more efficient way in HBase...

>
> Please, I need some clarification.
>
> Thanks,
>
> [1] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseMessage.java
> [2] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/AbstractHBaseMessage.java
> [3] http://code.google.com/a/apache-extras.org/p/mailbox-hdfs/source/browse/src/main/java/org/apache/james/mailbox/hbase/mail/model/hbase/HBaseStreamingMessage.java
> [4] https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/io
>
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com/


Bye,
Norman
