Posted to dev@cassandra.apache.org by 大平怜 <re...@gmail.com> on 2017/11/01 21:25:46 UTC

Re: Making CommitLog pluggable

Hi Ariel,

CommitLogSegment assumes that commit log files are stored on a regular
file system. Our CAPI-Flash system bypasses the OS and accesses flash
directly, so we cannot use the current CommitLogSegment framework as it is.
Intel's SPDK also bypasses the file system, so we think this kind of
requirement is not uncommon.

It would not be easy to reuse AbstractCommitLogSegmentManager either,
because the archiving and synchronization logic would have to be decoupled.
That would require major rework, and we don't think we should change
the existing implementation that much.

We are not changing the existing CommitLog format.  Our plugin will use
its own format, because it must manage commit logs on the 4KB-block-oriented
address spaces of flash devices.
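
To illustrate what a 4KB-block-oriented layout implies, here is a minimal sketch of padding a length-prefixed record up to a whole number of 4 KB blocks. The class BlockFramer, its methods, and the 4-byte length header are assumptions made for illustration only; they are not the plugin's actual on-device format, which this thread does not describe.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Cassandra or the CAPI-Flash plugin.
final class BlockFramer {
    static final int BLOCK_SIZE = 4096;            // assumed device block size (4 KB)
    static final int HEADER_SIZE = Integer.BYTES;  // assumed 4-byte length prefix

    // Number of whole blocks needed to hold the header plus the payload.
    static int blocksFor(int payloadLength) {
        return (HEADER_SIZE + payloadLength + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Returns a zero-padded buffer whose capacity is a multiple of BLOCK_SIZE.
    static ByteBuffer frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(blocksFor(payload.length) * BLOCK_SIZE);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.rewind(); // position back to 0; limit stays at capacity
        return buf;
    }

    // Recovers the payload from a buffer produced by frame().
    static byte[] unframe(ByteBuffer framed) {
        ByteBuffer view = framed.duplicate(); // leave the caller's position alone
        view.rewind();
        byte[] payload = new byte[view.getInt()];
        view.get(payload);
        return payload;
    }
}
```

Under these assumptions a 5000-byte record occupies two blocks, so every write to the device starts and ends on a block boundary.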


Regards,
Rei Odaira


2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:

> Hi,
>
> There are pluggable elements to the commit log, such as those used to
> support mmap or compression.
>
> Can you describe at a high level what a new implementation would look
> like and why it can't be a mode of the existing implementation?
>
> You are not proposing changing the format, correct?
>
> Regards,
> Ariel
>
> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > Hello,
> >
> > We are developing a Cassandra plugin to store the CommitLog on our
> > low-latency flash device (CAPI-Flash).  To do that, the original CommitLog
> > interface must be changed to allow plugins.  Does anyone have any thoughts
> > about it?  We have our codebase ready, but we think we should start with a
> > high-level discussion.
> >
> > The runtime overhead will be minimal.  The only overhead will come from
> > changing method invocations of CommitLog#add(),
> > CommitLog#getCurrentPosition(), etc. into interface invocations.
> >
> > Syncing to the CommitLog is one of the performance bottlenecks in Cassandra,
> > especially with batch commit.  I think a pluggable CommitLog will allow
> > other interesting alternatives, such as one using SPDK.  I would appreciate
> > any comments.
> >
> >
> > Regards,
> > Rei Odaira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>
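
The original proposal quoted above describes turning direct calls to CommitLog#add() and CommitLog#getCurrentPosition() into interface invocations. A minimal sketch of that idea follows. The interface PluggableCommitLog, the classes InMemoryCommitLog and WritePath, and the simplified byte[]/long signatures are all hypothetical; Cassandra's real methods take and return its own internal types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface. The method names mirror those mentioned in the
// thread, but the signatures are simplified assumptions.
interface PluggableCommitLog {
    long add(byte[] serializedMutation); // returns the position after the write
    long getCurrentPosition();
}

// Toy in-memory implementation standing in for a file- or flash-backed one.
final class InMemoryCommitLog implements PluggableCommitLog {
    private final List<byte[]> records = new ArrayList<>();
    private long position = 0;

    @Override
    public synchronized long add(byte[] serializedMutation) {
        records.add(serializedMutation.clone());
        position += serializedMutation.length;
        return position;
    }

    @Override
    public synchronized long getCurrentPosition() {
        return position;
    }
}

// Callers depend only on the interface, so the implementation can be swapped
// without touching the write path.
final class WritePath {
    private final PluggableCommitLog log;

    WritePath(PluggableCommitLog log) {
        this.log = log;
    }

    long write(byte[] mutation) {
        return log.add(mutation);
    }
}
```

The only runtime cost this adds over a direct call is the interface dispatch, which matches the overhead argument in the proposal.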

Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Hi,

I submitted patches for the pluggable CommitLog.  I would appreciate any comments.
Thanks!
https://issues.apache.org/jira/browse/CASSANDRA-14062


Rei Odaira


Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Thanks for the feedback, Ariel,

Based on your comments, we are revisiting our code changes,
and then we will submit a patch for review.
I hope this effort will help further modularize Cassandra
for better maintainability.


Thanks,
Rei Odaira



Re: Making CommitLog pluggable

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

OK, sorry I am very late to the discussion. I think the existing
consensus around doing it is fine. I just think you will find that making
the commit log pluggable might be a little trickier than making a cache,
which is a glorified K/V store, pluggable.

The commit log reaches into a bunch of other internal APIs during replay
and even a few at runtime. I think it's a refactor away from
abstracting out those concerns from the concerns of making log records
durable, providing notifications of durability, and then releasing them,
though.
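
One way to picture the narrower abstraction described above (make records durable, notify on durability, release them) is the following sketch. The interface DurableLog, the class BufferedDurableLog, and the explicit sync() call standing in for an fsync or device flush are hypothetical names chosen for illustration; none of them exist in Cassandra.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical interface covering only the three concerns named above.
interface DurableLog {
    // The returned future completes with the record id once the record is durable.
    CompletableFuture<Long> append(byte[] record);

    // Records with id <= upToId are no longer needed for replay.
    void release(long upToId);
}

// Toy implementation: records become "durable" when sync() is called.
// The record bytes themselves are dropped to keep the sketch short.
final class BufferedDurableLog implements DurableLog {
    private static final class Pending {
        final long id;
        final CompletableFuture<Long> future = new CompletableFuture<>();
        Pending(long id) { this.id = id; }
    }

    private final Deque<Pending> pending = new ArrayDeque<>();
    private final Deque<Long> durable = new ArrayDeque<>();
    private long nextId = 0;

    @Override
    public synchronized CompletableFuture<Long> append(byte[] record) {
        Pending p = new Pending(nextId++);
        pending.add(p);
        return p.future;
    }

    // Stand-in for an fsync or device flush: marks everything pending as
    // durable and notifies the waiters.
    synchronized void sync() {
        Pending p;
        while ((p = pending.poll()) != null) {
            durable.add(p.id);
            p.future.complete(p.id);
        }
    }

    @Override
    public synchronized void release(long upToId) {
        while (!durable.isEmpty() && durable.peek() <= upToId) {
            durable.poll();
        }
    }

    // Number of durable records that have not been released yet.
    synchronized int retained() {
        return durable.size();
    }
}
```

Replay, schema lookups and the other internal dependencies would stay on the Cassandra side of such an interface, which is the refactoring this message says is still needed.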

If I'm the only person unhappy about breaking the plugins in bug fix
releases, and the resulting problems that creates for anyone who wants to
operate this hardware in production, then I am willing to look past it.
We could also address it via documentation on the plugin link page as well
as at landing sites for individual plugins.

Otherwise I think this is a bigger commitment from us to a larger API
than originally scoped, and one that restricts what changes can be made
in bug fix releases to the existing C* commit log, unless we have a
current version and a next version of the plugin API so that we can move
ahead in bug fix releases without breaking existing plugins.
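
The idea of keeping a current and a next version of the plugin API could look roughly like the sketch below, where the host accepts plugins built against either version. CommitLogPlugin, PluginLoader and the version numbers are hypothetical, and the thread does not specify any such mechanism.

```java
// Hypothetical: each plugin reports which API version it was built against.
interface CommitLogPlugin {
    int apiVersion();
}

// Hypothetical loader that supports two API versions side by side.
final class PluginLoader {
    static final int CURRENT_API = 1; // what existing plugins were compiled against
    static final int NEXT_API = 2;    // where changes made in bug fix releases land

    static boolean isSupported(CommitLogPlugin plugin) {
        int v = plugin.apiVersion();
        return v == CURRENT_API || v == NEXT_API;
    }

    // Rejects a plugin at load time instead of failing later at runtime.
    static CommitLogPlugin load(CommitLogPlugin plugin) {
        if (!isSupported(plugin)) {
            throw new IllegalArgumentException(
                "Unsupported plugin API version: " + plugin.apiVersion());
        }
        return plugin;
    }
}
```

The trade-off is that the project would then maintain two API surfaces at once, which is part of the larger commitment this message is concerned about.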

Thanks,
Ariel

On Thu, Nov 2, 2017, at 04:46 PM, 大平怜 wrote:
> Hello Ariel,
> 
> About the pluggability, we have discussed this topic in the dev list last
> May:
> https://www.mail-archive.com/dev@cassandra.apache.org/msg11102.html
> 
> I don't think the whole community has reached a consensus, but
> the result of the discussion at that time was that
> 1) We were going to release our extensions as plugins.
> 2) The project would support the plugin ecosystem by creating a plugin
> link
>    page on the Web site.
> 
> Appreciate it if you could shed new light on the discussion.
> 
> 
> Thanks,
> Rei Odaira
> 
> 
> 2017-11-01 17:53 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> 
> > Hi,
> >
> > Just so I don't seem too negative, what I would really like to see is an
> > in tree implementation. The real challenge there is that the hardware is
> > not widely available. If it were something you could get in GCE or AWS
> > or at least get via an emulator that would be a different story.
> >
> > Ariel
> >
> > On Wed, Nov 1, 2017, at 06:46 PM, Ariel Weisberg wrote:
> > > Hi,
> > >
> > > OK. It makes sense that most of the existing plumbing is not applicable
> > > since it operates on a filesystem.
> > >
> > > How does replay work? Presumably you will need to refactor
> > > CommitLogReplayer as well?
> > >
> > > I think the best way for us to decide whether it's something we want in
> > > tree is to see a patch. You would need to do this even if it doesn't
> > > make it in tree and you end up having to deploy a patched build.
> > >
> > > Pluggability is a little bit of a touchy subject because we don't want
> > > to directly or indirectly become responsible for interfaces to out of
> > > tree implementations. I don't know if there is consensus around this,
> > > but I think even if we made the commit log pluggable it would be with
> > > the understanding that we may change the API even in bug fix releases.
> > >
> > > Down the line where this becomes tricky is unmaintained out of tree
> > > implementations that people depend on being broken due to interface
> > > changes and then no one being around to fix them. People who depend on
> > > the out of tree implementation have no one to complain to but us. This
> > > becomes even more likely when the maintainers aren't using the latest
> > > version of C* and are busy with other things.
> > >
> > > You are characterizing the API as being just a few methods on CommitLog
> > > but that isn't true.
> > >
> > > These are the imports for CommitLogReplayer
> > >
> > > import org.apache.cassandra.concurrent.Stage;
> > > import org.apache.cassandra.concurrent.StageManager;
> > > import org.apache.cassandra.config.CFMetaData;
> > > import org.apache.cassandra.config.Schema;
> > > import org.apache.cassandra.db.*;
> > > import org.apache.cassandra.io.util.FastByteArrayInputStream;
> > > import org.apache.cassandra.io.util.FileUtils;
> > > import org.apache.cassandra.io.util.RandomAccessReader;
> > > import org.apache.cassandra.utils.*;
> > >
> > > And these are the imports for CommitLog
> > >
> > > import org.apache.cassandra.config.Config;
> > > import org.apache.cassandra.config.DatabaseDescriptor;
> > > import org.apache.cassandra.db.*;
> > > import org.apache.cassandra.io.FSWriteError;
> > > import org.apache.cassandra.io.sstable.SSTableDeletingTask;
> > > import org.apache.cassandra.io.util.DataOutputByteBuffer;
> > > import org.apache.cassandra.metrics.CommitLogMetrics;
> > > import org.apache.cassandra.net.MessagingService;
> > > import org.apache.cassandra.service.StorageService;
> > > import org.apache.cassandra.utils.JVMStabilityInspector;
> > >
> > > If we change any code that changes a line in CommitLog or
> > > CommitLogReplayer in a bug fix release it's probably going to break your
> > > plugin JAR. Anyone running it in production will now have to fix it and
> > > recompile or be unable to get bug fixes.
> > >
> > > Regards,
> > > Ariel


Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Hello Ariel,

About the pluggability, we discussed this topic on the dev list last May:
https://www.mail-archive.com/dev@cassandra.apache.org/msg11102.html

I don't think the whole community has reached a consensus, but
the result of the discussion at that time was that
1) We were going to release our extensions as plugins.
2) The project would support the plugin ecosystem by creating a plugin link
   page on the Web site.

I would appreciate it if you could shed new light on the discussion.


Thanks,
Rei Odaira


2017-11-01 17:53 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:

> Hi,
>
> Just so I don't seem too negative, what I would really like to see is an
> in tree implementation. The real challenge there is that the hardware is
> not widely available. If it were something you could get in GCE or AWS
> or at least get via an emulator that would be a different story.
>
> Ariel
>
> On Wed, Nov 1, 2017, at 06:46 PM, Ariel Weisberg wrote:
> > Hi,
> >
> > OK. It makes sense that most of the existing plumbing is not applicable
> > since it operates on a filesystem.
> >
> > How does replay work? Presumably you will need to refactor
> > CommitLogReplayer as well?
> >
> > I think the best way for us to decide whether it's something we want in
> > tree is to see a patch. You would need to do this even if it doesn't
> > make it in tree and you end up having to deploy a patched build.
> >
> > Pluggability is a little bit of a touchy subject because we don't want
> > to directly or indirectly become responsible for interfaces to out of
> > tree implementations. I don't know if there is consensus around this,
> > but I think even if we made the commit log pluggable it would be with
> > the understanding that we may change the API even in bug fix releases.
> >
> > Down the line where this becomes tricky is unmaintained out of tree
> > implementations that people depend on being broken due to interface
> > changes and then no one being around to fix them. People who depend on
> > the out of tree implementation have no one to complain to but us. This
> > becomes even more likely when the maintainers aren't using the latest
> > version of C* and are busy with other things.
> >
> > You are characterizing the API as being just a few methods on CommitLog
> > but that isn't true.
> >
> > These are the imports for CommitLogReplayer
> >
> > import org.apache.cassandra.concurrent.Stage;
> > import org.apache.cassandra.concurrent.StageManager;
> > import org.apache.cassandra.config.CFMetaData;
> > import org.apache.cassandra.config.Schema;
> > import org.apache.cassandra.db.*;
> > import org.apache.cassandra.io.util.FastByteArrayInputStream;
> > import org.apache.cassandra.io.util.FileUtils;
> > import org.apache.cassandra.io.util.RandomAccessReader;
> > import org.apache.cassandra.utils.*;
> >
> > And these are the imports for CommitLog
> >
> > import org.apache.cassandra.config.Config;
> > import org.apache.cassandra.config.DatabaseDescriptor;
> > import org.apache.cassandra.db.*;
> > import org.apache.cassandra.io.FSWriteError;
> > import org.apache.cassandra.io.sstable.SSTableDeletingTask;
> > import org.apache.cassandra.io.util.DataOutputByteBuffer;
> > import org.apache.cassandra.metrics.CommitLogMetrics;
> > import org.apache.cassandra.net.MessagingService;
> > import org.apache.cassandra.service.StorageService;
> > import org.apache.cassandra.utils.JVMStabilityInspector;
> >
> > If we change any code that changes a line in CommitLog or
> > CommitLogReplayer in a bug fix release it's probably going to break your
> > plugin JAR. Anyone running it in production will now have to fix it and
> > recompile or be unable to get bug fixes.
> >
> > Regards,
> > Ariel
> > On Wed, Nov 1, 2017, at 05:25 PM, 大平怜 wrote:
> > > Hi Ariel,
> > >
> > > CommitLogSegment assumes commit log files stored on a regular file
> > > system.
> > > Our CAPI Flash system bypasses OS and directly accesses flash,
> > > so we cannot use the current framework of CommitLogSegment as it is.
> > > Intel's SPDK also bypasses a file system, so we think this kind of
> > > requirement
> > > is not uncommon.
> > >
> > > It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> > > because the archiving and synchronization logics have to be decoupled.
> > > It would require major rework, and we don't think we should affect
> > > the existing implementation so much.
> > >
> > > We do not change any existing format of CommitLog.  Our plugin will use
> > > its own format, as it must manage commit logs on the 4KB-block-oriented
> > > address spaces of flash devices.
> > >
> > >
> > > Regards,
> > > Rei Odaira
> > >
> > >
> > > 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> > >
> > > > Hi,
> > > >
> > > > There are pluggable elements to the commit log such as those used to
> > > > support mmap or compressed.
> > > >
> > > > Can you describe at a high level what a new implementation would look
> > > > like and why it can't be a mode of the existing implementation?
> > > >
> > > > You are not proposing changing the format correct?
> > > >
> > > > Regards,
> > > > Ariel
> > > >
> > > > On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > > > > Hello,
> > > > >
> > > > > We are developing a Cassandra plugin to store CommitLog on our
> > > > > low-latency
> > > > > Flash device (CAPI-Flash).  To do that, the original CommitLog
> interface
> > > > > must be changed to allow plugins.  Anyone has any thoughts about
> it?  We
> > > > > have our codebase ready, but we think we should start with
> high-level
> > > > > discussion.
> > > > >
> > > > > The runtime overhead will be minimal.  The only overhead will be
> changing
> > > > > method invocations to CommitLog#add(),
> CommitLog#getCurrentPosition(),
> > > > > etc.
> > > > > into interface invocations.
> > > > >
> > > > > Synching to CommitLog is one of the performance bottlenecks in
> Cassandra
> > > > > especially with batch commit.  I think the pluggable CommitLog
> will allow
> > > > > other interesting alternatives, such as one using SPDK.
> Appreciate any
> > > > > comments.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rei Odaira
> > > >
> > > > ------------------------------------------------------------
> ---------
> > > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >
> > > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: Making CommitLog pluggable

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

Just so I don't seem too negative, what I would really like to see is an
in-tree implementation. The real challenge there is that the hardware is
not widely available. If it were something you could get in GCE or AWS,
or at least get via an emulator, that would be a different story.

Ariel

On Wed, Nov 1, 2017, at 06:46 PM, Ariel Weisberg wrote:
> Hi,
> 
> OK. It makes sense that most of the existing plumbing is not applicable
> since it operates on a filesystem.
> 
> How does replay work? Presumably you will need to refactor
> CommitLogReplayer as well?
> 
> I think the best way for us to decide whether it's something we want in
> tree is to see a patch. You would need to do this even if it doesn't
> make it in tree and you end up having to deploy a patched build.
> 
> Pluggability is a little bit of a touchy subject because we don't want
> to directly or indirectly become responsible for interfaces to out of
> tree implementations. I don't know if there is consensus around this,
> but I think even if we made the commit log pluggable it would be with
> the understanding that we may change the API even in bug fix releases.
> 
> Down the line where this becomes tricky is unmaintained out of tree
> implementations that people depend on being broken due to interface
> changes and then no one being around to fix them. People who depend on
> the out of tree implementation have no one to complain to but us. This
> becomes even more likely when the maintainers aren't using the latest
> version of C* and are busy with other things.
> 
> You are characterizing the API as being just a few methods on CommitLog
> but that isn't true. 
> 
> These are the imports for CommitLogReplayer
> 
> import org.apache.cassandra.concurrent.Stage;
> import org.apache.cassandra.concurrent.StageManager;
> import org.apache.cassandra.config.CFMetaData;
> import org.apache.cassandra.config.Schema;
> import org.apache.cassandra.db.*;
> import org.apache.cassandra.io.util.FastByteArrayInputStream;
> import org.apache.cassandra.io.util.FileUtils;
> import org.apache.cassandra.io.util.RandomAccessReader;
> import org.apache.cassandra.utils.*;
> 
> And these are the imports for CommitLog
> 
> import org.apache.cassandra.config.Config;
> import org.apache.cassandra.config.DatabaseDescriptor;
> import org.apache.cassandra.db.*;
> import org.apache.cassandra.io.FSWriteError;
> import org.apache.cassandra.io.sstable.SSTableDeletingTask;
> import org.apache.cassandra.io.util.DataOutputByteBuffer;
> import org.apache.cassandra.metrics.CommitLogMetrics;
> import org.apache.cassandra.net.MessagingService;
> import org.apache.cassandra.service.StorageService;
> import org.apache.cassandra.utils.JVMStabilityInspector;
> 
> If we change any code that changes a line in CommitLog or
> CommitLogReplayer in a bug fix release it's probably going to break your
> plugin JAR. Anyone running it in production will now have to fix it and
> recompile or be unable to get bug fixes.
> 
> Regards,
> Ariel
> On Wed, Nov 1, 2017, at 05:25 PM, 大平怜 wrote:
> > Hi Ariel,
> > 
> > CommitLogSegment assumes commit log files stored on a regular file
> > system.
> > Our CAPI Flash system bypasses OS and directly accesses flash,
> > so we cannot use the current framework of CommitLogSegment as it is.
> > Intel's SPDK also bypasses a file system, so we think this kind of
> > requirement
> > is not uncommon.
> > 
> > It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> > because the archiving and synchronization logics have to be decoupled.
> > It would require major rework, and we don't think we should affect
> > the existing implementation so much.
> > 
> > We do not change any existing format of CommitLog.  Our plugin will use
> > its own format, as it must manage commit logs on the 4KB-block-oriented
> > address spaces of flash devices.
> > 
> > 
> > Regards,
> > Rei Odaira
> > 
> > 
> > 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> > 
> > > Hi,
> > >
> > > There are pluggable elements to the commit log such as those used to
> > > support mmap or compressed.
> > >
> > > Can you describe at a high level what a new implementation would look
> > > like and why it can't be a mode of the existing implementation?
> > >
> > > You are not proposing changing the format correct?
> > >
> > > Regards,
> > > Ariel
> > >
> > > On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > > > Hello,
> > > >
> > > > We are developing a Cassandra plugin to store CommitLog on our
> > > > low-latency
> > > > Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> > > > must be changed to allow plugins.  Anyone has any thoughts about it?  We
> > > > have our codebase ready, but we think we should start with high-level
> > > > discussion.
> > > >
> > > > The runtime overhead will be minimal.  The only overhead will be changing
> > > > method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> > > > etc.
> > > > into interface invocations.
> > > >
> > > > Synching to CommitLog is one of the performance bottlenecks in Cassandra
> > > > especially with batch commit.  I think the pluggable CommitLog will allow
> > > > other interesting alternatives, such as one using SPDK.  Appreciate any
> > > > comments.
> > > >
> > > >
> > > > Regards,
> > > > Rei Odaira
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> > >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
> 


Re: Making CommitLog pluggable

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

OK. It makes sense that most of the existing plumbing is not applicable
since it operates on a filesystem.

How does replay work? Presumably you will need to refactor
CommitLogReplayer as well?

I think the best way for us to decide whether it's something we want in
tree is to see a patch. You would need to do this even if it doesn't
make it in tree and you end up having to deploy a patched build.

Pluggability is a little bit of a touchy subject because we don't want
to directly or indirectly become responsible for interfaces to out of
tree implementations. I don't know if there is consensus around this,
but I think even if we made the commit log pluggable it would be with
the understanding that we may change the API even in bug fix releases.

Down the line, this becomes tricky when unmaintained out-of-tree
implementations that people depend on break due to interface changes
and no one is around to fix them. People who depend on the out-of-tree
implementation have no one to complain to but us. This becomes even
more likely when the maintainers aren't using the latest version of C*
and are busy with other things.

You are characterizing the API as being just a few methods on CommitLog
but that isn't true. 

These are the imports for CommitLogReplayer

import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;
import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.config.Schema;
import org.apache.cassandra.db.*;
import org.apache.cassandra.io.util.FastByteArrayInputStream;
import org.apache.cassandra.io.util.FileUtils;
import org.apache.cassandra.io.util.RandomAccessReader;
import org.apache.cassandra.utils.*;

And these are the imports for CommitLog

import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.*;
import org.apache.cassandra.io.FSWriteError;
import org.apache.cassandra.io.sstable.SSTableDeletingTask;
import org.apache.cassandra.io.util.DataOutputByteBuffer;
import org.apache.cassandra.metrics.CommitLogMetrics;
import org.apache.cassandra.net.MessagingService;
import org.apache.cassandra.service.StorageService;
import org.apache.cassandra.utils.JVMStabilityInspector;

If we change any code that changes a line in CommitLog or
CommitLogReplayer in a bug fix release it's probably going to break your
plugin JAR. Anyone running it in production will now have to fix it and
recompile or be unable to get bug fixes.
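
For reference, the narrow surface described in the original proposal (add()
and getCurrentPosition() turned into interface invocations) might look
roughly like the sketch below. This is only an illustration under stated
assumptions: the interface and class names are hypothetical, SketchMutation
is a stand-in for the real Mutation class, and the position is simplified to
a byte offset. It deliberately leaves out replay, archiving, and metrics,
which is the part of the surface the import lists above point at.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Cassandra's Mutation; only the serialized bytes matter here.
final class SketchMutation {
    final byte[] serialized;

    SketchMutation(byte[] serialized) {
        this.serialized = serialized;
    }
}

// Hypothetical plugin-facing interface. Cassandra has no interface by this name;
// only the two method names are taken from the thread.
interface PluggableCommitLogSketch {
    // Appends a mutation and returns the position at which it was written.
    long add(SketchMutation mutation);

    // Returns the position at which the next mutation would be written.
    long getCurrentPosition();
}

// Toy in-memory implementation, present only so the interface can be exercised.
final class InMemoryCommitLogSketch implements PluggableCommitLogSketch {
    private final List<byte[]> records = new ArrayList<>();
    private long position = 0;

    @Override
    public synchronized long add(SketchMutation mutation) {
        long start = position;
        records.add(mutation.serialized.clone());
        position += mutation.serialized.length;
        return start;
    }

    @Override
    public synchronized long getCurrentPosition() {
        return position;
    }
}
```

Even in this reduced form, a plugin compiled against the interface depends on
the argument and return types staying stable, so any change to them in a bug
fix release would require the plugin to be rebuilt.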

Regards,
Ariel
On Wed, Nov 1, 2017, at 05:25 PM, 大平怜 wrote:
> Hi Ariel,
> 
> CommitLogSegment assumes commit log files stored on a regular file
> system.
> Our CAPI Flash system bypasses OS and directly accesses flash,
> so we cannot use the current framework of CommitLogSegment as it is.
> Intel's SPDK also bypasses a file system, so we think this kind of
> requirement
> is not uncommon.
> 
> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> because the archiving and synchronization logics have to be decoupled.
> It would require major rework, and we don't think we should affect
> the existing implementation so much.
> 
> We do not change any existing format of CommitLog.  Our plugin will use
> its own format, as it must manage commit logs on the 4KB-block-oriented
> address spaces of flash devices.
> 
> 
> Regards,
> Rei Odaira
> 
> 
> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> 
> > Hi,
> >
> > There are pluggable elements to the commit log such as those used to
> > support mmap or compressed.
> >
> > Can you describe at a high level what a new implementation would look
> > like and why it can't be a mode of the existing implementation?
> >
> > You are not proposing changing the format correct?
> >
> > Regards,
> > Ariel
> >
> > On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > > Hello,
> > >
> > > We are developing a Cassandra plugin to store CommitLog on our
> > > low-latency
> > > Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> > > must be changed to allow plugins.  Anyone has any thoughts about it?  We
> > > have our codebase ready, but we think we should start with high-level
> > > discussion.
> > >
> > > The runtime overhead will be minimal.  The only overhead will be changing
> > > method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> > > etc.
> > > into interface invocations.
> > >
> > > Synching to CommitLog is one of the performance bottlenecks in Cassandra
> > > especially with batch commit.  I think the pluggable CommitLog will allow
> > > other interesting alternatives, such as one using SPDK.  Appreciate any
> > > comments.
> > >
> > >
> > > Regards,
> > > Rei Odaira
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >


Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Sorry, not "rsync" but "fsync"....


Regards,
Rei Odaira

2017-11-02 15:57 GMT-05:00 大平怜 <re...@gmail.com>:

> Hello Michael,
>
> The page size of the flash is 4KiB.  I have to ask someone else about
> the exact specification of the write endurance, but we have two products;
> as a backend flash device, one uses Samsung's PM963 M.2 NVMe SSD,
> and the other uses FibreChannel-attached IBM FlashSystem 840/900.
>
> Each write is guaranteed to be flushed to the backend flash device
> when it returns, so there is no API like rsync.  The power loss protection
> depends on what the backend flash devices guarantee.  For example,
> IBM FlashSystem buffers writes in its internal DRAM (as I understand),
> but it provides high redundancy in the power units and also includes
> batteries.
>
> In the current implementation, we must write at least 4KiB for each
> mutation.
> We can change it to merge multiple mutations, much like the batch commit.
>
>
> Thanks,
> Rei Odaira
>
>
> 2017-11-01 16:47 GMT-05:00 Michael Kjellman <mk...@internalcircle.com>
> :
>
>> Awesome!! You're two steps ahead ;)
>>
>> Not sure if you're allowed to share, but can you highlight any details on
>> endurance and performance? Are the pages 4kb or 16kb? How many writes do
>> you expect to handle over a 1 year window of the device? I assume because
>> you're directly accessing the hardware as a block device there are
>> different rules in regards to fsync and how things are flushed? Any power
>> loss protection features etc? If you write a commit log segment that's like
>> 20 bytes (for example), will you post-pad the entire thing internally and
>> still need to write 4kb (or whatever the physical page size is)?
>>
>> Thanks!
>>
>> best,
>> kjellman
>>
>> > On Nov 1, 2017, at 2:40 PM, 大平怜 <re...@gmail.com> wrote:
>> >
>> > Hi Michael,
>> >
>> > Yes, testing is always a problem, and that is exactly why we would like
>> to
>> > release
>> > our code as a plugin, outside of the main source tree, so that the
>> project
>> > won't
>> > need to test the hardware-dependent code.
>> > The pluggable CommitLog will allow this approach.
>> >
>> > Actually, we have already released another plugin for CAPI-Flash-based
>> > RowCache,
>> > which takes advantage of the pluggable RowCache mechanism.
>> > https://github.com/ppc64le/capi-rowcache
>> > We would just like to repeat this approach in CommitLog.
>> >
>> >
>> > Thanks,
>> > Rei Odaira
>> >
>> >
>> > 2017-11-01 16:30 GMT-05:00 Michael Kjellman <
>> mkjellman@internalcircle.com>:
>> >
>> >> Rei:
>> >>
>> >> One thing that comes up when these type of conversations occur is how
>> the
>> >> project can test hardware dependent code. In the case of the PPC64
>> stuff,
>> >> hardware actually got donated to the ASF so Jenkins runs could be done
>> to
>> >> check that things work. Any thoughts on this aspect? Might be a bit
>> >> pre-mature, but I thought I'd at least mention it... On the flip side:
>> if
>> >> CommitLog becomes pluggable enough, shipping an implementation
>> compatible
>> >> with the hardware out of tree might also be viable too.
>> >>
>> >> best,
>> >> kjellman
>> >>
>> >>> On Nov 1, 2017, at 2:25 PM, 大平怜 <re...@gmail.com> wrote:
>> >>>
>> >>> Hi Ariel,
>> >>>
>> >>> CommitLogSegment assumes commit log files stored on a regular file
>> >> system.
>> >>> Our CAPI Flash system bypasses OS and directly accesses flash,
>> >>> so we cannot use the current framework of CommitLogSegment as it is.
>> >>> Intel's SPDK also bypasses a file system, so we think this kind of
>> >>> requirement
>> >>> is not uncommon.
>> >>>
>> >>> It would not be easy to reuse AbstractCommitLogSegmentManager,
>> either,
>> >>> because the archiving and synchronization logics have to be decoupled.
>> >>> It would require major rework, and we don't think we should affect
>> >>> the existing implementation so much.
>> >>>
>> >>> We do not change any existing format of CommitLog.  Our plugin will
>> use
>> >>> its own format, as it must manage commit logs on the
>> 4KB-block-oriented
>> >>> address spaces of flash devices.
>> >>>
>> >>>
>> >>> Regards,
>> >>> Rei Odaira
>> >>>
>> >>>
>> >>> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> There are pluggable elements to the commit log such as those used to
>> >>>> support mmap or compressed.
>> >>>>
>> >>>> Can you describe at a high level what a new implementation would look
>> >>>> like and why it can't be a mode of the existing implementation?
>> >>>>
>> >>>> You are not proposing changing the format correct?
>> >>>>
>> >>>> Regards,
>> >>>> Ariel
>> >>>>
>> >>>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> We are developing a Cassandra plugin to store CommitLog on our
>> >>>>> low-latency
>> >>>>> Flash device (CAPI-Flash).  To do that, the original CommitLog
>> >> interface
>> >>>>> must be changed to allow plugins.  Anyone has any thoughts about it?
>> >> We
>> >>>>> have our codebase ready, but we think we should start with
>> high-level
>> >>>>> discussion.
>> >>>>>
>> >>>>> The runtime overhead will be minimal.  The only overhead will be
>> >> changing
>> >>>>> method invocations to CommitLog#add(),
>> CommitLog#getCurrentPosition(),
>> >>>>> etc.
>> >>>>> into interface invocations.
>> >>>>>
>> >>>>> Synching to CommitLog is one of the performance bottlenecks in
>> >> Cassandra
>> >>>>> especially with batch commit.  I think the pluggable CommitLog will
>> >> allow
>> >>>>> other interesting alternatives, such as one using SPDK.  Appreciate
>> any
>> >>>>> comments.
>> >>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Rei Odaira
>> >>>>
>> >>>> ------------------------------------------------------------
>> ---------
>> >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>

Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Hello Michael,

The page size of the flash is 4KiB.  I have to ask someone else about
the exact specification of the write endurance, but we have two products;
as a backend flash device, one uses Samsung's PM963 M.2 NVMe SSD,
and the other uses FibreChannel-attached IBM FlashSystem 840/900.

Each write is guaranteed to be flushed to the backend flash device
when it returns, so there is no API like rsync.  The power loss protection
depends on what the backend flash devices guarantee.  For example,
IBM FlashSystem buffers writes in its internal DRAM (as I understand),
but it provides high redundancy in the power units and also includes
batteries.

In the current implementation, we must write at least 4KiB for each mutation.
We can change it to merge multiple mutations, much like the batch commit.
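
To make the cost of the 4KiB minimum concrete, here is a minimal sketch of
the padding arithmetic. It assumes only the 4KiB block size stated above;
the class and method names are hypothetical and are not part of Cassandra or
the CAPI-Flash plugin, and the merged case ignores any per-record framing
(length prefixes, checksums) that a real format would need.

```java
import java.nio.ByteBuffer;

// Sketch only: names are hypothetical; BLOCK_SIZE is the 4KiB page size from the message.
final class BlockPaddingSketch {
    static final int BLOCK_SIZE = 4096;

    // Number of 4KiB blocks needed to hold `bytes` bytes (rounds up).
    static int blocksFor(long bytes) {
        return (int) ((bytes + BLOCK_SIZE - 1) / BLOCK_SIZE);
    }

    // Copies a payload into a buffer that is zero-padded up to the next block boundary.
    static ByteBuffer padToBlock(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(blocksFor(payload.length) * BLOCK_SIZE);
        buf.put(payload);
        buf.rewind(); // position back to 0; limit stays at capacity, so the padding is included
        return buf;
    }

    // Blocks written when every mutation gets its own padded write.
    static int blocksOnePerWrite(int[] mutationSizes) {
        int blocks = 0;
        for (int size : mutationSizes) {
            blocks += blocksFor(size);
        }
        return blocks;
    }

    // Blocks written when mutations are packed back to back before padding.
    static int blocksWhenMerged(int[] mutationSizes) {
        long total = 0;
        for (int size : mutationSizes) {
            total += size;
        }
        return blocksFor(total);
    }
}
```

For example, three mutations of 20, 100, and 3000 bytes take three blocks
(12KiB) when written one per block, but fit in a single block (3120 bytes
padded to 4KiB) when merged.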


Thanks,
Rei Odaira


2017-11-01 16:47 GMT-05:00 Michael Kjellman <mk...@internalcircle.com>:

> Awesome!! You're two steps ahead ;)
>
> Not sure if you're allowed to share, but can you highlight any details on
> endurance and performance? Are the pages 4kb or 16kb? How many writes do
> you expect to handle over a 1 year window of the device? I assume because
> you're directly accessing the hardware as a block device there are
> different rules in regards to fsync and how things are flushed? Any power
> loss protection features etc? If you write a commit log segment that's like
> 20 bytes (for example), will you post-pad the entire thing internally and
> still need to write 4kb (or whatever the physical page size is)?
>
> Thanks!
>
> best,
> kjellman
>
> > On Nov 1, 2017, at 2:40 PM, 大平怜 <re...@gmail.com> wrote:
> >
> > Hi Michael,
> >
> > Yes, testing is always a problem, and that is exactly why we would like
> to
> > release
> > our code as a plugin, outside of the main source tree, so that the
> project
> > won't
> > need to test the hardware-dependent code.
> > The pluggable CommitLog will allow this approach.
> >
> > Actually, we have already released another plugin for CAPI-Flash-based
> > RowCache,
> > which takes advantage of the pluggable RowCache mechanism.
> > https://github.com/ppc64le/capi-rowcache
> > We would just like to repeat this approach in CommitLog.
> >
> >
> > Thanks,
> > Rei Odaira
> >
> >
> > 2017-11-01 16:30 GMT-05:00 Michael Kjellman <
> mkjellman@internalcircle.com>:
> >
> >> Rei:
> >>
> >> One thing that comes up when these type of conversations occur is how
> the
> >> project can test hardware dependent code. In the case of the PPC64
> stuff,
> >> hardware actually got donated to the ASF so Jenkins runs could be done
> to
> >> check that things work. Any thoughts on this aspect? Might be a bit
> >> pre-mature, but I thought I'd at least mention it... On the flip side:
> if
> >> CommitLog becomes pluggable enough, shipping an implementation
> compatible
> >> with the hardware out of tree might also be viable too.
> >>
> >> best,
> >> kjellman
> >>
> >>> On Nov 1, 2017, at 2:25 PM, 大平怜 <re...@gmail.com> wrote:
> >>>
> >>> Hi Ariel,
> >>>
> >>> CommitLogSegment assumes commit log files stored on a regular file
> >> system.
> >>> Our CAPI Flash system bypasses OS and directly accesses flash,
> >>> so we cannot use the current framework of CommitLogSegment as it is.
> >>> Intel's SPDK also bypasses a file system, so we think this kind of
> >>> requirement
> >>> is not uncommon.
> >>>
> >>> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> >>> because the archiving and synchronization logics have to be decoupled.
> >>> It would require major rework, and we don't think we should affect
> >>> the existing implementation so much.
> >>>
> >>> We do not change any existing format of CommitLog.  Our plugin will use
> >>> its own format, as it must manage commit logs on the 4KB-block-oriented
> >>> address spaces of flash devices.
> >>>
> >>>
> >>> Regards,
> >>> Rei Odaira
> >>>
> >>>
> >>> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> >>>
> >>>> Hi,
> >>>>
> >>>> There are pluggable elements to the commit log such as those used to
> >>>> support mmap or compressed.
> >>>>
> >>>> Can you describe at a high level what a new implementation would look
> >>>> like and why it can't be a mode of the existing implementation?
> >>>>
> >>>> You are not proposing changing the format correct?
> >>>>
> >>>> Regards,
> >>>> Ariel
> >>>>
> >>>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> >>>>> Hello,
> >>>>>
> >>>>> We are developing a Cassandra plugin to store CommitLog on our
> >>>>> low-latency
> >>>>> Flash device (CAPI-Flash).  To do that, the original CommitLog
> >> interface
> >>>>> must be changed to allow plugins.  Anyone has any thoughts about it?
> >> We
> >>>>> have our codebase ready, but we think we should start with high-level
> >>>>> discussion.
> >>>>>
> >>>>> The runtime overhead will be minimal.  The only overhead will be
> >> changing
> >>>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition()
> ,
> >>>>> etc.
> >>>>> into interface invocations.
> >>>>>
> >>>>> Synching to CommitLog is one of the performance bottlenecks in
> >> Cassandra
> >>>>> especially with batch commit.  I think the pluggable CommitLog will
> >> allow
> >>>>> other interesting alternatives, such as one using SPDK.  Appreciate
> any
> >>>>> comments.
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Rei Odaira
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>
> >>>>
> >>
> >>
>
>

Re: Making CommitLog pluggable

Posted by Michael Kjellman <mk...@internalcircle.com>.
Awesome!! You're two steps ahead ;)

Not sure if you're allowed to share, but can you highlight any details on endurance and performance? Are the pages 4KB or 16KB? How many writes do you expect to handle over a one-year window of the device? I assume that because you're directly accessing the hardware as a block device, there are different rules with regard to fsync and how things are flushed? Any power loss protection features, etc.? If you write a commit log segment that's, say, 20 bytes, will you post-pad the entire thing internally and still need to write 4KB (or whatever the physical page size is)?

Thanks!

best,
kjellman

> On Nov 1, 2017, at 2:40 PM, 大平怜 <re...@gmail.com> wrote:
> 
> Hi Michael,
> 
> Yes, testing is always a problem, and that is exactly why we would like to
> release
> our code as a plugin, outside of the main source tree, so that the project
> won't
> need to test the hardware-dependent code.
> The pluggable CommitLog will allow this approach.
> 
> Actually, we have already released another plugin for CAPI-Flash-based
> RowCache,
> which takes advantage of the pluggable RowCache mechanism.
> https://github.com/ppc64le/capi-rowcache
> We would just like to repeat this approach in CommitLog.
> 
> 
> Thanks,
> Rei Odaira
> 
> 
> 2017-11-01 16:30 GMT-05:00 Michael Kjellman <mk...@internalcircle.com>:
> 
>> Rei:
>> 
>> One thing that comes up when these type of conversations occur is how the
>> project can test hardware dependent code. In the case of the PPC64 stuff,
>> hardware actually got donated to the ASF so Jenkins runs could be done to
>> check that things work. Any thoughts on this aspect? Might be a bit
>> pre-mature, but I thought I'd at least mention it... On the flip side: if
>> CommitLog becomes pluggable enough, shipping an implementation compatible
>> with the hardware out of tree might also be viable too.
>> 
>> best,
>> kjellman
>> 
>>> On Nov 1, 2017, at 2:25 PM, 大平怜 <re...@gmail.com> wrote:
>>> 
>>> Hi Ariel,
>>> 
>>> CommitLogSegment assumes commit log files stored on a regular file
>> system.
>>> Our CAPI Flash system bypasses OS and directly accesses flash,
>>> so we cannot use the current framework of CommitLogSegment as it is.
>>> Intel's SPDK also bypasses a file system, so we think this kind of
>>> requirement
>>> is not uncommon.
>>> 
>>> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
>>> because the archiving and synchronization logics have to be decoupled.
>>> It would require major rework, and we don't think we should affect
>>> the existing implementation so much.
>>> 
>>> We do not change any existing format of CommitLog.  Our plugin will use
>>> its own format, as it must manage commit logs on the 4KB-block-oriented
>>> address spaces of flash devices.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>>> 
>>> 
>>> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
>>> 
>>>> Hi,
>>>> 
>>>> There are pluggable elements to the commit log such as those used to
>>>> support mmap or compressed.
>>>> 
>>>> Can you describe at a high level what a new implementation would look
>>>> like and why it can't be a mode of the existing implementation?
>>>> 
>>>> You are not proposing changing the format correct?
>>>> 
>>>> Regards,
>>>> Ariel
>>>> 
>>>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>>>> Hello,
>>>>> 
>>>>> We are developing a Cassandra plugin to store CommitLog on our
>>>>> low-latency
>>>>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
>>>>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
>>>>> have our codebase ready, but we think we should start with high-level
>>>>> discussion.
>>>>> 
>>>>> The runtime overhead will be minimal.  The only overhead will be changing
>>>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>>>> etc.
>>>>> into interface invocations.
>>>>> 
>>>>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
>>>>> especially with batch commit.  I think the pluggable CommitLog will allow
>>>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>>>> comments.
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> Rei Odaira
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>> 
>>>> 
>> 
>> 


Re: Making CommitLog pluggable

Posted by 大平怜 <re...@gmail.com>.
Hi Michael,

Yes, testing is always a problem, and that is exactly why we would like to
release our code as a plugin, outside of the main source tree, so that the
project won't need to test the hardware-dependent code.
The pluggable CommitLog will allow this approach.

Actually, we have already released another plugin, a CAPI-Flash-based
RowCache, which takes advantage of the pluggable RowCache mechanism:
https://github.com/ppc64le/capi-rowcache
We would just like to repeat this approach for CommitLog.
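
As a rough illustration of the out-of-tree approach (a minimal sketch only,
not the CAPI-Flash plugin's code and not Cassandra's actual API): the core
code calls a small interface, and the implementation is chosen by class name,
much as a class-name setting selects a row cache provider. The names
CommitLogSketch, Log, InMemoryLog, and load() are hypothetical; only add() and
getCurrentPosition() echo method names mentioned earlier in this thread, and
their signatures here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitLogSketch {
    /** Hypothetical plugin interface; the real signatures may differ. */
    public interface Log {
        /** Appends one record and returns the position where it starts. */
        long add(byte[] mutation);

        /** Returns the position the next appended record would start at. */
        long getCurrentPosition();
    }

    /** Stand-in implementation that keeps records in memory, not on flash. */
    public static class InMemoryLog implements Log {
        private final List<byte[]> records = new ArrayList<>();
        private long position = 0;

        @Override
        public synchronized long add(byte[] mutation) {
            long start = position;
            records.add(mutation.clone());
            position += mutation.length;
            return start;
        }

        @Override
        public synchronized long getCurrentPosition() {
            return position;
        }
    }

    /** Instantiates an implementation by class name, as a config entry might. */
    public static Log load(String className) throws ReflectiveOperationException {
        Class<? extends Log> cls = Class.forName(className).asSubclass(Log.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Default to the in-memory stand-in when no class name is given.
        Log log = load(args.length > 0 ? args[0] : InMemoryLog.class.getName());
        long first = log.add(new byte[] {1, 2, 3});
        System.out.println(first + " " + log.getCurrentPosition());
    }
}
```

With this shape, a hardware-specific implementation can live in its own
repository and be tested by its maintainers, while the main tree only tests
the interface and the default implementation.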


Thanks,
Rei Odaira


2017-11-01 16:30 GMT-05:00 Michael Kjellman <mk...@internalcircle.com>:

> Rei:
>
> One thing that comes up when these types of conversations occur is how the
> project can test hardware-dependent code. In the case of the PPC64 stuff,
> hardware actually got donated to the ASF so Jenkins runs could be done to
> check that things work. Any thoughts on this aspect? Might be a bit
> premature, but I thought I'd at least mention it... On the flip side: if
> CommitLog becomes pluggable enough, shipping an implementation compatible
> with the hardware out of tree might also be viable.
>
> best,
> kjellman
>
> > On Nov 1, 2017, at 2:25 PM, 大平怜 <re...@gmail.com> wrote:
> >
> > Hi Ariel,
> >
> > CommitLogSegment assumes commit log files stored on a regular file system.
> > Our CAPI Flash system bypasses OS and directly accesses flash,
> > so we cannot use the current framework of CommitLogSegment as it is.
> > Intel's SPDK also bypasses a file system, so we think this kind of
> > requirement
> > is not uncommon.
> >
> > It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> > because the archiving and synchronization logics have to be decoupled.
> > It would require major rework, and we don't think we should affect
> > the existing implementation so much.
> >
> > We do not change any existing format of CommitLog.  Our plugin will use
> > its own format, as it must manage commit logs on the 4KB-block-oriented
> > address spaces of flash devices.
> >
> >
> > Regards,
> > Rei Odaira
> >
> >
> > 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> >
> >> Hi,
> >>
> >> There are pluggable elements to the commit log such as those used to
> >> support mmap or compressed.
> >>
> >> Can you describe at a high level what a new implementation would look
> >> like and why it can't be a mode of the existing implementation?
> >>
> >> You are not proposing changing the format correct?
> >>
> >> Regards,
> >> Ariel
> >>
> >> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> >>> Hello,
> >>>
> >>> We are developing a Cassandra plugin to store CommitLog on our
> >>> low-latency
> >>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> >>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
> >>> have our codebase ready, but we think we should start with high-level
> >>> discussion.
> >>>
> >>> The runtime overhead will be minimal.  The only overhead will be changing
> >>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> >>> etc.
> >>> into interface invocations.
> >>>
> >>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
> >>> especially with batch commit.  I think the pluggable CommitLog will allow
> >>> other interesting alternatives, such as one using SPDK.  Appreciate any
> >>> comments.
> >>>
> >>>
> >>> Regards,
> >>> Rei Odaira
> >>
> >>
> >>
>
>

Re: Making CommitLog pluggable

Posted by Michael Kjellman <mk...@internalcircle.com>.
Rei:

One thing that comes up when these types of conversations occur is how the project can test hardware-dependent code. In the case of the PPC64 stuff, hardware actually got donated to the ASF so Jenkins runs could be done to check that things work. Any thoughts on this aspect? Might be a bit premature, but I thought I'd at least mention it... On the flip side: if CommitLog becomes pluggable enough, shipping an implementation compatible with the hardware out of tree might also be viable.

best,
kjellman

> On Nov 1, 2017, at 2:25 PM, 大平怜 <re...@gmail.com> wrote:
> 
> Hi Ariel,
> 
> CommitLogSegment assumes commit log files stored on a regular file system.
> Our CAPI Flash system bypasses OS and directly accesses flash,
> so we cannot use the current framework of CommitLogSegment as it is.
> Intel's SPDK also bypasses a file system, so we think this kind of
> requirement
> is not uncommon.
> 
> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> because the archiving and synchronization logics have to be decoupled.
> It would require major rework, and we don't think we should affect
> the existing implementation so much.
> 
> We do not change any existing format of CommitLog.  Our plugin will use
> its own format, as it must manage commit logs on the 4KB-block-oriented
> address spaces of flash devices.
> 
> 
> Regards,
> Rei Odaira
> 
> 
> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg <ar...@weisberg.ws>:
> 
>> Hi,
>> 
>> There are pluggable elements to the commit log such as those used to
>> support mmap or compressed.
>> 
>> Can you describe at a high level what a new implementation would look
>> like and why it can't be a mode of the existing implementation?
>> 
>> You are not proposing changing the format correct?
>> 
>> Regards,
>> Ariel
>> 
>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>> Hello,
>>> 
>>> We are developing a Cassandra plugin to store CommitLog on our
>>> low-latency
>>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
>>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
>>> have our codebase ready, but we think we should start with high-level
>>> discussion.
>>> 
>>> The runtime overhead will be minimal.  The only overhead will be changing
>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>> etc.
>>> into interface invocations.
>>> 
>>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
>>> especially with batch commit.  I think the pluggable CommitLog will allow
>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>> comments.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>> 
>> 
>>