Posted to dev@bookkeeper.apache.org by Jaln <va...@gmail.com> on 2014/07/18 22:11:12 UTC

ledger and journal file

Hi,
Are the ledger file and the journal file the same?
I ran BookKeeper and set up a bookie; inside the bookie, I found
that the journal file and the ledger file are almost the same.

Best,
Jialin

RE: ledger and journal file

Posted by Rakesh R <ra...@huawei.com>.
Hi Jialin,

Entry log and journal files are both mandatory for the bookie server. AFAIK they are built into the persistent storage protocol.
I don't see a good reason to maintain them as a single file. Please feel free to correct me if I'm missing anything.

FYI: We have 'journalMaxBackups' to control the maximum number of old journal files to keep; with it, the server deletes older journal files, so they won't grow without bound.
You can use it to minimize the number of old journal files kept on the system.
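
As a rough illustration of the retention idea behind 'journalMaxBackups', here is a minimal sketch. It is not the bookie's actual code: the function name, the plain integer journal ids, and the assumption that only journals older than the last checkpoint are eligible for deletion are all hypothetical simplifications.

```python
def journals_to_delete(journal_ids, last_checkpointed_id, journal_max_backups):
    """Return the journal ids that may be deleted, oldest first.

    Hypothetical model: ids grow over time, and a journal is "old" once
    its id is below the id covered by the last checkpoint.
    """
    # Only journals fully covered by the last checkpoint are candidates.
    old = sorted(j for j in journal_ids if j < last_checkpointed_id)
    # Keep the newest `journal_max_backups` old journals as backups.
    excess = len(old) - journal_max_backups
    return old[:excess] if excess > 0 else []


ids = [1, 2, 3, 4, 5, 6, 7, 8]
print(journals_to_delete(ids, last_checkpointed_id=7, journal_max_backups=2))
# prints [1, 2, 3, 4]
```

With a larger backup count than there are old journals, nothing is deleted; with a count of 0, every old journal is removed.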

Regards,
Rakesh
-----Original Message-----
From: Jaln [mailto:valiantljk@gmail.com] 
Sent: 21 July 2014 03:08
To: bookkeeper-dev
Subject: Re: ledger and journal file

Hi Rakesh,
If we can use one file to do everything, why not?

Best,
Jialin


On Sat, Jul 19, 2014 at 11:44 PM, Rakesh R <ra...@huawei.com> wrote:

> Hi Jaln,
>
> Could you tell me if there is a specific reason to maintain one file?
>
> -Rakesh
>
> -----Original Message-----
> From: Jaln [mailto:valiantljk@gmail.com]
> Sent: 20 July 2014 02:25
> To: bookkeeper-dev
> Subject: Re: ledger and journal file
>
> Thank you so much, Rakesh,
> Leaving performance aside, can we just maintain one file? For
> example, the journal file plus an index for each entry.
>
> Best,
> Jaln
>
>
> On Fri, Jul 18, 2014 at 11:23 PM, Rakesh Radhakrishnan <rakeshr.apache@gmail.com> wrote:
>
> > Hi Jaln,
> >
> > > For the data in the journal file (*.txn) and the entry log file
> > > (*.log), are they similar? For example, when I add an entry, this
> > > operation and the entry data will be logged in the journal file,
> > > and the entry data will be logged in the entry log file (*.log),
> > > right?
> >
> > As I mentioned earlier, when an entry is added, the bookie server
> > adds it only to the journal file and sends a response back to the
> > client after a successful flush to disk. Later, at checkpoint time,
> > the server reads the journal entries and adds them to the entry log
> > files. It also generates index files for each ledger for faster
> > access. The old journal file is then garbage collected, because all
> > of its entries have been written to the entry log.
> >
> > > What's the purpose of the two files?
> >
> > AFAIK, adding to the entry log and generating the index is a costly
> > I/O operation and would affect performance. That's the reason the
> > bookie first adds transactions only to the journal file and sends a
> > response quickly. Later it adds them to the entry log file and index
> > files offline.
> >
> > Total bookie stored data = entry log data + journal data (most recent data)
> >
> > *For example:* I'm calling a write operation a transaction. Assume the
> > client has performed 20 transactions. All of these exist only in the
> > journal file. Say checkpointing is now triggered. It will add these 20
> > transactions to the entry log file and generate indexes. Now assume
> > the user performs 10 more transactions. We now have 30 transactions
> > in total.
> >
> > Bookie data (30 transactions) = 20 + 10.
> >
> > Regards,
> > Rakesh
> >
> >
> >
> > On Sat, Jul 19, 2014 at 9:52 AM, Jaln <va...@gmail.com> wrote:
> >
> > > Thanks Rakesh,
> > > For the data in the journal file (*.txn) and the entry log file
> > > (*.log), are they similar? For example, when I add an entry, this
> > > operation and the entry data will be logged in the journal file,
> > > and the entry data will be logged in the entry log file (*.log),
> > > right?
> > > What's the purpose of the two files?
> > >
> > > Thanks,
> > > Jaln
> > >
> > > On Fri, Jul 18, 2014 at 8:16 PM, Rakesh Radhakrishnan <rakeshr.apache@gmail.com> wrote:
> > >
> > > > Hi Jaln,
> > > >
> > > > No, they are different. I assume you are asking about the 'entry
> > > > log' files and the 'journal' files.
> > > >
> > > > *Journal:* When a client performs a write operation (such as adding
> > > > an entry), it is first recorded in the journal file. The journal is
> > > > flushed and synced after every write operation, before a success
> > > > code is returned to the client. This ensures that no operation is
> > > > lost due to machine failure.
> > > >
> > > > *Entry Log:* It is not updated on every write operation; the bookie
> > > > server does it lazily, because writing out the ledger involves
> > > > updating the ledger index files (for faster lookup) and adding the
> > > > entry to the entry log file. This is a costly operation and would
> > > > affect performance.
> > > >
> > > > In the bookie, a dedicated thread replays journal transactions and
> > > > adds them to the entry log lazily; this is called checkpointing.
> > > > It is performed periodically, at which point the data is persisted
> > > > to the ledger index files and the entry log. By default the
> > > > 'flushInterval' is 100 milliseconds. You can probably configure a
> > > > larger value to see the difference.
> > > >
> > > > *"SyncThread"* is a background thread that helps with checkpointing.
> > > > After a ledger storage is checkpointed, the journal files added
> > > > before the checkpoint are garbage collected.
> > > >
> > > > Cheers,
> > > > Rakesh
> > > >
> > > >
> > > > On Sat, Jul 19, 2014 at 1:41 AM, Jaln <va...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > Are the ledger file and the journal file the same?
> > > > > I ran BookKeeper and set up a bookie; inside the bookie, I found
> > > > > that the journal file and the ledger file are almost the same.
> > > > >
> > > > > Best,
> > > > > Jialin
> > > > >
> > > >
> > >
> >
>



-- 

Genius only means hard-working all one's life
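
The write path described in the quoted replies above can be sketched as a toy model. This is only an illustration under stated assumptions, not BookKeeper code: the class `ToyBookie`, its method names, and the in-memory lists standing in for the journal, entry log, and index files are all hypothetical. It follows the thread's description (acknowledge after the journal write, move entries to the entry log at checkpoint time, then discard the old journal contents) and reproduces the "20 + 10" example.

```python
class ToyBookie:
    """Toy model of the write path described in this thread."""

    def __init__(self):
        self.journal = []    # acknowledged entries not yet checkpointed
        self.entry_log = []  # entries written out by a checkpoint
        self.index = {}      # ledger_id -> positions in entry_log

    def add_entry(self, ledger_id, data):
        # Append to the journal only. A real bookie would also flush
        # and sync the journal to disk before acknowledging the client.
        self.journal.append((ledger_id, data))
        return "OK"

    def checkpoint(self):
        # Copy journaled entries into the entry log and index them,
        # then drop the journal contents, which are now redundant.
        for ledger_id, data in self.journal:
            self.index.setdefault(ledger_id, []).append(len(self.entry_log))
            self.entry_log.append((ledger_id, data))
        self.journal.clear()

    def total_entries(self):
        # Total stored data = entry log data + journal data.
        return len(self.entry_log) + len(self.journal)


bookie = ToyBookie()
for i in range(20):
    bookie.add_entry(ledger_id=1, data=f"txn-{i}")
bookie.checkpoint()
for i in range(20, 30):
    bookie.add_entry(ledger_id=1, data=f"txn-{i}")

print(len(bookie.entry_log), len(bookie.journal), bookie.total_entries())
# prints 20 10 30
```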

Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Hi Rakesh,
If we can use one file to do everything, why not?

Best,
Jialin



RE: ledger and journal file

Posted by Rakesh R <ra...@huawei.com>.
Hi Jaln,

Could you tell me if there is a specific reason to maintain one file?

-Rakesh


Re: ledger and journal file

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Great, say hi to Masood!

-Flavio

On 21 Jul 2014, at 21:24, Jaln <va...@gmail.com> wrote:

> Hi Flavio,
> I'm doing some research on scalable/durable transactional messaging system, with Masood at Huawei Innovation Center. I'm currently using bookkeeper as a case study.
> Thanks for the help.
> 
> Best,
> Jialin
> 
> 
> On Mon, Jul 21, 2014 at 5:43 AM, Flavio Junqueira <fp...@yahoo.com.invalid> wrote:
> Jialin,
> 
> I'm curious to know why you're asking all these questions. Are you working on some research project that involves BookKeeper? Otherwise, what's your use case if you don't mind sharing?
> 
> 
> -Flavio
> 
> 
> 
> On Monday, July 21, 2014 1:34 PM, Ivan Kelly <iv...@apache.org> wrote:
> 
> 
> >
> >
> >We have considered something like this in the past. However, it would
> >mean that reads will affect the latency of writes, as they will move
> >the disk head.
> >
> >It's also the case that the interleaved entrylog performs really badly
> >on reads. Work has been done recently to improve this, by buffering
> >entries and sorting them by ledger id before flushing to the
> >entrylog. This means that reads for a specific ledger will be
> >sequential as opposed to jumping all over the place as it has to do
> >now. If we used the journal for this, then we wouldn't be able to do
> >this processing, as the point of the journal is to ensure that the
> >entry is on persistent storage before replying to the client. If we
> >buffered enough to get benefit from sorting, write latency would be
> >enormous.
> >
> >-Ivan
> >
> >
> -- 
> Genius only means hard-working all one's life
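
Ivan's point about sorting buffered entries by ledger id before flushing to the entry log can be illustrated with a small sketch. The function names and the `(ledger_id, entry_id)` tuples are hypothetical, and the lists only stand in for on-disk layout; this is not the actual entry log implementation.

```python
def flush_interleaved(buffer):
    """Write entries in arrival order, so ledgers end up interleaved."""
    return list(buffer)


def flush_sorted(buffer):
    """Sort buffered entries by (ledger id, entry id) before writing."""
    return sorted(buffer, key=lambda entry: (entry[0], entry[1]))


def positions(entry_log, ledger_id):
    """Positions in the entry log that hold entries of one ledger."""
    return [pos for pos, (lid, _eid) in enumerate(entry_log) if lid == ledger_id]


# Entries from three ledgers arriving interleaved: (ledger_id, entry_id).
buffer = [(2, 0), (1, 0), (2, 1), (1, 1), (3, 0), (1, 2)]

print(positions(flush_interleaved(buffer), 1))  # prints [1, 3, 5]
print(positions(flush_sorted(buffer), 1))       # prints [0, 1, 2]
```

In the sorted layout a read of ledger 1 touches one contiguous range, which is the sequential-read benefit Ivan describes. The journal cannot wait to accumulate such a buffer, because each entry must be on persistent storage before the client gets a reply.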


Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Hi Flavio,
I'm doing some research on scalable/durable transactional messaging system,
with Masood at Huawei Innovation Center. I'm currently using bookkeeper as
a case study.
Thanks for the help.

Best,
Jialin



Re: ledger and journal file

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Jialin,

I'm curious to know why you're asking all these questions. Are you working on some research project that involves BookKeeper? Otherwise, what's your use case if you don't mind sharing?


-Flavio



On Monday, July 21, 2014 1:34 PM, Ivan Kelly <iv...@apache.org> wrote:
 

>
>
>We have considered something like this in the past. However, it would
>mean that reads will affect the latency of writes, as they will move
>the disk head.
>
>It's also the case that the interleaved entrylog performs really badly
>on reads. Work has been done recently to improve this, by buffering
>entries and sorting them by ledger id before flushing to the
>entrylog. This means that reads for a specific ledger will be
>sequential as opposed to jumping all over the place as it has to do
>now. If we used the journal for this, then we wouldn't be able to do
>this processing, as the point of the journal is to ensure that the
>entry is on persistent storage before replying to the client. If we
>buffered enough to get benefit from sorting, write latency would be
>enormous.
>
>-Ivan

Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Thanks Sijie.

Best,
Jialin


On Fri, Jul 25, 2014 at 11:25 PM, Sijie Guo <gu...@gmail.com> wrote:

> On Fri, Jul 25, 2014 at 12:04 PM, Jaln <va...@gmail.com> wrote:
>
> > Thanks Ivan,
> > The BookKeeper tutorial
> > (http://zookeeper.apache.org/doc/r3.4.1/bookkeeperOverview.html)
> > says: `A server maintains an in-memory data structure (with periodic
> > snapshots for example) and logs changes to that structure before it
> > applies the change.'
> > But when I open the ledger file and the journal file, both contain the
> > same thing, i.e., entry data. (For example, if I use Hedwig to publish to
> > a topic, both the ledger and journal files contain only the topic and its
> > contents, which is what I call entry data; no change information, e.g.,
> > "pub", is logged.) Why is there no such `change' information that could be
> > used to recover from a failure?
> >
> > Maybe my understanding is wrong; please correct me. Thanks a lot.
> >
>
> There is no update or modification of an existing entry. BK only appends
> entries to a ledger, so all entries are new data. That's why we record an
> entry as 'add' in the journal.
>

Re: ledger and journal file

Posted by Sijie Guo <gu...@gmail.com>.
On Fri, Jul 25, 2014 at 12:04 PM, Jaln <va...@gmail.com> wrote:

> Thanks Ivan,
> The BookKeeper tutorial
> (http://zookeeper.apache.org/doc/r3.4.1/bookkeeperOverview.html)
> says: `A server maintains an in-memory data structure (with periodic
> snapshots for example) and logs changes to that structure before it
> applies the change.'
> But when I open the ledger file and the journal file, both contain the
> same thing, i.e., entry data. (For example, if I use Hedwig to publish to
> a topic, both the ledger and journal files contain only the topic and its
> contents, which is what I call entry data; no change information, e.g.,
> "pub", is logged.) Why is there no such `change' information that could be
> used to recover from a failure?
>
> Maybe my understanding is wrong; please correct me. Thanks a lot.
>

There is no update or modification of an existing entry. BK only appends
entries to a ledger, so all entries are new data. That's why we record an
entry as 'add' in the journal.
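
To illustrate the append-only point, here is a minimal sketch in Python. It
is not BookKeeper's actual code or on-disk format: the record layout and the
names append_add and replay are assumptions made up for this example. Every
journal record is an 'add' carrying the entry data itself, so replaying the
journal in order is enough to rebuild the ledgers after a failure.

```python
import io
import struct

# Hypothetical record layout (NOT BookKeeper's real journal format):
# 8-byte ledger id, 8-byte entry id, 4-byte payload length, then the payload.
HEADER = struct.Struct(">qqi")


def append_add(journal, ledger_id, entry_id, payload):
    """Append one 'add' record; existing records are never rewritten."""
    journal.write(HEADER.pack(ledger_id, entry_id, len(payload)))
    journal.write(payload)


def replay(journal):
    """Rebuild ledger contents by re-applying every 'add' in journal order."""
    journal.seek(0)
    ledgers = {}
    while True:
        header = journal.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of journal, or a torn trailing header
        ledger_id, entry_id, length = HEADER.unpack(header)
        payload = journal.read(length)
        if len(payload) < length:
            break  # torn trailing payload: ignore the incomplete record
        ledgers.setdefault(ledger_id, {})[entry_id] = payload
    return ledgers


# An in-memory stand-in for the journal file.
journal = io.BytesIO()
append_add(journal, 1, 0, b"topic-a: hello")
append_add(journal, 2, 0, b"topic-b: hi")
append_add(journal, 1, 1, b"topic-a: world")
recovered = replay(journal)
```

Because the only operation is 'add', the journal looks like plain entry data
when opened; the "change" being logged is the new entry itself.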



Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Thanks Ivan,
The BookKeeper tutorial
(http://zookeeper.apache.org/doc/r3.4.1/bookkeeperOverview.html)
says: `A server maintains an in-memory data structure (with periodic
snapshots for example) and logs changes to that structure before it
applies the change.'
But when I open the ledger file and the journal file, both contain the
same thing, i.e., entry data. (For example, if I use Hedwig to publish to
a topic, both the ledger and journal files contain only the topic and its
contents, which is what I call entry data; no change information, e.g.,
"pub", is logged.) Why is there no such `change' information that could be
used to recover from a failure?

Maybe my understanding is wrong; please correct me. Thanks a lot.

Best,
Jialin

On Mon, Jul 21, 2014 at 5:31 AM, Ivan Kelly <iv...@apache.org> wrote:

> We have considered something like this in the past. However, it would
> mean that reads will affect the latency of writes, as they will move
> the disk head.
>
> It's also the case that the interleaved entrylog performs really badly
> on reads. Work has been done recently to improve this, by buffering
> entries and sorting them by ledger id before flushing to the
> entrylog. This means that reads for a specific ledger will be
> sequential as opposed to jumping all over the place as it has to do
> now. If we used the journal for this, then we wouldn't be able to do
> this processing, as the point of the journal is to ensure that the
> entry is on persistent storage before replying to the client. If we
> buffered enough to get benefit from sorting, write latency would be
> enormous.
>
> -Ivan



-- 

Genius only means hard-working all one's life

Re: ledger and journal file

Posted by Ivan Kelly <iv...@apache.org>.
We have considered something like this in the past. However, it would
mean that reads will affect the latency of writes, as they will move
the disk head.

It's also the case that the interleaved entrylog performs really badly
on reads. Work has been done recently to improve this, by buffering
entries and sorting them by ledger id before flushing to the
entrylog. This means that reads for a specific ledger will be
sequential as opposed to jumping all over the place as it has to do
now. If we used the journal for this, then we wouldn't be able to do
this processing, as the point of the journal is to ensure that the
entry is on persistent storage before replying to the client. If we
buffered enough to get benefit from sorting, write latency would be
enormous.
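
As a rough illustration of the sorting idea, here is a small Python sketch.
It is not the BookKeeper implementation: entries are modelled as hypothetical
(ledger id, entry id, payload) tuples, and the helper names are made up for
this example. It compares how many non-contiguous jumps a reader needs to
read one ledger from an interleaved log versus a log flushed in sorted order.

```python
def flush_interleaved(buffer):
    """Write entries in arrival order, as an interleaved entry log does."""
    return list(buffer)


def flush_sorted(buffer):
    """Sort buffered entries by (ledger id, entry id) before writing."""
    return sorted(buffer, key=lambda entry: (entry[0], entry[1]))


def jumps_to_read(log, ledger_id):
    """Count non-contiguous jumps needed to read one ledger from the log."""
    positions = [i for i, entry in enumerate(log) if entry[0] == ledger_id]
    return sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


# Entries from three ledgers arriving interleaved, as they would from
# several clients writing at once.
buffer = [
    (1, 0, b"a0"), (2, 0, b"b0"), (1, 1, b"a1"),
    (3, 0, b"c0"), (2, 1, b"b1"), (1, 2, b"a2"),
]
interleaved = flush_interleaved(buffer)
sorted_log = flush_sorted(buffer)
```

Sorting only pays off if enough entries are buffered first, which is why it
fits the lazily written entry log and not the journal, where each entry has
to reach disk before the client gets a reply.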

-Ivan

On Sat, Jul 19, 2014 at 01:55:16PM -0700, Jaln wrote:
> Thank you so much, Rakesh,
> Without consideration of performance, can we just maintain one file? For
> example, the journal file plus an index for each entry.
> 
> Best,
> Jaln

Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Thank you so much, Rakesh,
Without consideration of performance, can we just maintain one file? For
example, the journal file plus an index for each entry.

Best,
Jaln


On Fri, Jul 18, 2014 at 11:23 PM, Rakesh Radhakrishnan <
rakeshr.apache@gmail.com> wrote:

> Hi Jaln,
>
> >>>>>>for the data in the journal file (*.txn) and the entry log file
> >>>>>>(*.log), are they similar?
> >>>>>>for example, when I add an entry, this operation and the entry data
> >>>>>>will be logged in the journal file,
> >>>>>>and the entry data will be logged in the entry log file (*.log),
> >>>>>>right?
>
> As I mentioned earlier, when an entry is added, the Bookie server adds only
> this entry to the journal file and sends a response back to the client
> after a successful flush to disk. Later, at checkpointing time, the server
> reads the journal entries and adds them to the entry logger files. It also
> generates index files for each ledger for faster access. The old journal
> file is then garbage collected, because all of its entries have been
> written to the entry logger.
>
> >>>>>what's the purpose of the two files?
> AFAIK, adding to the entry log and generating the index is a costly I/O
> operation and would affect performance. That's the reason the server first
> adds transactions only to the journal file and sends a response quickly.
> Later it adds them to the entry log file and index files offline.
>
> Total bookie stored data = entry logger data + journal data (most recent
> data)
>
> *For example:* I'm calling a write operation a transaction. Assume the
> client has performed 20 transactions. All of these exist only in the
> journal file. Say checkpointing is now triggered. It will add these 20
> transactions to the entry logger file and generate indexes. Now assume the
> user performs 10 more transactions. We now have 30 transactions in total.
>
> Bookie data (30 transactions) = 20 + 10.
>
> Regards,
> Rakesh
>
>
>
> On Sat, Jul 19, 2014 at 9:52 AM, Jaln <va...@gmail.com> wrote:
>
> > Thanks Rakesh,
> > for the data in the journal file(*.txn) and the entry log file(*.log),
> are
> > they similar?
> > for example, when I add an entry, this opeartion and the entry data will
> be
> > logged in the journal file,
> > and the entry data will be logged in the entry log file (*.log), right?
> > what's the purpose of the two files?
> >
> > Thanks,
> > Jaln
> >
> > On Fri, Jul 18, 2014 at 8:16 PM, Rakesh Radhakrishnan <
> > rakeshr.apache@gmail.com> wrote:
> >
> > > Hi Jaln,
> > >
> > > No, both are different. I hope you are asking about 'entry log' files
> and
> > > 'journal' files
> > >
> > > *Journal : *When client performs a write operation (such as adding an
> > entry
> > > etc), it is first recorded in the journal file. Journal will be flushed
> > and
> > > synced after every write operation before a success code is returned to
> > the
> > > client. This ensures that no operation is lost due to machine failure.
> > >
> > > *Entry Log : *It is not updated for every write operation, bookie
> server
> > > will do it lazily. Because writing out the ledger involves - update
> > ledger
> > > index files to faster look up and add entry to the logger file. This
> will
> > > be a costly operation and will affect the performance.
> > >
> > > In Bookie, there is a dedicated thread to play journal transactions and
> > add
> > > it to the logger lazily, this is called as checkpointing operation.
> This
> > > will be performed periodically, now the data will be persisted to
> ledger
> > > index files and entry logger. By default the 'flushInterval' is 100
> > > milliseconds. Probably you can configure a bigger value to see the
> > > difference.
> > >
> > > *"SyncThread"* is a background thread which help checkpointing. After a
> > > ledger storage is checkpointed, the journal files added before
> checkpoint
> > > will be garbage collected.
> > >
> > > Cheers,
> > > Rakesh
> > >
> > >
> > > On Sat, Jul 19, 2014 at 1:41 AM, Jaln <va...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > is the ledger file and journal file same?
> > > > I run the bookkeeper and generate the bookie,
> > > > inside the bookie, I found the journal file and ledger file are
> almost
> > > > same.
> > > >
> > > > Best,
> > > > Jialin
> > > >
> > >
> >
>

Re: ledger and journal file

Posted by Rakesh Radhakrishnan <ra...@gmail.com>.
Hi Jaln,

>>>>>> For the data in the journal file (*.txn) and the entry log file (*.log),
>>>>>> are they similar?
>>>>>> For example, when I add an entry, this operation and the entry data will
>>>>>> be logged in the journal file, and the entry data will be logged in the
>>>>>> entry log file (*.log), right?

As I mentioned earlier, when an entry is added the bookie server appends only
that entry to the journal file and sends a response back to the client once
the journal has been successfully flushed to disk. Later, at checkpoint time,
the server reads the journal entries and adds them to the entry log files. It
also generates index files for each ledger, for faster access. The old journal
file is then garbage collected, because all of its entries are now stored in
the entry log.

>>>>> What's the purpose of the two files?
AFAIK, adding to the entry log and generating the index is a costly I/O
operation that would hurt performance. That's the reason the server first adds
the transaction only to the journal file and sends a response quickly. Later it
adds the data to the entry log and index files in the background.

Total data stored by the bookie = entry log data + journal data (the most
recent data, not yet checkpointed)

*For example:* I'll call each write operation a transaction. Assume a client
has performed 20 transactions. All of these exist only in the journal file.
Now say a checkpoint is triggered. It adds these 20 transactions to the entry
log file and generates the indexes. Then assume the user performs 10 more
transactions, which again exist only in the journal. We now have 30
transactions in total.

Bookie data (30 transactions) = 20 (entry log) + 10 (journal).
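
To make the arithmetic concrete, here is a toy Python model of the flow
described in this message. It is only a sketch under that description, not
BookKeeper's actual code: the class name ToyBookie, its methods, and the
in-memory lists standing in for files are all made up for illustration.

```python
class ToyBookie:
    """Toy model of the write path described above; not real BookKeeper code."""

    def __init__(self):
        self.journal = []    # acknowledged entries not yet checkpointed
        self.entry_log = []  # entries moved over by a checkpoint
        self.index = {}      # (ledger_id, entry_id) -> position in entry_log

    def add_entry(self, ledger_id, entry_id, data):
        # Only the journal is touched on the write path in this model.
        self.journal.append((ledger_id, entry_id, data))

    def checkpoint(self):
        # Copy journaled entries into the entry log and index them,
        # then drop the journal contents that are now covered.
        for ledger_id, entry_id, data in self.journal:
            self.index[(ledger_id, entry_id)] = len(self.entry_log)
            self.entry_log.append(data)
        self.journal.clear()

    def total_entries(self):
        return len(self.entry_log) + len(self.journal)


bookie = ToyBookie()
for i in range(20):              # first 20 transactions
    bookie.add_entry(1, i, b"payload-%d" % i)
bookie.checkpoint()              # checkpoint moves them to the entry log
for i in range(20, 30):          # 10 more, journal only
    bookie.add_entry(1, i, b"payload-%d" % i)

print(len(bookie.entry_log), len(bookie.journal), bookie.total_entries())
# prints: 20 10 30
```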

Regards,
Rakesh




Re: ledger and journal file

Posted by Jaln <va...@gmail.com>.
Thanks Rakesh,
For the data in the journal file (*.txn) and the entry log file (*.log), are
they similar?
For example, when I add an entry, this operation and the entry data will be
logged in the journal file, and the entry data will be logged in the entry log
file (*.log), right?
What's the purpose of the two files?

Thanks,
Jaln


Re: ledger and journal file

Posted by Rakesh Radhakrishnan <ra...@gmail.com>.
Hi Jaln,

No, they are different. I assume you are asking about the 'entry log' files
and the 'journal' files.

*Journal:* When a client performs a write operation (such as adding an entry),
it is first recorded in the journal file. The journal is flushed and synced
after every write operation, before a success code is returned to the client.
This ensures that no operation is lost due to machine failure.
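
The "flush and sync before acknowledging" rule can be sketched in a few lines
of Python. This is a minimal illustration of the idea, not the bookie's real
journal format: the function name journal_append, the record layout (ledger
id, entry id, length, payload) and the file name are assumptions made up for
the example.

```python
import os
import struct
import tempfile


def journal_append(journal_file, ledger_id, entry_id, payload):
    """Append one record and force it to disk before reporting success."""
    # Hypothetical layout: two 8-byte ids, a 4-byte length, then the payload.
    header = struct.pack(">qqi", ledger_id, entry_id, len(payload))
    journal_file.write(header + payload)
    journal_file.flush()             # push Python's buffer to the OS
    os.fsync(journal_file.fileno())  # ask the OS to persist it to the device
    return True                      # only now is it safe to ack the client


journal_path = os.path.join(tempfile.mkdtemp(), "toy-journal.txn")
with open(journal_path, "ab") as journal:
    acked = journal_append(journal, 1, 0, b"hello")

record_size = os.path.getsize(journal_path)  # 20-byte header + 5-byte payload
```

The point of the ordering is that the return (the acknowledgement) happens
only after os.fsync, so a crash after the ack cannot lose the record.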

*Entry Log:* It is not updated on every write operation; the bookie server
does it lazily. Writing out the ledger involves updating the ledger index
files (for faster lookup) and adding the entry to the entry log file. This is
a costly operation and would hurt performance if done on every write.

In the bookie, a dedicated thread replays the journal transactions and adds
them to the entry log lazily; this is called checkpointing. It is performed
periodically, and once it completes the data is persisted to the ledger index
files and the entry log. By default the 'flushInterval' is 100 milliseconds.
You can configure a larger value to see the difference.

*"SyncThread"* is the background thread that helps with checkpointing. After
the ledger storage is checkpointed, the journal files written before the
checkpoint are garbage collected.
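
As a rough sketch of that garbage collection rule, assume each journal file
has an increasing numeric id and the checkpoint records the id of the journal
that was being written when it ran. Both assumptions, and the function name
journals_to_delete, are made up for illustration and are not taken from the
bookie code.

```python
def journals_to_delete(journal_ids, last_checkpointed_id):
    """Return ids of journal files fully covered by the last checkpoint."""
    # Files older than the checkpointed journal hold only persisted entries.
    return sorted(j for j in journal_ids if j < last_checkpointed_id)


journal_ids = [101, 102, 103, 104]
stale = journals_to_delete(journal_ids, last_checkpointed_id=103)
print(stale)  # prints: [101, 102]
```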

Cheers,
Rakesh


On Sat, Jul 19, 2014 at 1:41 AM, Jaln <va...@gmail.com> wrote:

> Hi,
> is the ledger file and journal file same?
> I run the bookkeeper and generate the bookie,
> inside the bookie, I found the journal file and ledger file are almost
> same.
>
> Best,
> Jialin
>