Posted to dev@subversion.apache.org by "Vyacheslav V. Zholudev" <vy...@gmail.com> on 2008/07/13 12:41:43 UTC

Re: Multiple entries for the same key in the strings table

Michael, you said that you used to write data into one row in BDB. How 
did you do that? In particular, how did you accumulate the chunks that 
came in from a stream? Am I right that you had to cache everything 
somewhere and then, once the whole fulltext was assembled, flush it to 
BDB in one go?

C. Michael Pilato wrote:
> Vyacheslav V. Zholudev wrote:
>> Hey all!
>>
>> Could somebody please explain why we can have multiple entries 
>> for the same key in the 'strings' table in the case of BDB? I mean, 
>> what are the reasons behind it?
>
> We get data piecemeal, in chunks, through the interfaces that write 
> file contents to the 'strings' table.  We used to simply append this 
> data to the one "row" for that string ID, but then we noticed that 
> Berkeley DB's write-ahead logging subsystem wanted to replicate the 
> whole string-so-far value over and over again in the log.* files (so 
> the transaction could be replayed if necessary).  This caused 
> obnoxious bloat of the disk.  We solved this issue (and possibly some 
> related performance pains, though I don't recall for sure) 
> by simply splitting each of these incoming chunks into its own "row", 
> but with the same key (the collection of which we traverse with the 
> magic of cursors).
>

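The log-growth argument quoted above can be checked with a little arithmetic. The sketch below is a toy model in Python, not the Berkeley DB API: it assumes, as the quoted message describes, that appending to a single row makes the log record the whole string-so-far each time, while one row per chunk logs only the new chunk. The function names and the list-of-pairs "table" are hypothetical stand-ins.

```python
# Toy model only: none of this is the Berkeley DB API.

def log_bytes_single_row(chunk_sizes):
    """Bytes logged when every append re-logs the whole value so far."""
    logged = 0
    size_so_far = 0
    for size in chunk_sizes:
        size_so_far += size      # the row grows by one chunk
        logged += size_so_far    # assumption: the log copies the full row
    return logged


def log_bytes_row_per_chunk(chunk_sizes):
    """Bytes logged when each chunk becomes its own row (same key)."""
    return sum(chunk_sizes)      # only the new chunk is logged each time


def read_string(rows, key):
    """Reassemble a string from duplicate-key rows, in insertion order.

    `rows` is a list of (key, chunk) pairs standing in for the table;
    the loop plays the role of a cursor walking the duplicates.
    """
    return b"".join(chunk for k, chunk in rows if k == key)


if __name__ == "__main__":
    sizes = [100] * 10
    print(log_bytes_single_row(sizes), log_bytes_row_per_chunk(sizes))
```

Under this model the single-row scheme logs a quadratic number of bytes in the chunk count, while the row-per-chunk scheme stays linear.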


Re: Multiple entries for the same key in the strings table

Posted by "C. Michael Pilato" <cm...@collab.net>.
Vyacheslav V. Zholudev wrote:
> Michael, you said that you used to write data into one row in BDB. How 
> did you do that? In particular, how did you accumulate the chunks that 
> came in from a stream? Am I right that you had to cache everything 
> somewhere and then, once the whole fulltext was assembled, flush it to 
> BDB in one go?

Nope.  Subversion has never allowed itself to hold all of a file's contents 
in memory -- that's just too risky scalability-wise.  We used BDB's partial 
value access logic to simply append the new bits to the existing 
(contents-in-progress) value.
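
For readers unfamiliar with partial value access, here is a minimal sketch of the idea in Python. It is a toy model, not Subversion's code and not the Berkeley DB API: the `doff` and `dlen` parameters are named after the fields of a Berkeley DB DBT used with DB_DBT_PARTIAL, but `partial_put`, `append_stream`, and the dict used as the "database" are hypothetical. Whether the old Subversion code passed exactly these offsets is an assumption.

```python
# Toy model only: a dict of bytearrays stands in for the database.

def partial_put(store, key, data, doff, dlen):
    """Replace `dlen` bytes at offset `doff` of store[key] with `data`.

    If `doff` lies past the end of the record, the gap is padded with
    NUL bytes before writing.
    """
    record = store.setdefault(key, bytearray())
    if doff > len(record):
        record.extend(b"\0" * (doff - len(record)))
    record[doff:doff + dlen] = data


def append_stream(store, key, chunks):
    """Append chunks one at a time; the caller never holds the whole value.

    Each write targets the current end of the record (doff = length so
    far, dlen = 0), so nothing already stored is replaced.
    """
    offset = len(store.get(key, b""))
    for chunk in chunks:
        partial_put(store, key, chunk, doff=offset, dlen=0)
        offset += len(chunk)
    return offset


if __name__ == "__main__":
    db = {}
    total = append_stream(db, "string-1", iter([b"abc", b"de", b"f"]))
    print(total, bytes(db["string-1"]))
```

The writer only ever touches the chunk it was just handed, which is why no full-file buffer is needed on the client side of the store.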

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand