Posted to user@couchdb.apache.org by Eric B <eb...@gmail.com> on 2014/10/01 21:53:38 UTC

Are attachments duplicated for each revision as well?

Given that attachments are seemingly stored as key/value pairs within a
document, does that mean that each revision of a document contains the
attachments as well?  Or are they stored independently?

For instance, given a 5kb document with a 100Mb attachment that has 10 revs
(where the attachment was added in rev 1), will the total storage
requirements be 5kb * 10 + 100Mb or (5kb + 100Mb) * 10?

Thanks,

Eric

Re: Are attachments duplicated for each revision as well?

Posted by Brian Mitchell <br...@standardanalytics.io>.
> On Oct 1, 2014, at 4:16 PM, Eric B <eb...@gmail.com> wrote:
> 
>> Attachments are identified by name and can be replaced without mutating
>> old references to documents with attachments of
>> the same name.
>> 
> 
> This is where you lose me a little.  How can I have multiple references to
> the same attachment?  Am I not able to have 2 documents with 2 distinct
> attachments with the same name?  For example, if each user uploads a
> "photo.jpg" that is attached to their profile?
> 
> Or are you referring to the ability to retrieve an older rev of the
> document and retrieve the older rev of the attachment?  For example, in
> rev1 of a doc I attach photo.jpg and in rev2 I update the photo.jpg.  Do
> you mean I can still retrieve rev1 and the original photo.jpg?


Yes. I just wanted to make it clear that you have a version of each attachment
at a given revpos. This means you can replace or delete an attachment, but old
revisions will keep referencing the old version until they are culled by
compaction. Conflicting revisions can each keep a different attachment under
the same name.

DocA-1 + cat.gif (md5-hash 1234, revpos 1)
DocA-2 + cat.gif (md5-hash 1234, revpos 1)
[at this point we only have one copy]
DocA-3 + cat.gif (md5-hash 7890, revpos 3)
[now I have two different gifs but I can get both until I compact]
Compact [DocA-3 is the only leaf revision with no conflicts]
[now I should only have one gif (7890)]
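
The sequence above can be sketched with a toy in-memory model. This is only an
illustration of the behaviour described here, not CouchDB's storage code; the
ToyStore class, its method names, and the digests are made up for the example,
and it assumes a single document with no conflicts.

```python
# Toy model: one blob per unique digest, revisions hold references only.
class ToyStore:
    def __init__(self):
        self.blobs = {}  # digest -> bytes (one copy per unique attachment)
        self.revs = []   # index i holds revision i+1: {name: (digest, revpos)}

    def put(self, new_attachments=None):
        """Create the next revision; untouched attachments are carried over by reference."""
        revpos = len(self.revs) + 1
        atts = dict(self.revs[-1]) if self.revs else {}
        for name, (digest, data) in (new_attachments or {}).items():
            self.blobs[digest] = data
            atts[name] = (digest, revpos)
        self.revs.append(atts)
        return revpos

    def compact(self):
        """Keep only blobs referenced by the latest (leaf) revision."""
        live = {digest for digest, _ in self.revs[-1].values()}
        self.blobs = {d: b for d, b in self.blobs.items() if d in live}
        # Older revision bodies are no longer retrievable after compaction.
        self.revs = [None] * (len(self.revs) - 1) + [self.revs[-1]]


store = ToyStore()
store.put({"cat.gif": ("1234", b"first gif")})   # DocA-1
store.put()                                      # DocA-2, same cat.gif
copies_after_rev2 = len(store.blobs)
rev2_ref = store.revs[1]["cat.gif"]              # still (digest 1234, revpos 1)
store.put({"cat.gif": ("7890", b"second gif")})  # DocA-3 replaces cat.gif
copies_before_compact = len(store.blobs)
store.compact()
digests_after_compact = sorted(store.blobs)
```

Here copies_after_rev2 is 1, copies_before_compact is 2, and only digest 7890
survives compaction.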

If we had conflicts, then we’d possibly have more images. You can play around
with this more by using the atts_since query parameter, which takes a list of
revision IDs. Keep in mind that content is not currently deduplicated between
different documents, so that is where the application can do some work to
ensure that only one of anything is stored, for example by using a digest like
SHA1 as the document id. This effectively restricts you to one attachment per
doc, but I find things work a bit better when you avoid having huge numbers of
attachments to manage per document.
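
A minimal sketch of the digest-as-document-id idea, assuming the application
does the deduplication itself. The "sha1-" prefix, the function names, and the
plain dict standing in for a database are all hypothetical.

```python
import hashlib


def content_doc_id(data: bytes) -> str:
    """Derive a document id from the attachment bytes (hypothetical "sha1-" prefix)."""
    return "sha1-" + hashlib.sha1(data).hexdigest()


def store_once(db: dict, data: bytes) -> str:
    """Store the content under its digest id; identical content maps to the same doc."""
    doc_id = content_doc_id(data)
    db.setdefault(doc_id, data)  # a second upload of the same bytes is a no-op
    return doc_id


db = {}  # stand-in for a database, not a real CouchDB client
first_id = store_once(db, b"same photo bytes")
second_id = store_once(db, b"same photo bytes")
other_id = store_once(db, b"different photo bytes")
```

Two users uploading the same bytes end up pointing at one document, while
different content gets its own id; user profiles would then reference these
ids rather than carrying the attachment themselves.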

Brian.


Fwd: Are attachments duplicated for each revision as well?

Posted by Eric B <eb...@gmail.com>.
On Wed, Oct 1, 2014 at 4:02 PM, Brian Mitchell <br...@standardanalytics.io>
wrote:

> The attachment will be stored once and each revision will retain a
> reference to that attachment (including when it was added,
> called revpos, so replication should be efficient too). Compaction will
> copy the attachments over and should retain a single
> copy for each unique attachment.
>

Thanks for the confirmation.  That's what I was suspecting, but wasn't sure.


> Attachments are identified by name and can be replaced without mutating
> old references to documents with attachments of
> the same name.
>

This is where you lose me a little.  How can I have multiple references to
the same attachment?  Am I not able to have 2 documents with 2 distinct
attachments with the same name?  For example, if each user uploads a
"photo.jpg" that is attached to their profile?

Or are you referring to the ability to retrieve an older rev of the
document and retrieve the older rev of the attachment?  For example, in
rev1 of a doc I attach photo.jpg and in rev2 I update the photo.jpg.  Do
you mean I can still retrieve rev1 and the original photo.jpg?

Thanks,

Eric



> Brian.
>
> > On Oct 1, 2014, at 3:53 PM, Eric B <eb...@gmail.com> wrote:
> >
> > Given that attachments are seemingly stored as key/value pairs within a
> > document, does that mean that each revision of a document contains the
> > attachments as well?  Or are they stored independently?
> >
> > For instance, given a 5kb document with a 100Mb attachment that has 10 revs
> > (where the attachment was added in rev 1), will the total storage
> > requirements be 5kb * 10 + 100Mb or (5kb + 100Mb) * 10?
> >
> > Thanks,
> >
> > Eric
>
>

Re: Are attachments duplicated for each revision as well?

Posted by Brian Mitchell <br...@standardanalytics.io>.
The attachment will be stored once and each revision will retain a reference to that attachment (including when it was added,
called revpos, so replication should be efficient too). Compaction will copy the attachments over and should retain a single
copy for each unique attachment.
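
Applied to the numbers in the question, the two candidate totals work out as
follows. This is rough arithmetic only: it assumes 1 Mb = 1024 kb, ignores
file and index overhead, and counts all ten revision bodies even though
compaction would drop the non-leaf ones.

```python
KB = 1
MB = 1024 * KB  # assumption: 1 Mb = 1024 kb

doc_kb = 5 * KB
attachment_kb = 100 * MB
revisions = 10

# Attachment stored once, referenced by every revision (the case described above).
shared_kb = doc_kb * revisions + attachment_kb

# Hypothetical worst case if the attachment were copied into every revision.
duplicated_kb = (doc_kb + attachment_kb) * revisions

print(f"stored once:      {shared_kb} kb")
print(f"copied per rev:   {duplicated_kb} kb")
```

That is roughly 100 Mb when the attachment is stored once, versus roughly
1000 Mb if every revision carried its own copy.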

Attachments are identified by name and can be replaced without mutating old references to documents with attachments of
the same name. If you pass the _attachments section and leave out stubs for any existing attachments, that is interpreted as
a delete.
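
The stub rule can be sketched as a helper that builds the body for the next
revision. The function name and the sample document are hypothetical and the
digests are placeholders; the only assumption about CouchDB is that a fetched
document lists existing attachments as metadata entries with "stub": true.

```python
def next_revision_body(current_doc: dict, keep: set, changes: dict) -> dict:
    """Copy the doc, apply field changes, and keep stubs only for attachments in `keep`.

    Any existing attachment whose stub is left out is treated as deleted.
    """
    body = {k: v for k, v in current_doc.items() if k != "_attachments"}
    body.update(changes)
    body["_attachments"] = {
        name: {**meta, "stub": True}
        for name, meta in current_doc.get("_attachments", {}).items()
        if name in keep
    }
    return body


# Sample document as it might come back from a GET (placeholder values).
current = {
    "_id": "DocA",
    "_rev": "2-abc",
    "title": "pets",
    "_attachments": {
        "cat.gif": {"content_type": "image/gif", "revpos": 1,
                    "digest": "md5-1234", "length": 10, "stub": True},
        "dog.gif": {"content_type": "image/gif", "revpos": 2,
                    "digest": "md5-5678", "length": 12, "stub": True},
    },
}

# Keep cat.gif, drop dog.gif by omitting its stub, and change a field.
body = next_revision_body(current, keep={"cat.gif"}, changes={"title": "cats"})
```

Sending body as the update keeps cat.gif by reference without re-uploading it
and removes dog.gif from the new revision.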

Brian.

> On Oct 1, 2014, at 3:53 PM, Eric B <eb...@gmail.com> wrote:
> 
> Given that attachments are seemingly stored as key/value pairs within a
> document, does that mean that each revision of a document contains the
> attachments as well?  Or are they stored independently?
> 
> For instance, given a 5kb document with a 100Mb attachment that has 10 revs
> (where the attachment was added in rev 1), will the total storage
> requirements be 5kb * 10 + 100Mb or (5kb + 100Mb) * 10?
> 
> Thanks,
> 
> Eric