Posted to user@couchdb.apache.org by Igor Klimer <i....@getbacksa.pl> on 2014/01/09 15:39:06 UTC

Error during compaction

Hi all,
I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
This is from Windows Server 2008 R2 Enterprise with CouchDB 1.2.0.
I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or something already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN

I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.

Best regards,
Igor Klimer

(sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)

-------------------------------

getBACK S.A., ul. Powstańców Śląskich 2-4, 53-333 Wrocław
Registry court: District Court for Wrocław-Fabryczna, 6th Commercial Division of the National Court Register (KRS).
KRS number: 0000413997
NIP (tax ID): 8992733884
REGON: 021829989

Fully paid-up share capital: PLN 4,000,000.00

The inclusion of the above data identifying getBACK S.A., pursuant to Art. 374 § 1 of the Commercial Companies Code, does not mean that the e-mail message delivered to you is commercial in nature, and it has no bearing on the interpretation of the statements contained in it.

This e-mail and any files attached to it are confidential and may be legally protected. If you are not the intended recipient of this message, you may not disclose, copy, distribute, or otherwise share or use it. If the message was misaddressed, please notify the sender immediately and delete the message.

This e-mail message may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

I've created a ticket: https://issues.apache.org/jira/browse/COUCHDB-2040
I've tried to write up a short version of this discussion there to help anyone get up to date quickly ;)
Should we move the discussion of the issue there?

Best regards,
Igor Klimer
________________________________________
From: Benoit Chesneau [bchesneau@gmail.com]
Sent: 28 January 2014 13:06
To: user@couchdb.apache.org
Subject: Re: Error during compaction

Igor, can you open a ticket about this issue in Jira?

Looks like the md5 or the attachment length are not matching:
https://github.com/refuge/couchdb/blob/master/src/couchdb/couch_db_updater.erl#L833

- benoit


Re: Error during compaction

Posted by Benoit Chesneau <bc...@gmail.com>.

Igor, can you open a ticket about this issue in Jira?

Looks like the md5 or the attachment length are not matching:
https://github.com/refuge/couchdb/blob/master/src/couchdb/couch_db_updater.erl#L833

- benoit
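
[Editor's sketch] The check Benoit points at compares an attachment's MD5 and length with the recorded values. A similar comparison can be made from the client side, assuming the attachment stub in a document's `_attachments` object carries a `length` field and a `digest` field of the form `md5-<base64>`. The function name below is hypothetical, and this is not the compactor's own code, only an illustration of the same idea:

```python
import base64
import hashlib


def attachment_matches_stub(data: bytes, stub: dict) -> bool:
    """Return True if `data` agrees with the length and MD5 digest in `stub`.

    `stub` is assumed to look like one entry of a document's `_attachments`
    object, e.g. {"length": 5, "digest": "md5-XUFAKrxLKna5cZ2REBfFkg=="}.
    """
    # Cheap check first: the byte count must match the recorded length.
    if len(data) != stub["length"]:
        return False
    expected = stub["digest"]
    if not expected.startswith("md5-"):
        raise ValueError("unsupported digest format: %r" % expected)
    # CouchDB-style digests are the base64 form of the raw MD5 bytes.
    actual = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return actual == expected[len("md5-"):]
```

One caveat: if the server stores an attachment compressed, the recorded digest may describe the compressed bytes, so a mismatch on downloaded data is a hint to investigate rather than proof of corruption.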



Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Hi,

That’s encouraging! This must mean that the crash occurs when the compactor is reading attachment data for a 'losing' conflict? Hmm.

I’m going to discuss the next step with other couch devs. In the meantime, can you replicate this database to a new local database? If that completes successfully then you’ve implicitly compacted the database anyway (though with a new name). I’m curious to know if it will work. I’d expect it to fail the same way the compactor does, and for the same reason (attempting to copy some needed item from the source that is in some way unreadable or corrupt).

B.
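
[Editor's sketch] Replicating to a new local database, as suggested above, can be started with a POST to the server's `_replicate` endpoint. The snippet below only builds that request. The target name `ecrepo_copy` and the helper name are hypothetical, and `create_target` is used on the assumption that the target database does not exist yet:

```python
import json
import urllib.request


def build_replication_request(server: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_replicate request for a local copy."""
    body = {
        "source": source,
        "target": target,
        "create_target": True,  # assumption: the target does not exist yet
    }
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `send(build_replication_request("http://localhost:5984", "ecrepo", "ecrepo_copy"))` would start the copy. On a database of this size the call may run for hours, and admin credentials may be required depending on the server configuration.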


On 28 Jan 2014, at 11:20, Igor Klimer <i....@getbacksa.pl> wrote:

> Finally managed to run _changes and _all_docs last night. It seems that the calls.. succeeded.
> I've run:
> curl localhost:5984/ecrepo/_changes?include_docs=true 2>out && curl localhost:5984/ecrepo/_all_docs?include_docs=true 2>out2
> 
> In the out files are only statistics from curl:
> out: http://pastebin.com/6X8CRVE7
> out2: http://pastebin.com/80pAaR1Y
> 
> I've actually run this twice, since the first time I redirected stdout to /dev/null (thinking that the error would pop up in the couchdb logs). The second time I left the stdout alone to see if the response ends in some kind of error/abruptly, here are the last few lines from both calls:
> http://pastebin.com/dkDFSCQg
> http://pastebin.com/E2usS018
> 
> Everything looks OK to me - the last lines complete the JSON structure. Judging from this, the docs themselves are all OK, and only one(-ish) of the attachments is "bad"? (Almost every document in the database has an attachment, a TIFF image.)
> 
> Best regards,
> Igor Klimer
> ________________________________________
> From: Igor Klimer [i.klimer@getbacksa.pl]
> Sent: 24 January 2014 08:57
> To: user@couchdb.apache.org
> Subject: Re: Error during compaction
> 
> Yes, currently it's 1.5.0.
> I meant to run _changes and _all_docs last night, but there were some other maintenance tasks scheduled. I'll definitely do it during the weekend and report back.
> 
> Best regards,
> Igor Klimer
> ________________________________________
> From: Robert Samuel Newson [rnewson@apache.org]
> Sent: 23 January 2014 19:28
> To: user
> Subject: Re: Error during compaction
> 
> Understood on access to the file but I figured I should ask.
> 
> The simplest thing will be a small patch to print out the values that CouchDB expected to match but did not. Remind me which exact CouchDB version you are running? I think you said you’re up to 1.5.0 now.
> 
> B.
> 
> On 23 Jan 2014, at 15:37, Igor Klimer <i....@getbacksa.pl> wrote:
> 
>> Well, if you put it that way, then yes, (potential) database corruption does seem like something you might be interested in ;)
>> 
>> As far as I can tell, there weren't any options set that could affect fsyncing, in fact there was hardly any customization done at all. Here are the config files from the Windows machine (since that's where the corruption occurred):
>> default.ini: http://pastebin.com/kUz0qyNk
>> local.ini: http://pastebin.com/srZUMwzB
>> 
>> I've checked http://localhost:5984/_utils/config.html and delayed_commits is set to true (since that's the default).
>> 
>> Parity checking the RAID seems like a good idea, but from what the admins are telling me, they'd need to take it offline, so I want to exhaust other possibilities before that.
>> So for now, I'll try checking the DB in the way you suggested:
>> curl localhost:5984/dbname/_changes?include_docs=true
>> curl localhost:5984/dbname/_all_docs?include_docs=true
>> 
>> And then (if it succeeds) I'll try to recover the database using the recover-couchdb script.
>> 
>> As for receiving a copy of the database - as a programmer I understand how that'd help you investigate this issue, but it seems that "the powers that be" aren't so understanding. Even if your software is at the core of their enterprise... But I'm more than willing to compile/run any debug builds you throw at me or learn some basic Erlang debugging if that'd help finding the core of this issue.
>> 
>> Best regards,
>> Igor Klimer
>> ________________________________________
>> From: Robert Samuel Newson [rnewson@apache.org]
>> Sent: 23 January 2014 15:40
>> To: user
>> Subject: Re: Error during compaction
>> 
>> Database corruption is not esoteric, we take it very seriously. :)
>> 
>> Yes, strictly append only (opened with O_APPEND in POSIX land, not sure what the equivalent is on Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()'ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately or after a delay of up to one second. In neither case do we omit the fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).
>> 
>> So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.
>> 
>> A few things to try to confirm that your database is ok but not compactable (which would be hard to explain)
>> 
>> curl localhost:5984/dbname/_changes?include_docs=true
>> curl localhost:5984/dbname/_all_docs?include_docs=true
>> 
>> Let us know if those complete without error (or the errors if there are any).
>> 
>> Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDA’s and the like if that helps.
>> 
>> B.
>> 
>> On 23 Jan 2014, at 13:54, Igor Klimer <i....@getbacksa.pl> wrote:
>> 
>>> No problem - I'm sure that you have more than enough on your plate without users pestering you with such esoteric problems ;)
>>>
>>> As for the I/O subsystem - on the lowest level it's a couple of RAID 5 arrays. Then, there is a Windows Server 2008 managing that and some virtual servers (via Hyper-V), including both of the machines that we had the database on (Windows Server 2008 R2 Enterprise and now Ubuntu 12 LTS). I was assured that the arrays have been running rock-solid since the moment they were set up and haven't exhibited any hardware failures. However, the servers could have been abruptly shut down, for example when they ran out of free disk space. While NTFS should recover from something like that, I wonder if there isn't a corner case that would cause some database corruption (like you're suggesting).
>>> 
>>> Thanks for the recover-couchdb script - I'll try to give it a go ASAP. We do have backups, but I don't think they are helpful in this case - the database as a whole is working, we don't have any reports of missing files or not working documents. It would be interesting to inspect the backups to see when and, more interestingly, what happened. You mentioned that the database format is strictly append only, so the corruption shouldn't "move" and the "old"/early parts of the database file shouldn't change? Still, that seems like a really time consuming task, so for now I'll try to give the recovery script a go.
>>> 
>>> Thanks again for your help and best regards,
>>> Igor Klimer
>>> 
>>> ________________________________________
>>> From: Robert Samuel Newson [rnewson@apache.org]
>>> Sent: 23 January 2014 13:52
>>> To: user
>>> Subject: Re: Error during compaction
>>> 
>>> Hi,
>>> 
>>> Sorry, in turn, for not replying sooner. I’m not really sure what to suggest; it does sound like the database file is corrupt, which is quite hard to do with our strictly append-only format. The only oddities here are the use of Windows (and, presumably, NTFS?) and the fact that you did hit end of disk. I’ve not observed corruption when hitting end of disk on other OS/FS combinations, though. Do you happen to know any details of your I/O subsystem that might provide a hint? Could any of the disks themselves have suffered block failures that couldn’t be corrected by disk firmware? Bit flips? Would your disk controller lie about fsync() calls?
>>> 
>>> Those questions will help us figure out how the corruption occurs, but it obviously doesn’t help you fix it. For that, beyond hoping you have backups, I suggest trying https://github.com/jhs/recover-couchdb (perhaps Jason can chip in with how likely this is to still work, given the age). If that doesn’t extract all your data, then the only other suggestion I have is to truncate the database file until it can compact, but this necessarily means losing data.
>>> 
>>> Do other developers have other suggestions here?
>>> 
>>> B.
>>> 
>>> On 21 Jan 2014, at 09:08, Igor Klimer <i....@getbacksa.pl> wrote:
>>> 
>>>> Hi,
>>>> I'm extremely sorry for not replying sooner, however I was on sick leave last week.
>>>> I've tried your suggestion with an empty .compact file, however the results seem to be the same...
>>>> Log: http://pastebin.com/MJCgGM8C
>>>> 
>>>> Started with an empty ecrepo.couch.compact file (touch ecrepo.couch.compact), then after about 3 hours, the error was printed in the logs and the compaction failed:
>>>> -rw-r--r-- 1 couchdb couchdb 137502523517 Jan 21 09:51 ecrepo.couch
>>>> -rw-r--r-- 1 couchdb couchdb  51692612367 Jan 21 02:07 ecrepo.couch.compact
>>>> 
>>>> There's over 100GB free space available on the disk.
>>>> 
>>>> At least I think I know what the number 51692471440 in log means ;) But I don't know if there's a way to check which document resides at that position in file.
>>>> 
>>>> Best regards,
>>>> Igor Klimer
>>>> 
>>>> ________________________________________
>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>> Sent: 10 January 2014 18:45
>>>> To: user
>>>> Subject: Re: Error during compaction
>>>> 
>>>> Yes, I understood. The empty .compact file will trigger more checking in the compaction process, I’m hoping it gets us past the problem.
>>>> 
>>>> B.
>>>> 
>>>> On 10 Jan 2014, at 13:34, Igor Klimer <i....@getbacksa.pl> wrote:
>>>> 
>>>>> :)
>>>>> Just to clarify - the .compact file is getting created and then the compaction fails after some time (an hour or more):
>>>>> 1) First attempt, on Windows with CouchDB 1.2.0: it failed because of insufficient disk space. The .compact file had at least 10GB; unfortunately, I don't remember how much (and whether it was bigger than the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
>>>>> 2) Second attempt, on Windows with CouchDB 1.2.0: it failed with the error mentioned below. The .compact file had around 50GB, and there was plenty of free space left on the disk.
>>>>> 3) Third attempt, on Ubuntu with CouchDB 1.5.0: it failed with the error mentioned below. The .compact file had around 50GB, there was plenty of free space left on the disk, and judging from the numbers present in the log (ids? node numbers?) it failed at the same moment as attempt #2.
>>>>> 
>>>>> Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can do this only during night hours, since I don't want to put too much load on the server during working hours)
>>>>> 
>>>>> Best regards,
>>>>> Igor Klimer
>>>>> ________________________________________
>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>> Sent: 10 January 2014 14:03
>>>>> To: user
>>>>> Subject: Re: Error during compaction
>>>>> 
>>>>> Hrm, strike one. OK. The next thing to try is subtly different: stop CouchDB, delete the .compact file, but then make a new, empty .compact file (so 'touch /path/to/dbname.compact'), start CouchDB and compact.
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 10 Jan 2014, at 12:42, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>> 
>>>>>> Yes, I've already done that after the very first attempt at compaction (the one that failed because of lack of disk space). It resulted in the second failure (on Windows), then the same on Linux - I always deleted the incomplete .compact file (about 50% of the database, around 50GB) before running the compaction again. So I was always doing compaction from scratch.
>>>>>> 
>>>>>> Best regards,
>>>>>> Igor Klimer
>>>>>> ________________________________________
>>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>>> Sent: 10 January 2014 13:08
>>>>>> To: user
>>>>>> Subject: Re: Error during compaction
>>>>>> 
>>>>>> Thanks! That’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.
>>>>>> 
>>>>>> 1) stop couchdb
>>>>>> 2) delete (or move aside) the dbname.compact file for this database
>>>>>> 3) start couchdb
>>>>>> 4) compact the db
>>>>>> 
>>>>>> Whether it works or not, please let us know.
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> On 10 Jan 2014, at 08:25, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>>> 
>>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>>>>>>> 
>>>>>>> No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.
>>>>>>> 
>>>>>>>> Do you have free disk space?
>>>>>>> Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we ran the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space - so I double-checked that there was enough space before running the compaction again. Here's the log, if it's of any help: http://pastebin.com/S1URXN0p
>>>>>>> Do you think it could have left the database in some corrupted state? It seems it failed at a different part than the two next attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
>>>>>>> 
>>>>>>> Thank you for your time and help and best regards,
>>>>>>> Igor Klimer
>>>>>>> ________________________________________
>>>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>>>> Sent: 9 January 2014 17:13
>>>>>>> To: user
>>>>>>> Subject: Re: Error during compaction
>>>>>>> 
>>>>>>> Do you have free disk space?
>>>>>>> 
>>>>>>> On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>>>>>>>> 
>>>>>>>> B.
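
[Editor's sketch] The "compact the db" step in the procedures quoted above (stop CouchDB, move the .compact file aside or replace it with an empty one, start CouchDB, compact) is an HTTP call. The snippet below builds that call and reads the `compact_running` flag from the database info document. It assumes the `POST /{db}/_compact` and `GET /{db}` endpoints and a server at `http://localhost:5984`; the helper names are hypothetical:

```python
import json
import urllib.request


def build_compact_request(server: str, db: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts compaction of `db`."""
    return urllib.request.Request(
        url="%s/%s/_compact" % (server.rstrip("/"), db),
        data=b"",
        # The server expects a JSON content type on this POST.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compaction_running(db_info: dict) -> bool:
    """Read the `compact_running` flag from a database info document."""
    return bool(db_info.get("compact_running", False))


def fetch_db_info(server: str, db: str) -> dict:
    """GET /{db} and decode the JSON info document."""
    with urllib.request.urlopen("%s/%s" % (server.rstrip("/"), db)) as response:
        return json.load(response)
```

Sending the request with `urllib.request.urlopen(build_compact_request("http://localhost:5984", "ecrepo"))` and then polling `compaction_running(fetch_db_info(...))` shows when the run stops. It does not by itself say whether the run succeeded, so the CouchDB log still needs to be checked. Admin credentials may be required.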

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

Finally managed to run _changes and _all_docs last night. It seems that the calls.. succeeded.
I've run:
curl localhost:5984/ecrepo/_changes?include_docs=true 2>out && curl localhost:5984/ecrepo/_all_docs?include_docs=true 2>out2

In the out files are only statistics from curl:
out: http://pastebin.com/6X8CRVE7
out2: http://pastebin.com/80pAaR1Y

I've actually run this twice, since the first time I redirected stdout to /dev/null (thinking that the error would pop up in the couchdb logs). The second time I left the stdout alone to see if the response ends in some kind of error/abruptly, here are the last few lines from both calls:
http://pastebin.com/dkDFSCQg
http://pastebin.com/E2usS018

Everything looks OK to me - the last lines complete the JSON structure. Judging from this, the docs themselves are all OK, and only one(-ish) of the attachments is "bad"? (Almost every document in the database has an attachment, a TIFF image.)

Best regards,
Igor Klimer
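
[Editor's sketch] Checking by eye that the last lines "complete the JSON structure" can be automated for a response saved to a file. The function below is a rough balance check, not a full JSON validator: it only tracks brackets and braces outside of strings. The file name `changes.json` in the usage note and both function names are hypothetical:

```python
from typing import Iterable, Iterator

QUOTE = ord('"')
BACKSLASH = ord("\\")


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file at `path` in chunks of at most `size` bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                return
            yield chunk


def json_stream_is_balanced(chunks: Iterable[bytes]) -> bool:
    """Return True if every { and [ outside a string is closed by the end."""
    depth = 0
    in_string = False
    escaped = False
    seen_open = False
    for chunk in chunks:
        for byte in chunk:  # iterating bytes yields integers
            if in_string:
                if escaped:
                    escaped = False
                elif byte == BACKSLASH:
                    escaped = True
                elif byte == QUOTE:
                    in_string = False
            elif byte == QUOTE:
                in_string = True
            elif byte in b"{[":
                depth += 1
                seen_open = True
            elif byte in b"}]":
                depth -= 1
                if depth < 0:
                    return False
    return seen_open and depth == 0 and not in_string
```

With the response saved via `curl -s 'localhost:5984/ecrepo/_changes?include_docs=true' > changes.json`, calling `json_stream_is_balanced(read_chunks("changes.json"))` should return True for a response that was not cut off. Note that `include_docs=true` returns attachment stubs rather than attachment bodies, so a clean result here says nothing about whether the attachment data itself is readable. The per-byte loop is slow in pure Python, so for very large responses inspecting only the tail, as done above, is the cheaper option.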
________________________________________
From: Igor Klimer [i.klimer@getbacksa.pl]
Sent: 24 January 2014 08:57
To: user@couchdb.apache.org
Subject: Re: Error during compaction

Yes, currently it's 1.5.0.
I meant to run _changes and _all_docs last night, but there were some other maintenance tasks scheduled. I'll definitely do it during the weekend and report back.

Best regards,
Igor Klimer
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 23 January 2014 19:28
To: user
Subject: Re: Error during compaction

Understood on access to the file but I figured I should ask.

The simplest thing will be a small patch to print out the values that CouchDB expected to match but did not. Remind me which exact CouchDB version you are running? I think you said you’re up to 1.5.0 now.

B.

On 23 Jan 2014, at 15:37, Igor Klimer <i....@getbacksa.pl> wrote:

> Well, if you put it that way, then yes, (potential) database corruption does seem like something you might be interested in ;)
>
> As far as I can tell, there weren't any options set that could affect fsyncing, in fact there was hardly any customization done at all. Here are the config files from the Windows machine (since that's where the corruption occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
>
> I've checked http://localhost:5984/_utils/config.html and delayed_commits is set to true (since that's the default).
>
> Parity checking the RAID seems like a good idea, but from what the admins are telling me, they'd need to take it offline, so I want to exhaust other possibilities before that.
> So for now, I'll try checking the DB in the way you suggested:
> curl localhost:5984/dbname/_changes?include_docs=true
> curl localhost:5984/dbname/_all_docs?include_docs=true
>
> And then (if it succeeds) I'll try to recover the database using the recover-couchdb script.
>
> As for receiving a copy of the database - as a programmer I understand how that'd help you investigate this issue, but it seems that "the powers that be" aren't so understanding. Even if your software is at the core of their enterprise... But I'm more than willing to compile/run any debug builds you throw at me or learn some basic Erlang debugging if that'd help finding the core of this issue.
>
> Best regards,
> Igor Klimer
> ________________________________________
> From: Robert Samuel Newson [rnewson@apache.org]
> Sent: 23 January 2014 15:40
> To: user
> Subject: Re: Error during compaction
>
> Database corruption is not esoteric, we take it very seriously. :)
>
> Yes, strictly append only (opened with O_APPEND in POSIX land, not sure what the equivalent is on Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()'ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately or after a delay of up to one second. In neither case do we omit the fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).
>
> So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.
>
> A few things to try to confirm that your database is ok but not compactable (which would be hard to explain)
>
> curl localhost:5984/dbname/_changes?include_docs=true
> curl localhost:5984/dbname/_all_docs?include_docs=true
>
> Let us know if those complete without error (or the errors if there are any).
>
> Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDA’s and the like if that helps.
>
> B.
>
> On 23 Jan 2014, at 13:54, Igor Klimer <i....@getbacksa.pl> wrote:
>
>> No problem - I'm sure that you have more than enough on your plate without users pestering you with such esoteric problems ;)
>>
>> As for the I/O subsystem - on the lowest level it's a couple of RAID 5 arrays. Then, there is a Windows Server 2008 managing that and some virtual servers (via Hyper-V), including both of the machines that we had the database on (Windows Server 2008 R2 Enterprise and now Ubuntu 12 LTS). I was assured that the arrays have been running rock-solid since the moment they were set up and haven't exhibited any hardware failures. However, the servers could have been abruptly shut down, for example when they ran out of free disk space. While NTFS should recover from something like that, I wonder if there isn't a corner case that would cause some database corruption (like you're suggesting).
>>
>> Thanks for the recover-couchdb script - I'll try to give it a go ASAP. We do have backups, but I don't think they are helpful in this case - the database as a whole is working, we don't have any reports of missing files or not working documents. It would be interesting to inspect the backups to see when and, more interestingly, what happened. You mentioned that the database format is strictly append only, so the corruption shouldn't "move" and the "old"/early parts of the database file shouldn't change? Still, that seems like a really time consuming task, so for now I'll try to give the recovery script a go.
>>
>> Thanks again for your help and best regards,
>> Igor Klimer
>>
>> ________________________________________
>> From: Robert Samuel Newson [rnewson@apache.org]
>> Sent: 23 January 2014 13:52
>> To: user
>> Subject: Re: Error during compaction
>>
>> Hi,
>>
>> Sorry, in turn, for not replying sooner. I’m not really sure what to suggest; it does sound like the database file is corrupt, which is quite hard to do with our strictly append-only format. The only oddities here are the use of Windows (and, presumably, NTFS?) and the fact that you did hit end of disk. I’ve not observed corruption when hitting end of disk on other OS/FS combinations, though. Do you happen to know any details of your I/O subsystem that might provide a hint? Could any of the disks themselves have suffered block failures that couldn’t be corrected by disk firmware? Bit flips? Would your disk controller lie about fsync() calls?
>>
>> Those questions will help us figure out how the corruption occurs, but it obviously doesn’t help you fix it. For that, beyond hoping you have backups, I suggest trying https://github.com/jhs/recover-couchdb (perhaps Jason can chip in with how likely this is to still work, given the age). If that doesn’t extract all your data, then the only other suggestion I have is to truncate the database file until it can compact, but this necessarily means losing data.
>>
>> Do other developers have other suggestions here?
>>
>> B.
>>
>> On 21 Jan 2014, at 09:08, Igor Klimer <i....@getbacksa.pl> wrote:
>>
>>> Hi,
>>> I'm extremely sorry for not replying sooner, however I was on sick leave last week.
>>> I've tried your suggestion with an empty .compact file, however the results seem to be the same...
>>> Log: http://pastebin.com/MJCgGM8C
>>>
>>> Started with an empty ecrepo.couch.compact file (touch ecrepo.couch.compact), then after about 3 hours, the error was printed in the logs and the compaction failed:
>>> -rw-r--r-- 1 couchdb couchdb 137502523517 Jan 21 09:51 ecrepo.couch
>>> -rw-r--r-- 1 couchdb couchdb  51692612367 Jan 21 02:07 ecrepo.couch.compact
>>>
>>> There's over 100GB free space available on the disk.
>>>
>>> At least I think I know what the number 51692471440 in the log means ;) But I don't know if there's a way to check which document resides at that position in the file.
>>>
>>> Best regards,
>>> Igor Klimer
>>>
>>> ________________________________________
>>> From: Robert Samuel Newson [rnewson@apache.org]
>>> Sent: 10 January 2014 18:45
>>> To: user
>>> Subject: Re: Error during compaction
>>>
>>> Yes, I understood. The empty .compact file will trigger more checking in the compaction process, I’m hoping it gets us past the problem.
>>>
>>> B.
>>>
>>> On 10 Jan 2014, at 13:34, Igor Klimer <i....@getbacksa.pl> wrote:
>>>
>>>> :)
>>>> Just to clarify - the .compact file is getting created and then the compaction fails after some time (an hour or more):
>>>> 1) The first attempt, on Windows with CouchDB 1.2.0, failed because of insufficient disk space. The .compact file was at least 10GB; unfortunately, I don't remember exactly how big (or whether it was bigger than the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
>>>> 2) The second attempt, on Windows with CouchDB 1.2.0, failed with the error mentioned below. The .compact file was around 50GB, and there was plenty of free space left on the disk.
>>>> 3) The third attempt, on Ubuntu with CouchDB 1.5.0, failed with the error mentioned below. The .compact file was around 50GB, there was plenty of free space left on the disk, and judging from the numbers present in the log (ids? node numbers?) it failed at the same point as attempt #2.
>>>>
>>>> Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can do this only during night hours, since I don't want to put too much load on the server during working hours)
>>>>
>>>> Best regards,
>>>> Igor Klimer
>>>> ________________________________________
>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>> Sent: 10 January 2014 14:03
>>>> To: user
>>>> Subject: Re: Error during compaction
>>>>
>>>> Hrm, strike one. OK. The next thing to try is subtly different: stop CouchDB, delete the .compact file, but then make a new, empty .compact file (so 'touch /path/to/dbname.compact'), start CouchDB and compact.
>>>>
>>>> B.
>>>>
>>>> On 10 Jan 2014, at 12:42, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>
>>>>> Yes, I've already done that after the very first attempt at compaction (the one that failed because of lack of disk space). It resulted in the second failure (on Windows), then the same on Linux - I always deleted the incomplete (about 50% of the database, around 50GB) .compact file before running the compaction again. So I was always doing compaction from scratch.
>>>>>
>>>>> Best regards,
>>>>> Igor Klimer
>>>>> ________________________________________
>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>> Sent: 10 January 2014 13:08
>>>>> To: user
>>>>> Subject: Re: Error during compaction
>>>>>
>>>>> Thanks! That’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.
>>>>>
>>>>> 1) stop couchdb
>>>>> 2) delete (or move aside) the dbname.compact file for this database
>>>>> 3) start couchdb
>>>>> 4) compact the db
>>>>>
>>>>> Whether it works or not, please let us know.
>>>>>
>>>>> B.
>>>>>
>>>>> On 10 Jan 2014, at 08:25, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>>
>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running CouchDB versions older than 1.2.0 between db creation and today?
>>>>>>
>>>>>> No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.
>>>>>>
>>>>>>> Do you have free disk space?
>>>>>> Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we ran the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space - so before running the compaction again I double-checked that there was enough space. Here's the log, if it's of any help: http://pastebin.com/S1URXN0p
>>>>>> Do you think it could have left the database in some corrupted state? It seems it failed at a different point than the next two attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
>>>>>>
>>>>>> Thank you for your time and help and best regards,
>>>>>> Igor Klimer
>>>>>> ________________________________________
>>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>>> Sent: 9 January 2014 17:13
>>>>>> To: user
>>>>>> Subject: Re: Error during compaction
>>>>>>
>>>>>> Do you have free disk space?
>>>>>>
>>>>>> On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:
>>>>>>
>>>>>>>
>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running CouchDB versions older than 1.2.0 between db creation and today?
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>> On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
>>>>>>>> This is from Windows Server 2008 R2 Enterprise with CouchDB 1.2.0.
>>>>>>>> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
>>>>>>>> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
>>>>>>>>
>>>>>>>> I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Igor Klimer
>>>>>>>>
>>>>>>>> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>





This e-mail message may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

Yes, currently it's 1.5.0.
I meant to run _changes and _all_docs last night, but there were some other maintenance tasks scheduled. I'll definitely do it during the weekend and report back.

Best regards,
Igor Klimer
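
A minimal sketch of the two read checks mentioned above, wrapped so that a failure is reported explicitly. The base URL http://localhost:5984 and the database name dbname are placeholders taken from the commands quoted in this thread; check_url and check_endpoint are hypothetical helper names, not part of CouchDB. An error partway through a streamed response may only show up as a truncated body or as an entry in the CouchDB log, so the log is worth checking as well.

```shell
# Hypothetical helper: build the URL for a full read of one endpoint.
# $1 = base URL, $2 = database name, $3 = endpoint (_changes or _all_docs)
check_url() {
    printf '%s/%s/%s?include_docs=true\n' "$1" "$2" "$3"
}

# Hypothetical helper: stream the endpoint, discard the body, report the result.
# --fail makes curl exit non-zero on HTTP status codes of 400 and above.
check_endpoint() {
    url=$(check_url "$1" "$2" "$3")
    if curl --silent --show-error --fail --output /dev/null "$url"; then
        echo "OK   $url"
    else
        # Inside the else branch, $? still holds curl's exit status.
        echo "FAIL $url (curl exit status $?)"
        return 1
    fi
}
```

Usage, assuming a local server: check_endpoint http://localhost:5984 dbname _changes && check_endpoint http://localhost:5984 dbname _all_docs
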
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 23 January 2014 19:28
To: user
Subject: Re: Error during compaction

Understood on access to the file but I figured I should ask.

The simplest thing will be a small patch to print out the values that CouchDB expected to match but did not. Remind me which exact CouchDB version you are running? I think you said you’re up to 1.5.0 now.

B.

On 23 Jan 2014, at 15:37, Igor Klimer <i....@getbacksa.pl> wrote:

> Well, if you put it that way, then yes, (potential) database corruption does seem like something you might be interested in ;)
>
> As far as I can tell, there weren't any options set that could affect fsyncing, in fact there was hardly any customization done at all. Here are the config files from the Windows machine (since that's where the corruption occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
>
> I've checked http://localhost:5984/_utils/config.html and delayed_commits is set to true (since that's the default).
>
> Parity checking the RAID seems like a good idea, but from what the admins are telling me, they'd need to take it offline, so I want to exhaust other possibilities before that.
> So for now, I'll try checking the DB in the way you suggested:
> curl 'localhost:5984/dbname/_changes?include_docs=true'
> curl 'localhost:5984/dbname/_all_docs?include_docs=true'
>
> And then (if it succeeds) I'll try to recover the database using the recover-couchdb script.
>
> As for receiving a copy of the database - as a programmer I understand how that'd help you investigate this issue, but it seems that "the powers that be" aren't so understanding. Even if your software is at the core of their enterprise... But I'm more than willing to compile/run any debug builds you throw at me or learn some basic Erlang debugging if that'd help find the core of this issue.
>
> Best regards,
> Igor Klimer
> ________________________________________
> From: Robert Samuel Newson [rnewson@apache.org]
> Sent: 23 January 2014 15:40
> To: user
> Subject: Re: Error during compaction
>
> Database corruption is not esoteric, we take it very seriously. :)
>
> Yes, strictly append-only (opened with O_APPEND in POSIX land, not sure what the equivalent is for Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()’ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately (false) or after a delay of up to one second (true). In both cases we make fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).
>
> So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.
>
> A few things to try to confirm that your database is OK but not compactable (which would be hard to explain):
>
> curl 'localhost:5984/dbname/_changes?include_docs=true'
> curl 'localhost:5984/dbname/_all_docs?include_docs=true'
>
> Let us know if those complete without error (or the errors if there are any).
>
> Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDAs and the like if that helps.
>
> B.
>





Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Understood on access to the file but I figured I should ask.

The simplest thing will be a small patch to print out the values that CouchDB expected to match but did not. Remind me which exact CouchDB version you are running? I think you said you’re up to 1.5.0 now.

B.
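
Independently of such a patch, the raw bytes around a failing position can be inspected directly. The sketch below is an illustration only: bytes_at and dump_at are hypothetical helper names, and it assumes that the number reported in the log (51692471440 in the message quoted in this thread) is a 0-based byte offset into one of the database files, which the thread does not confirm. It also relies on head -c, which is widely available but not required by POSIX.

```shell
# bytes_at FILE OFFSET [COUNT]
# Print COUNT raw bytes (default 64) starting at the 0-based byte OFFSET.
# tail -c +N starts at byte N counted from 1, hence the "+ 1".
bytes_at() {
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "${3:-64}"
}

# dump_at FILE OFFSET [COUNT]
# Same bytes as bytes_at, shown as a hex dump with decimal addresses
# (addresses are relative to OFFSET, not to the start of the file).
dump_at() {
    bytes_at "$@" | od -A d -t x1
}
```

Usage, with the file name and number from this thread as an example: dump_at ecrepo.couch 51692471440 128
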

On 23 Jan 2014, at 15:37, Igor Klimer <i....@getbacksa.pl> wrote:

> Well, if you put it that way, then yes, (potential) database corruption does seem like something you might be interested in ;)
> 
> As far as I can tell, there weren't any options set that could affect fsyncing, in fact there was hardly any customization done at all. Here are the config files from the Windows machine (since that's where the corruption occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
> 
> I've checked http://localhost:5984/_utils/config.html and delayed_commits is set to true (since that's the default).
> 
> Parity checking the RAID seems like a good idea, but from what the admins are telling me, they'd need to take it offline, so I want to exhaust other possibilities before that.
> So for now, I'll try checking the DB in the way you suggested:
> curl 'localhost:5984/dbname/_changes?include_docs=true'
> curl 'localhost:5984/dbname/_all_docs?include_docs=true'
> 
> And then (if it succeeds) I'll try to recover the database using the recover-couchdb script.
> 
> As for receiving a copy of the database - as a programmer I understand how that'd help you investigate this issue, but it seems that "the powers that be" aren't so understanding. Even if your software is at the core of their enterprise... But I'm more than willing to compile/run any debug builds you throw at me or learn some basic Erlang debugging if that'd help find the core of this issue.
> 
> Best regards,
> Igor Klimer
> ________________________________________
> From: Robert Samuel Newson [rnewson@apache.org]
> Sent: 23 January 2014 15:40
> To: user
> Subject: Re: Error during compaction
> 
> Database corruption is not esoteric, we take it very seriously. :)
> 
> Yes, strictly append-only (opened with O_APPEND in POSIX land, not sure what the equivalent is for Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()’ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately (false) or after a delay of up to one second (true). In both cases we make fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).
> 
> So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.
> 
> A few things to try to confirm that your database is OK but not compactable (which would be hard to explain):
>
> curl 'localhost:5984/dbname/_changes?include_docs=true'
> curl 'localhost:5984/dbname/_all_docs?include_docs=true'
> 
> Let us know if those complete without error (or the errors if there are any).
> 
> Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDAs and the like if that helps.
> 
> B.
> 
> On 23 Jan 2014, at 13:54, Igor Klimer <i....@getbacksa.pl> wrote:
> 
>> No problem - I'm sure that you have more than enough on your plate without users pestering you with such esoteric problems ;)
>> 
>> As for the I/O subsystem - on the lowest level it's a couple of RAID 5 arrays. Then, there is a Windows Server 2008 machine managing them and some virtual servers (via Hyper-V), including both of the machines that we had the database on (Windows Server 2008 R2 Enterprise and now Ubuntu 12 LTS). I was assured that the arrays have been running rock-solid since the moment they were set up and haven't exhibited any hardware failures. However, the servers could have been abruptly shut down, for example when they ran out of free disk space. While NTFS should recover from something like that, I wonder if there isn't a corner case that would cause some database corruption (like you're suggesting).
>> 
>> Thanks for the recover-couchdb script - I'll try to give it a go ASAP. We do have backups, but I don't think they are helpful in this case - the database as a whole is working, and we don't have any reports of missing files or broken documents. It would be interesting to inspect the backups to see when and, more interestingly, what happened. You mentioned that the database format is strictly append-only, so the corruption shouldn't "move" and the "old"/early parts of the database file shouldn't change? Still, that seems like a really time-consuming task, so for now I'll try to give the recovery script a go.
>> 
>> Thanks again for your help and best regards,
>> Igor Klimer
>> 
>> ________________________________________
>> From: Robert Samuel Newson [rnewson@apache.org]
>> Sent: 23 January 2014 13:52
>> To: user
>> Subject: Re: Error during compaction
>> 
>> Hi,
>> 
>> Sorry, in turn, for not replying sooner. I’m not really sure what to suggest; it does sound like the database file is corrupt, which is quite hard to do with our strictly append-only format. The only oddities here are the use of Windows (and, presumably, NTFS?) and the fact that you did hit end of disk. I’ve not observed corruption when hitting end of disk on other OS/FS combinations, though. Do you happen to know any details of your I/O subsystem that might provide a hint? Could any of the disks themselves have suffered block failures that couldn’t be corrected by disk firmware? Bit flips? Would your disk controller lie about fsync() calls?
>> 
>> Those questions will help us figure out how the corruption occurs, but it obviously doesn’t help you fix it. For that, beyond hoping you have backups, I suggest trying https://github.com/jhs/recover-couchdb (perhaps Jason can chip in with how likely this is to still work, given the age). If that doesn’t extract all your data, then the only other suggestion I have is to truncate the database file until it can compact, but this necessarily means losing data.
>> 
>> Do other developers have other suggestions here?
>> 
>> B.
>> 
>> On 21 Jan 2014, at 09:08, Igor Klimer <i....@getbacksa.pl> wrote:
>> 
>>> Hi,
>>> I'm extremely sorry for not replying sooner, however I was on sick leave last week.
>>> I've tried your suggestion with an empty .compact file, however the results seem to be the same...
>>> Log: http://pastebin.com/MJCgGM8C
>>> 
>>> I started with an empty ecrepo.couch.compact file (touch ecrepo.couch.compact); after about 3 hours the error was printed in the logs and the compaction failed:
>>> -rw-r--r-- 1 couchdb couchdb 137502523517 Jan 21 09:51 ecrepo.couch
>>> -rw-r--r-- 1 couchdb couchdb  51692612367 Jan 21 02:07 ecrepo.couch.compact
>>> 
>>> There's over 100GB of free space available on the disk.
>>> 
>>> At least I think I know what the number 51692471440 in the log means ;) But I don't know if there's a way to check which document resides at that position in the file.
>>> 
>>> Best regards,
>>> Igor Klimer
>>> 
>>> ________________________________________
>>> From: Robert Samuel Newson [rnewson@apache.org]
>>> Sent: 10 January 2014 18:45
>>> To: user
>>> Subject: Re: Error during compaction
>>> 
>>> Yes, I understood. The empty .compact file will trigger more checking in the compaction process; I’m hoping it gets us past the problem.
>>> 
>>> B.
>>> 
>>> On 10 Jan 2014, at 13:34, Igor Klimer <i....@getbacksa.pl> wrote:
>>> 
>>>> :)
>>>> Just to clarify - the .compact file is getting created and then the compaction fails after some time (an hour or more):
>>>> 1) First attempt, on Windows with CouchDB 1.2.0: it failed because of insufficient disk space. The .compact file was at least 10GB; unfortunately, I don't remember exactly how big (or whether it was bigger than the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
>>>> 2) Second attempt, on Windows with CouchDB 1.2.0: it failed with the error mentioned below. The .compact file was around 50GB, and there was plenty of free space left on the disk.
>>>> 3) Third attempt, on Ubuntu with CouchDB 1.5.0: it failed with the error mentioned below. The .compact file was around 50GB, there was plenty of free space left on the disk, and judging from the numbers present in the log (ids? node numbers?) it failed at the same point as attempt #2.
>>>> 
>>>> Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can do this only during night hours, since I don't want to put too much load on the server during working hours)
>>>> 
>>>> Best regards,
>>>> Igor Klimer
>>>> ________________________________________
>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>> Sent: 10 January 2014 14:03
>>>> To: user
>>>> Subject: Re: Error during compaction
>>>> 
>>>> Hrm, strike one. OK. The next thing to try is subtly different: stop CouchDB, delete the .compact file, but then make a new, empty .compact file (so 'touch /path/to/dbname.compact'), start CouchDB and compact.
>>>> 
>>>> B.
>>>> 
>>>> On 10 Jan 2014, at 12:42, Igor Klimer <i....@getbacksa.pl> wrote:
>>>> 
>>>>> Yes, I've already done that after the very first attempt at compaction (the one that failed because of lack of disk space). It resulted in the second failure (on Windows), then the same on Linux - I always deleted the incomplete (about 50% of the database, around 50GB) .compact file before running the compaction again. So I was always doing compaction from scratch.
>>>>> 
>>>>> Best regards,
>>>>> Igor Klimer
>>>>> ________________________________________
>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>> Sent: 10 January 2014 13:08
>>>>> To: user
>>>>> Subject: Re: Error during compaction
>>>>> 
>>>>> Thanks! That’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.
>>>>> 
>>>>> 1) stop couchdb
>>>>> 2) delete (or move aside) the dbname.compact file for this database
>>>>> 3) start couchdb
>>>>> 4) compact the db
>>>>> 
>>>>> Whether it works or not, please let us know.
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 10 Jan 2014, at 08:25, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>> 
>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>>>>>> 
>>>>>> No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.
>>>>>> 
>>>>>>> Do you have free disk space?
>>>>>> Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we ran the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space - so before running the compaction again I double-checked that there was enough space. Here's the log, if it's of any help: http://pastebin.com/S1URXN0p
>>>>>> Do you think it could have left the database in some corrupted state? It seems it failed at a different point than the next two attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
>>>>>> 
>>>>>> Thank you for your time and help and best regards,
>>>>>> Igor Klimer
>>>>>> ________________________________________
>>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>>> Sent: 9 January 2014 17:13
>>>>>> To: user
>>>>>> Subject: Re: Error during compaction
>>>>>> 
>>>>>> Do you have free disk space?
>>>>>> 
>>>>>> On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>>>>>>> 
>>>>>>> B.
>>>>>>> 
>>>>>>> On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
>>>>>>>> This is from Windows Server 2008 R2 Enterprise with Couchdb 1.2.0.
>>>>>>>> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
>>>>>>>> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
>>>>>>>> 
>>>>>>>> I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Igor Klimer
>>>>>>>> 
>>>>>>>> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

Well, if you put it that way, then yes, (potential) database corruption does seem like something you might be interested in ;)

As far as I can tell, there weren't any options set that could affect fsyncing; in fact, there was hardly any customization done at all. Here are the config files from the Windows machine (since that's where the corruption occurred):
default.ini: http://pastebin.com/kUz0qyNk
local.ini: http://pastebin.com/srZUMwzB

I've checked http://localhost:5984/_utils/config.html and delayed_commits is set to true (since that's the default).
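
For anyone who wants to check the ini files directly rather than through the _utils config page, here is a minimal Python sketch that reports the two fsync-related settings. It is only an illustration: the function name is made up, and the key names delayed_commits and fsync_options in the [couchdb] section are an assumption about how CouchDB 1.x lays out its config, so verify them against your own default.ini.

```python
import configparser


def fsync_related_settings(ini_text):
    """Return the fsync-related values found in CouchDB-style ini text.

    Assumption: both keys live in the [couchdb] section. A value of None
    means the key is absent (or commented out), so the default applies.
    """
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # CouchDB ini files use ';' comments
        interpolation=None,              # treat '%' in values literally
        strict=False,                    # tolerate repeated sections/keys
    )
    parser.read_string(ini_text)
    return {
        key: parser.get("couchdb", key, fallback=None)
        for key in ("delayed_commits", "fsync_options")
    }
```

Feeding it the contents of default.ini and then local.ini shows whether either file overrides the defaults; local.ini takes precedence when both set the same key.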

Parity checking the RAID seems like a good idea, but from what the admins are telling me, they'd need to take it offline, so I want to exhaust other possibilities before that.
So for now, I'll try checking the DB in the way you suggested:
curl 'localhost:5984/dbname/_changes?include_docs=true'
curl 'localhost:5984/dbname/_all_docs?include_docs=true'

And then (if it succeeds) I'll try to recover the database using the recover-couchdb script.

As for receiving a copy of the database - as a programmer I understand how that'd help you investigate this issue, but it seems that "the powers that be" aren't so understanding. Even if your software is at the core of their enterprise... But I'm more than willing to compile/run any debug builds you throw at me or learn some basic Erlang debugging if that'd help find the core of this issue.

Best regards,
Igor Klimer
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 23 January 2014 15:40
To: user
Subject: Re: Error during compaction

Database corruption is not esoteric, we take it very seriously. :)

Yes, strictly append-only (opened with O_APPEND in POSIX land; not sure what the equivalent is on Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()’ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately or after a delay of up to one second. In neither case do we omit the fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).

So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.

A few things to try, to confirm that your database is OK but not compactable (which would be hard to explain):

curl 'localhost:5984/dbname/_changes?include_docs=true'
curl 'localhost:5984/dbname/_all_docs?include_docs=true'

Let us know if those complete without error (or the errors if there are any).

Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDAs and the like if that helps.

B.






Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Database corruption is not esoteric, we take it very seriously. :)

Yes, strictly append-only (opened with O_APPEND in POSIX land; not sure what the equivalent is on Windows). This doesn’t stop hardware failure from mangling earlier bits, of course. You have RAID-5, so you have a parity stripe; perhaps you could force a parity check? Abrupt shutdown or hitting end of disk should do no more than cause unflushed updates to be lost; nothing fsync()’ed prior to that should be at risk, by design. Are you running with delayed_commits set to true or false? The difference is whether we fsync immediately or after a delay of up to one second. In neither case do we omit the fsync() calls before and after writing each new database footer. There are config options to disable those fsync calls; please let us know if you did that (your local.ini file will tell you).
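
To make the write pattern concrete, here is a rough Python sketch of "append, fsync, append footer, fsync". It is not CouchDB's code (which is Erlang); the function names, the file name and the byte strings are invented for illustration, and it only shows why data fsync'ed before a footer should survive an abrupt stop.

```python
import os
import tempfile


def write_all(fd, data):
    """Write every byte of data; os.write may write only part of it."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_record(path, body, footer):
    """Append body, fsync, append footer, fsync; return the new file size."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        write_all(fd, body)
        os.fsync(fd)  # body is on disk before a footer can refer to it
        write_all(fd, footer)
        os.fsync(fd)  # footer is on disk before we report success
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


def demo():
    """Append two records to a scratch file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.couch")
        append_record(path, b"doc1", b"F1")
        append_record(path, b"doc2", b"F2")
        with open(path, "rb") as handle:
            return handle.read()
```

In this toy model, a crash between the two fsync calls leaves a body with no footer after it, so a reader that trusts only the last complete footer simply ignores the unfinished tail.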

So, the two causes of true corruption would be 1) a bug in our code and 2) disk corruption or failure to honor fsync() and/or write ordering. Obviously we’re very interested in any occurrence of type 1.

A few things to try, to confirm that your database is OK but not compactable (which would be hard to explain):

curl 'localhost:5984/dbname/_changes?include_docs=true'
curl 'localhost:5984/dbname/_all_docs?include_docs=true'

Let us know if those complete without error (or the errors if there are any).
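
If the output is too large to eyeball, a small script can do the same check without keeping the response in memory. The sketch below is one possible way, not an official tool: scan_stream and check_endpoint are made-up names, and "the last non-whitespace byte is a closing brace" is only a heuristic for "the JSON body was not cut off".

```python
import io
import urllib.request


def scan_stream(stream: io.BufferedIOBase, chunk_size: int = 1 << 16):
    """Read a byte stream to the end without storing it.

    Returns (total_bytes, looks_complete). looks_complete is True when
    the last non-whitespace byte is '}', i.e. the JSON object appears to
    have been closed rather than truncated.
    """
    total = 0
    last_byte = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        stripped = chunk.rstrip()
        if stripped:  # skip chunks that are pure trailing whitespace
            last_byte = stripped[-1:]
    return total, last_byte == b"}"


def check_endpoint(url):
    """Fetch one URL; urlopen raises on HTTP errors or a dropped connection."""
    with urllib.request.urlopen(url) as response:
        total, complete = scan_stream(response)
    print(f"{url}: {total} bytes, looks complete: {complete}")
    return complete
```

For example, check_endpoint("http://localhost:5984/dbname/_all_docs?include_docs=true") would read the whole listing and report whether it ended cleanly; an exception or a False result is the kind of error worth posting here.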

Finally, would it be possible for the CouchDB development team (or a subset) to receive a copy of the database file for forensic investigation? I believe we can sign NDAs and the like if that helps.

B.

On 23 Jan 2014, at 13:54, Igor Klimer <i....@getbacksa.pl> wrote:

> No problem - I'm sure that you have more than enough on your plate without users pestering you with such esoteric problems ;)
> 
> As for the I/O subsystem - on the lowest level it's a couple of RAID 5 arrays. Then, there is a Windows Server 2008 machine managing that and some virtual servers (via Hyper-V), including both of the machines that we had the database on (Windows Server 2008 R2 Enterprise and now Ubuntu 12.04 LTS). I was assured that the arrays have been running rock-solid since they were set up and haven't exhibited any hardware failures. However, the servers could have been shut down abruptly, for example when they ran out of free disk space. While NTFS should recover from something like that, I wonder if there isn't a corner case that would cause some database corruption (like you're suggesting).
> 
> Thanks for the recover-couchdb script - I'll try to give it a go ASAP. We do have backups, but I don't think they are helpful in this case - the database as a whole is working, and we don't have any reports of missing files or non-working documents. It would be interesting to inspect the backups to see when and, more interestingly, what happened. You mentioned that the database format is strictly append-only, so the corruption shouldn't "move" and the "old"/early parts of the database file shouldn't change? Still, that seems like a really time-consuming task, so for now I'll try to give the recovery script a go.
> 
> Thanks again for your help and best regards,
> Igor Klimer
> 
> ________________________________________
> Od: Robert Samuel Newson [rnewson@apache.org]
> Wysłano: 23 stycznia 2014 13:52
> Do: user
> Temat: Re: Error during compaction
> 
> Hi,
> 
> Sorry, in turn, for not replying sooner. I’m not really sure what to suggest, it does sound like the database file is corrupt, which is quite hard to do with our strictly append only format. The only oddities here are the use of Windows (and, presumably, NTFS?) and the fact that you did hit end of disk. I’ve not observed corruption when hitting end of disk on other OS/FS combinations, though. Do you happen to know any details of your I/O subsystem that might provide a hint? Could any of the disks themselves have suffered block failures that couldn’t be corrected by disk firmware? Bit flips?Would your disk controller lie about fsync() calls?
> 
> Those questions will help us figure out how the corruption occurs, but it obviously doesn’t help you fix it. For that, beyond hoping you have backups, I suggest trying https://github.com/jhs/recover-couchdb (perhaps Jason can chip in with how likely this is to still work, given the age). If that doesn’t extract all your data, then the only other suggestion I have is to truncate the database file until it can compact, but this necessarily means losing data.
> 
> Do other developers have other suggestions here?
> 
> B.
> 
> On 21 Jan 2014, at 09:08, Igor Klimer <i....@getbacksa.pl> wrote:
> 
>> Hi,
>> I'm extremely sorry for not replying sooner, however I was on sick leave last week.
>> I've tried your suggestion with an empty .compact file, however the results seem to be the same...
>> Log: http://pastebin.com/MJCgGM8C
>> 
>> Started with an empty ecrepo.couch.compact file (touch ecrepo.couch.compact), then after about 3 hours, the error was printed in the logs and the compaction failed:
>> -rw-r--r-- 1 couchdb couchdb 137502523517 Jan 21 09:51 ecrepo.couch
>> -rw-r--r-- 1 couchdb couchdb  51692612367 Jan 21 02:07 ecrepo.couch.compact
>> 
>> There's over 100GB free space available on the disk.
>> 
>> At least I think I know what the number 51692471440 in log means ;) But I don't know if there's a way to check which document resides at that position in file.
>> 
>> Best regards,
>> Igor Klimer
>> 
>> ________________________________________
>> Od: Robert Samuel Newson [rnewson@apache.org]
>> Wysłano: 10 stycznia 2014 18:45
>> Do: user
>> Temat: Re: Error during compaction
>> 
>> Yes, I understood. The empty .compact file will trigger more checking in the compaction process, I’m hoping it gets us past the problem.
>> 
>> B.
>> 
>> On 10 Jan 2014, at 13:34, Igor Klimer <i....@getbacksa.pl> wrote:
>> 
>>> :)
>>> Just to clarify - the .compact file is getting created and then the compaction fails after some time (an hour or more):
>>> 1) attempt on Windows with Couchdb 1.2.0 it failed because insufficient disk space. The .compact file had at least 10GB, unfortunately, I don't remember how much (and whether it was bigger then the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
>>> 2) attempt on Windows with Couchdb 1.2.0 it failed with the error mentioned below. The .compact file had around 50GB, there was plenty of free space left on the disk.
>>> 3) attempt on Ubuntu with Couchdb 1.5.0 it failed with the error mentioned below. The .compact file had around 50GB, there was plenty of free space left on the disk, and judging from the numbers present in the log (ids? node numbers?) it failed at the same moment as attempt #2.
>>> 
>>> Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can do this only during night hours, since I don't want to put too much load on the server during working hours)
>>> 
>>> Best regards,
>>> Igor Klimer
>>> ________________________________________
>>> Od: Robert Samuel Newson [rnewson@apache.org]
>>> Wysłano: 10 stycznia 2014 14:03
>>> Do: user
>>> Temat: Re: Error during compaction
>>> 
>>> Hrm, strike one. Ok. Next thing to try is subtly different. stop couchdb, delete the .compact file, but then make a new, empty .compact file (so ’touch /path/to/dbname.compact’), start couchdb and compact.
>>> 
>>> B.
>>> 
>>> On 10 Jan 2014, at 12:42, Igor Klimer <i....@getbacksa.pl> wrote:
>>> 
>>>> Yes, I've already done that after the very fist attempt at compaction (the one that failed because of lack of disk space). And it resulted in the second fail (on Windows), then the same on Linux - I always deleted the incomplete (about 50% of the database, around 50GB) .compact file before running the compaction again. So I was always doing compaction from scratch.
>>>> 
>>>> Best regards,
>>>> Igor Klimer
>>>> ________________________________________
>>>> Od: Robert Samuel Newson [rnewson@apache.org]
>>>> Wysłano: 10 stycznia 2014 13:08
>>>> Do: user
>>>> Temat: Re: Error during compaction
>>>> 
>>>> Thanks! that’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.
>>>> 
>>>> 1) stop couchdb
>>>> 2) delete (or move aside) the dbname.compact file for this database
>>>> 3) start couchdb
>>>> 4) compact the db
>>>> 
>>>> Whether it works or not, please let us know.
>>>> 
>>>> B.
>>>> 
>>>> On 10 Jan 2014, at 08:25, Igor Klimer <i....@getbacksa.pl> wrote:
>>>> 
>>>>>> Given that you’re at 100Gb and compacting for the first time, can you tell us if you were running on older couchdb versions that 1.2.0 between db creation and today?
>>>>> 
>>>>> No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.
>>>>> 
>>>>>> Do you have free disk space?
>>>>> Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we've run the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space - so I double checked before running the compaction again if there's enough space. Here's the log, if it's any helpful: http://pastebin.com/S1URXN0p
>>>>> Do you think it could have left the database in some corrupted state? It seems it failed at a different part then the two next attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
>>>>> 
>>>>> Thank you for your time and help and best regards,
>>>>> Igor Klimer
>>>>> ________________________________________
>>>>> From: Robert Samuel Newson [rnewson@apache.org]
>>>>> Sent: 9 January 2014 17:13
>>>>> To: user
>>>>> Subject: Re: Error during compaction
>>>>> 
>>>>> Do you have free disk space?
>>>>> 
>>>>> On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
>>>>>>> This is from Windows Server 2008 R2 Enterprise with CouchDB 1.2.0.
>>>>>>> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
>>>>>>> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
>>>>>>> 
>>>>>>> I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Igor Klimer
>>>>>>> 
>>>>>>> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

No problem - I'm sure that you have more than enough on your plate without users pestering you with such esoteric problems ;)

As for the I/O subsystem: at the lowest level it's a couple of RAID 5 arrays. On top of that, a Windows Server 2008 host manages the arrays and runs some virtual servers (via Hyper-V), including both of the machines that we had the database on (Windows Server 2008 R2 Enterprise and now Ubuntu 12 LTS). I was assured that the arrays have been running rock-solid since they were set up and haven't exhibited any hardware failures. However, the servers could have been shut down abruptly, for example when one ran out of free disk space. While NTFS should recover from something like that, I wonder if there isn't a corner case that would cause some database corruption (like you're suggesting).

Thanks for the recover-couchdb script - I'll try to give it a go ASAP. We do have backups, but I don't think they are helpful in this case: the database as a whole is working, and we don't have any reports of missing files or broken documents. It would be interesting to inspect the backups to see when and, more interestingly, what happened. You mentioned that the database format is strictly append-only, so the corruption shouldn't "move" and the "old"/early parts of the database file shouldn't change? Still, that seems like a really time-consuming task, so for now I'll try to give the recovery script a go.
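
If I do get to the backups, a first check could be whether an old backup is still a byte-for-byte prefix of the current file, which is what I'd expect from an append-only file that has never been successfully compacted. A minimal Python sketch of that check, assuming nothing else rewrote the file in between (the function name and chunk size are my own choices, not anything from CouchDB):

```python
from pathlib import Path

CHUNK = 1024 * 1024  # compare 1 MiB at a time


def first_divergence(backup: Path, live: Path, chunk: int = CHUNK):
    """Return the first byte offset where `live` differs from `backup`,
    or None if the whole backup is a byte-for-byte prefix of `live`."""
    offset = 0
    with backup.open("rb") as old, live.open("rb") as new:
        while True:
            a = old.read(chunk)
            if not a:
                return None  # reached the end of the backup without a mismatch
            b = new.read(len(a))
            if a != b:
                # Narrow the mismatch down to the exact byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + len(b)  # live file is shorter than the backup
            offset += len(a)
```

Running this against backups of different ages would bracket the time at which the early part of the file changed, if it changed at all.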

Thanks again for your help and best regards,
Igor Klimer


Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Hi,

Sorry, in turn, for not replying sooner. I’m not really sure what to suggest. It does sound like the database file is corrupt, which is quite hard to do with our strictly append-only format. The only oddities here are the use of Windows (and, presumably, NTFS?) and the fact that you did hit end of disk. I’ve not observed corruption when hitting end of disk on other OS/FS combinations, though. Do you happen to know any details of your I/O subsystem that might provide a hint? Could any of the disks themselves have suffered block failures that couldn’t be corrected by disk firmware? Bit flips? Would your disk controller lie about fsync() calls?

Those questions will help us figure out how the corruption occurred, but they obviously don’t help you fix it. For that, beyond hoping you have backups, I suggest trying https://github.com/jhs/recover-couchdb (perhaps Jason can chip in with how likely this is to still work, given its age). If that doesn’t extract all your data, then the only other suggestion I have is to truncate the database file until it can compact, but this necessarily means losing data.
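
To make the truncation idea concrete, here is a rough Python sketch that only ever touches a copy of the file. The function name, the paths and the choice of `length` are placeholders for illustration; finding a length at which compaction succeeds would be trial and error, and CouchDB should be stopped while the file is being copied.

```python
import shutil
from pathlib import Path


def truncated_copy(src: Path, dst: Path, length: int) -> int:
    """Copy `src` to `dst`, cut `dst` down to `length` bytes, and
    return how many bytes were dropped. `src` is never modified."""
    size = src.stat().st_size
    if not 0 <= length <= size:
        raise ValueError(f"length must be between 0 and {size}")
    shutil.copyfile(src, dst)
    with dst.open("r+b") as f:
        f.truncate(length)  # everything past `length` is lost in the copy
    return size - length
```

Working on a copy costs disk space for a file this size, but it keeps the original available for recover-couchdb or any later attempt.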

Do other developers have other suggestions here?

B.


Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

Hi,
I'm extremely sorry for not replying sooner; I was on sick leave last week.
I've tried your suggestion with an empty .compact file, but the results seem to be the same...
Log: http://pastebin.com/MJCgGM8C

I started with an empty ecrepo.couch.compact file (touch ecrepo.couch.compact); after about 3 hours, the error was printed in the logs and the compaction failed:
-rw-r--r-- 1 couchdb couchdb 137502523517 Jan 21 09:51 ecrepo.couch
-rw-r--r-- 1 couchdb couchdb  51692612367 Jan 21 02:07 ecrepo.couch.compact

There's over 100GB of free space available on the disk.
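
For anyone wanting to automate the free-space check before kicking off compaction, here is a small Python sketch. The rule of thumb that the compacted copy needs at most about as much room as the current file is an assumption on my part (hence the adjustable `headroom`), and both function names are made up.

```python
import shutil
from pathlib import Path


def enough_room(db_size: int, free_bytes: int, headroom: float = 1.0) -> bool:
    """True if `free_bytes` covers `headroom` times the database size."""
    return free_bytes >= db_size * headroom


def check_volume(db_file: Path, headroom: float = 1.0) -> bool:
    """Apply enough_room() to the volume that holds `db_file`."""
    free = shutil.disk_usage(db_file.parent).free
    return enough_room(db_file.stat().st_size, free, headroom)
```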

At least I think I know what the number 51692471440 in the log means ;) But I don't know if there's a way to check which document resides at that position in the file.
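
I don't know of a way to map an offset back to a document, but the raw bytes around a position can at least be eyeballed. A Python sketch, assuming the number really is a byte offset into one of the two files (which I haven't confirmed); it prints bytes only and makes no attempt to decode CouchDB's on-disk format, and the function name is my own:

```python
from pathlib import Path


def dump_around(path: Path, offset: int, before: int = 64, after: int = 64) -> str:
    """Return a hex dump of the bytes surrounding `offset` in `path`."""
    start = max(0, offset - before)
    with path.open("rb") as f:
        f.seek(start)
        data = f.read(offset - start + after)
    lines = []
    for i in range(0, len(data), 16):
        row = data[i:i + 16]
        hexpart = " ".join(f"{b:02x}" for b in row)
        # Show printable ASCII as-is and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{start + i:012d}  {hexpart:<47}  {text}")
    return "\n".join(lines)
```

Something like print(dump_around(Path("ecrepo.couch"), 51692471440)) would then show whether any recognisable document id or field name sits near that position.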

Best regards,
Igor Klimer


Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Yes, I understood. The empty .compact file will trigger more checking in the compaction process; I’m hoping it gets us past the problem.
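
In script form, the file-handling part and the compaction call might look roughly like this (Python 3.8+; the helper names are invented for illustration, and stopping and starting CouchDB is left to whatever your service manager provides):

```python
import urllib.request
from pathlib import Path


def reset_compact_file(db_file: Path) -> Path:
    """Delete any leftover <db>.compact file and create an empty one.
    Run this only while CouchDB is stopped."""
    compact = db_file.with_name(db_file.name + ".compact")
    compact.unlink(missing_ok=True)
    compact.touch()
    return compact


def compaction_request(base_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks CouchDB to compact a database."""
    return urllib.request.Request(
        f"{base_url}/{db_name}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Once CouchDB is running again, passing the request to urllib.request.urlopen() starts the compaction; a server with admins configured will also want credentials on that request.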

B.

On 10 Jan 2014, at 13:34, Igor Klimer <i....@getbacksa.pl> wrote:

> :)
> Just to clarify: the .compact file is getting created and then the compaction fails after some time (an hour or more):
> 1) Attempt on Windows with CouchDB 1.2.0: it failed because of insufficient disk space. The .compact file was at least 10GB; unfortunately, I don't remember exactly how big (or whether it was bigger than the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
> 2) Attempt on Windows with CouchDB 1.2.0: it failed with the error mentioned below. The .compact file was around 50GB, and there was plenty of free space left on the disk.
> 3) Attempt on Ubuntu with CouchDB 1.5.0: it failed with the error mentioned below. The .compact file was around 50GB, there was plenty of free space left on the disk, and judging from the numbers present in the log (ids? node numbers?) it failed at the same point as attempt #2.
> 
> Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can only do this at night, since I don't want to put too much load on the server during working hours.)
> 
> Best regards,
> Igor Klimer

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

:)
Just to clarify: the .compact file is created, and then the compaction fails after some time (an hour or more):
 1) The first attempt, on Windows with CouchDB 1.2.0, failed because of insufficient disk space. The .compact file was at least 10GB; unfortunately, I don't remember exactly how large (or whether it was bigger than the one produced in the later attempts). There was no free disk space when it failed, so I'm assuming that was the cause.
 2) The second attempt, on Windows with CouchDB 1.2.0, failed with the error mentioned below. The .compact file was around 50GB, and there was plenty of free space left on the disk.
 3) The third attempt, on Ubuntu with CouchDB 1.5.0, failed with the error mentioned below. The .compact file was around 50GB, there was plenty of free space left on the disk, and judging from the numbers in the log (ids? node numbers?) it failed at the same point as attempt #2.

Just wanted to make sure we're on the same page :) Do you still want me to try it with an empty .compact file? (I can do this only during night hours, since I don't want to put too much load on the server during working hours)

Best regards,
Igor Klimer
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 10 January 2014 14:03
To: user
Subject: Re: Error during compaction

Hrm, strike one. OK. The next thing to try is subtly different: stop couchdb, delete the .compact file, but then create a new, empty .compact file (so 'touch /path/to/dbname.compact'), start couchdb and compact.

B.

Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Hrm, strike one. OK. The next thing to try is subtly different: stop couchdb, delete the .compact file, but then create a new, empty .compact file (so 'touch /path/to/dbname.compact'), start couchdb and compact.
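
For anyone who would rather script the file handling than do it by hand, here is a minimal Python sketch of the "delete, then touch" step. It is only an illustration: the function name is made up, the path is whatever your installation actually uses for the compaction file (check the database directory rather than trusting the placeholder), and it assumes couchdb has already been stopped.

```python
from pathlib import Path


def reset_compact_file(compact_path):
    """Remove a leftover compaction file and leave an empty one in its place.

    Hypothetical helper: `compact_path` is supplied by the caller and must
    point at the real .compact file for the database. Run it only while
    couchdb is stopped.
    """
    path = Path(compact_path)
    if path.exists():
        path.unlink()          # same effect as deleting the file by hand
    path.touch()               # same effect as `touch /path/to/dbname.compact`
    return path.stat().st_size  # size in bytes of the freshly created file
```

Calling `reset_compact_file("/path/to/dbname.compact")` and then starting couchdb and compacting would correspond to the sequence described above.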

B.

On 10 Jan 2014, at 12:42, Igor Klimer <i....@getbacksa.pl> wrote:

> Yes, I've already done that after the very first attempt at compaction (the one that failed because of lack of disk space). That resulted in the second failure (on Windows), and then the same on Linux. I always deleted the incomplete .compact file (about 50% of the database, around 50GB) before running the compaction again, so I was always doing compaction from scratch.
> 
> Best regards,
> Igor Klimer

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

Yes, I've already done that after the very first attempt at compaction (the one that failed because of lack of disk space). That resulted in the second failure (on Windows), and then the same on Linux. I always deleted the incomplete .compact file (about 50% of the database, around 50GB) before running the compaction again, so I was always doing compaction from scratch.

Best regards,
Igor Klimer
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 10 January 2014 13:08
To: user
Subject: Re: Error during compaction

Thanks! That’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.

1) stop couchdb
2) delete (or move aside) the dbname.compact file for this database
3) start couchdb
4) compact the db

Whether it works or not, please let us know.

B.

Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Thanks! That’s very useful. Hitting end of disk certainly feels like a cause here. Since the compaction has never completed, I suggest we redo compaction from scratch.

1) stop couchdb
2) delete (or move aside) the dbname.compact file for this database
3) start couchdb
4) compact the db
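
As a rough illustration of steps 2 and 4, here is a small Python sketch. The helper names, the `.bak` suffix and the example base URL are all hypothetical; the only CouchDB-specific piece is the `POST /{db}/_compact` endpoint with a JSON content type. The second helper only builds the request. Sending it needs a running server and, if an admin is configured, credentials, which are left out here.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def move_aside(compact_path):
    """Step 2: move a leftover .compact file out of the way (couchdb stopped).

    Returns the backup path, or None if there was nothing to move.
    The ".bak" suffix is an arbitrary choice for this sketch.
    """
    path = Path(compact_path)
    if not path.exists():
        return None
    backup = path.with_name(path.name + ".bak")
    shutil.move(str(path), str(backup))
    return backup


def build_compaction_request(base_url, db_name):
    """Step 4: build (but do not send) the request that starts compaction."""
    quoted = urllib.parse.quote(db_name, safe="")  # db names may contain "/"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{quoted}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

With couchdb started again, something like `urllib.request.urlopen(build_compaction_request("http://localhost:5984", "dbname"))` would then kick off the compaction.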

Whether it works or not, please let us know.

B.

On 10 Jan 2014, at 08:25, Igor Klimer <i....@getbacksa.pl> wrote:

>> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
> 
> No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.
> 
>> Do you have free disk space?
> Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we ran the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space, so I double-checked that there was enough space before running the compaction again. Here's the log, in case it's helpful: http://pastebin.com/S1URXN0p
> Do you think it could have left the database in some corrupted state? It seems it failed at a different point than the next two attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
> 
> Thank you for your time and help and best regards,
> Igor Klimer

Re: Error during compaction

Posted by Igor Klimer <i....@getbacksa.pl>.

> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?

No, we've been running 1.2.0 from the start (around Oct 2012), then switched to Ubuntu and 1.5.0.

> Do you have free disk space?
Yes, there's about 150% of the DB's size worth of free space :) I forgot to mention ("OK, here we go, the user will confess to some sin he committed and is ashamed of and is most likely the reason for this failure") that we ran the compaction once before the error on Windows I mentioned below, but it failed because of insufficient disk space, so I double-checked that there was enough space before running the compaction again. Here's the log, in case it's helpful: http://pastebin.com/S1URXN0p
Do you think it could have left the database in some corrupted state? It seems it failed at a different point than the next two attempts (and, as far as I understand, compaction is just copying over the database while pruning the old revisions and deleted documents).
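
For what it's worth, the "double-check the free space" step could be scripted along these lines. This is just a sketch: the function names are made up, the database file path is a placeholder, and the required ratio is a parameter chosen by the caller rather than any official CouchDB figure.

```python
import os
import shutil


def free_space_ratio(db_file):
    """Free space on the database's volume, as a multiple of the db file size."""
    db_size = os.path.getsize(db_file)
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(db_file))).free
    if db_size == 0:
        return float("inf")  # an empty file needs no room to copy
    return free / db_size


def has_room_to_compact(db_file, required_ratio):
    """True if free space is at least `required_ratio` times the db file size.

    `required_ratio` is an assumption supplied by the caller; 1.5 would
    correspond to the "about 150%" mentioned above.
    """
    return free_space_ratio(db_file) >= required_ratio
```

Running `has_room_to_compact("/path/to/dbname.couch", 1.5)` before each attempt would at least rule out a repeat of the first failure.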

Thank you for your time and help and best regards,
Igor Klimer
________________________________________
From: Robert Samuel Newson [rnewson@apache.org]
Sent: 9 January 2014 17:13
To: user
Subject: Re: Error during compaction

Do you have free disk space?

On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:

>
> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
>
> B.
>
> On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:
>
>> Hi all,
>> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
>> This is from Windows Server 2008 R2 Enterprise with Couchdb 1.2.0.
>> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and update Couchdb to 1.5.0.
>> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
>>
>> I've tried wrapping my head around that error, googling it, checking this mail list but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
>>
>> Best regards,
>> Igor Klimer
>>
>> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)
>






Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Do you have free disk space?

On 9 Jan 2014, at 15:25, Robert Samuel Newson <rn...@apache.org> wrote:

> 
> Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?
> 
> B.
> 
> On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:
> 
>> Hi all,
>> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
>> This is from Windows Server 2008 R2 Enterprise with CouchDB 1.2.0.
>> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
>> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
>> 
>> I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
>> 
>> Best regards,
>> Igor Klimer
>> 
>> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)
>

Re: Error during compaction

Posted by Robert Samuel Newson <rn...@apache.org>.

Given that you’re at 100GB and compacting for the first time, can you tell us if you were running on older CouchDB versions than 1.2.0 between db creation and today?

B.

On 9 Jan 2014, at 14:39, Igor Klimer <i....@getbacksa.pl> wrote:

> Hi all,
> I've stumbled upon a peculiar problem while trying to compact (for the first time) a large(-ish) database (~100GB at that time). At about 50% it failed with this error: http://pastebin.com/qeaZNHMj
> This is from Windows Server 2008 R2 Enterprise with CouchDB 1.2.0.
> I figured that it might be a bug in the Windows build (Erlang on Windows? C'mon, that can't be good ;)) or already fixed in a newer version. Some time later we migrated the server to a Linux box running Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64) and updated CouchDB to 1.5.0.
> Unfortunately, the same error occurred: http://pastebin.com/feJWu7bN
> 
> I've tried wrapping my head around that error, googling it, and checking this mailing list, but to no avail :) So if anyone can give me any pointers as to what might be causing this problem, I'd be very grateful.
> 
> Best regards,
> Igor Klimer
> 
> (sorry for the footer that will probably follow, unfortunately it's added for all outgoing external mail...)