You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Vishal K (JIRA)" <ji...@apache.org> on 2010/09/15 03:53:33 UTC

[jira] Created: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Small fixes to PurgeTxnLog 
---------------------------

                 Key: ZOOKEEPER-872
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
             Project: Zookeeper
          Issue Type: Bug
    Affects Versions: 3.3.1
            Reporter: Vishal K
            Priority: Minor


PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927082#action_12927082 ] 

Patrick Hunt commented on ZOOKEEPER-872:
----------------------------------------

Hi Vishal, I noticed a couple issues.

This class is a command line utility. As such we are outputting to both the command line and to the log. The usage() in particular should go to std out so that the user will see it regardless of the log settings (fine if you want to output it to LOG as well, but I think this is unnecessary).

good catch on the error handling for this:

     public static void purge(File dataDir, File snapDir, int num) throws IOException {
-        if (num < 3) {
-            throw new IllegalArgumentException("count should be greater than 3");
+        if (num < 2) {
+            throw new IllegalArgumentException("count should be greater than 1");
         }

However the number 3 was chosen to ensure that ppl don't shoot themselves in the foot (if the most recent logs get corrupted we'll fall back to the prior when attempting to recover). There really should be a comment to this effect (would be good to add). I don't know how Mahadev feels on this setting (min 3 vs some other number) but he might have more insight as IIRC he implemented this originally.

this following is there to provide feedback to the user when running on command line:

-            System.out.println("Removing file: "+
-                DateFormat.getDateTimeInstance().format(f.lastModified())+
-                "\t"+f.getPath());

again, regardless of logging setup.

Perhaps we should have a "-q" option that turns off the CLI logging and just logs to the log file? I know this has been an issue previously (stdout/err) given that cron will spitout emails by default containing stdout/err.

Also, is there a test for this? If you're up to it would be great to add.


> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-872:
------------------------------------

    Status: Patch Available  (was: Open)

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927127#action_12927127 ] 

Vishal K commented on ZOOKEEPER-872:
------------------------------------

Hi Pat,

The current code prints to stdout.  We have a RMI service that has ZK server embedded in it. We do this so that we can run/start/stop ZK across platforms without having to write platform specific scripts. In this server, we  start a thread that periodically calls PurgeTxnlog.purge(). As you pointed out, we should have a -q flag to direct to log instead stdout to statisfy both the approaches. I will make that change.

We chose number 2 here because we think having only one backup will be enough. It is not clear to us under what conditions the additional backup will be useful.

Backups are useful under the following scenario (correct me if I am wrong):
1. The current ZooKeeper transaction log and/or snapshot is corrupted, but the past snapshots and transaction logs are ok. Corrupting can mean either disk file corruption or corrupting of transaction entries in the log. We store ZooKeeper data on mirrored disks.
2. The application itself made some errors that requires reverting back to the older version.

For the first point, having one additional backup would suffice. The second point is really tricky. I am not sure how the application can decide which snapshot to revert to. I think in most cases it will be trial and error. It is not clear to me how to estimate the number of backups needed. Also, it is not clear how one would go about going back in time. I looked at LogFormatter utility and that utility does not help much in undoing the erroneous transactions for case 2 above. In general, I think it is good to enforce users to have a minimum of one backup.

Related question: Is there hash on the log files (or internal tree structures) that can tell the ZooKeeper server if the logs are corrupted. If yes, the zookeeper server can verify the hash during startup and take some action based on that. For example, make sure that it never becomes a leader until it gets the correct snapshot from the existing leader (otherwise it may endup corrupting other server's log). "Corrupting" here refers to the case where the file is readable, but one or more transactions in the log are bad.

I am not sure if there is a test for this. If I remember correctly, there is a bug that causes the purge() function to leave behind one addition log file. Please refer to my question above about findNRecentSnapshots(). I can add a test or modify the pruge utlity once we have concluded this discussion.

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909564#action_12909564 ] 

Vishal K commented on ZOOKEEPER-872:
------------------------------------

Is there any reason why we dont have a findNRecentLogs(int n) method to return the n most recent logs (similar to findNRecentSnapshots)?
While testing I noticed that it can happen that a log file is left undeleted depending on the transaction id of the nth snapshot file.  Thus, we will have n snapshots, but n+1 log file left behind instead of n. This file will be deleted after the next snapshot is taken.

We won't have this problem if we have a function that keeps the n most recent logs and removes the rest.

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Priority: Minor
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-872:
-----------------------------------

    Status: Open  (was: Patch Available)

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-872:
------------------------------------

    Fix Version/s: 3.4.0
         Assignee: Vishal K

> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-872) Small fixes to PurgeTxnLog

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-872:
-------------------------------

    Attachment: ZOOKEEPER-872

patch attached.


> Small fixes to PurgeTxnLog 
> ---------------------------
>
>                 Key: ZOOKEEPER-872
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872
>             Project: Zookeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Vishal K
>            Priority: Minor
>         Attachments: ZOOKEEPER-872
>
>
> PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.