Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/04/02 05:31:24 UTC

[jira] Updated: (HBASE-469) Streamline HStore startup and compactions

     [ https://issues.apache.org/jira/browse/HBASE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-469:
--------------------------------

    Attachment: patch.txt

> Here's a couple of comments:
>
> Why catch the exception? Why not just let it out?
{code}
+    if (fs == null) {
+      try {
+        this.fs = FileSystem.get(conf);
+      } catch (IOException e) {
+        LOG.fatal("error getting file system", e);
+        throw e;
+      }
{code}
Beats me. It's what we did before. All I really did was encase the existing code inside the if. I have removed the try/catch.
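For what it's worth, with the try/catch gone the pattern is just a checked exception propagating to the caller. A minimal sketch (class and method names here are illustrative, not the actual HStore code):
{code}
import java.io.IOException;

// Minimal sketch of the simplified pattern: no catch/log/rethrow, the
// checked IOException simply propagates to the caller. Names are
// illustrative, not HBase's.
class FileSystemHolder {
  private Object fs; // stands in for org.apache.hadoop.fs.FileSystem

  void ensureFileSystem(boolean simulateFailure) throws IOException {
    if (this.fs == null) {
      // Before: try { ... } catch (IOException e) { LOG.fatal(...); throw e; }
      // After: just let the exception out.
      this.fs = getFileSystem(simulateFailure);
    }
  }

  private Object getFileSystem(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("error getting file system");
    }
    return new Object();
  }
}
{code}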

> Regards the below:
{code}
+  protected void deleteLocal(File f) {
+    if (f.exists()) {
+      if (f.isDirectory()) {
+        File[] children = f.listFiles();
+        if (children != null && children.length > 0) {
+          for (int i = 0; i < children.length; i++) {
+            deleteLocal(children[i]);
+          }
+        }
+      }
+      f.delete();
+    }
+  }
{code}
> It's called deleteLocalFile - does it only work against the local filesystem?

Yes. It is used to clean up the cruft that MapReduce leaves in the local
file system when it runs a test (because it does not clean up after itself).

> Should this method be in file utils instead?
> Is there a method that does this up in hadoop?

Yes, duh! Removed.
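For anyone reading along, a JDK-only sketch of what deleteLocal was doing; Hadoop's FileUtil.fullyDelete(File) covers the same ground, which is why the private copy could go:
{code}
import java.io.File;

// JDK-only sketch of the removed deleteLocal(): recursively deletes a local
// file or directory. Hadoop's org.apache.hadoop.fs.FileUtil.fullyDelete
// does the same job, which is why the private copy was dropped.
class LocalCleanup {
  static boolean fullyDelete(File f) {
    File[] children = f.listFiles(); // null when f is not a directory
    if (children != null) {
      for (File child : children) {
        fullyDelete(child);
      }
    }
    return f.delete();
  }
}
{code}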

> What if test is run on DFS?

It is only used to clean up the cruft that MapReduce leaves in the local file system.

> Should the System.out.* logging be changed in
> src/test/org/apache/hadoop/hbase/TestHBaseCluster.java?

Yes also in TestHRegion.

> What's happening with TestMemcache? Is there a renaming going on?
> It's different from TestHMemcache?

It was a new test I wrote while debugging. I folded it into TestHMemcache.

> Why do the below:
{code}
+    } catch (IOException e) {
+      e.printStackTrace();
+      throw e;
{code}
> Doesn't a stacktrace come out anyways?

Yes, removed.

> Log instead? + System.err.println("setup completed.");
> Change all the System.out* to use log?

Yes. Done.

> Just let the below out?
{code}
+    try {
+      r.flushcache();
+    } catch (IOException e) {
+      e.printStackTrace();
+      throw e;
+    } 
+    try {
+      r.compactStores();
+    } catch (IOException e) {
+      e.printStackTrace();
+      throw e;
+    }
{code}
> And here:
{code}
+    try {
+      r.flushcache();
+      r.compactStores();
+    } catch (IOException e) {
+      e.printStackTrace();
+      throw e;
+    }
{code}
Done.

> Should this run in tearDown?
{code}
} finally {
       mrCluster.shutdown();
+      if (jobConf != null) {
+        deleteLocal(new File(jobConf.get("hadoop.tmp.dir")));
+      }
{code}
Yes. Done.
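The resulting tearDown just needs to make sure the temp-dir cleanup runs even if the shutdown throws, which is plain try/finally ordering. A stripped-down sketch (the cluster and cleanup steps are stubbed here, as stand-ins for mrCluster.shutdown() and the hadoop.tmp.dir delete):
{code}
import java.util.ArrayList;
import java.util.List;

// Stripped-down sketch of the tearDown ordering: shutdown first, local
// cleanup in a finally block so it runs even when shutdown throws. The
// cluster and cleanup steps are stubbed; in the real test they are
// mrCluster.shutdown() and deleting jobConf.get("hadoop.tmp.dir").
class TearDownOrder {
  final List<String> events = new ArrayList<String>();

  void tearDown(boolean shutdownFails) {
    try {
      events.add("shutdown"); // mrCluster.shutdown()
      if (shutdownFails) {
        throw new RuntimeException("simulated shutdown failure");
      }
    } finally {
      events.add("cleanup"); // deleteLocal(new File(jobConf.get("hadoop.tmp.dir")))
    }
  }
}
{code}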

> Why is this necessary?
{code}
+  void close() {
+    this.lock.writeLock().lock();
+    try {
+      this.memcache.clear();
+      this.snapshot.clear();
+    } finally {
+      this.lock.writeLock().unlock();
+    }
+  }
{code}
> It's just busy work and it will mask real memory retention issues - e.g.
> if the host of this MemCache is being held on to, then this clearing will
> make that condition harder to detect (if the host is not being
> referenced, the memcache and snapshot have no path to root and GC will
> clean them up). I see that we used to do this elsewhere...and you're just
> tidying stuff.

I was trying to give the garbage collector some help, but you're right, it's not necessary.

> Can this be final? + protected Memcache memcache = new Memcache();
Yes. Done.

> Why remove the final in below?
{code}
    * final FileSystem fs;
    * private final HBaseConfiguration conf;
      + FileSystem fs;
      + private HBaseConfiguration conf;
{code}
> Because not set on construction?

No, close() at one point nulled out these pointers to try to help the garbage collector. They are final now.

> This is a nice addition -> + private volatile long storeSize;

Yes. HStore for the most part knows how big it is, so why recompute it every time we want to know if the store wants to be split.
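The idea, roughly (illustrative names, not the real HStore): the store updates the running total whenever its set of files changes, so the split question becomes a cheap volatile read instead of a walk over the store files.
{code}
// Sketch of the idea behind the volatile storeSize field: the store updates
// the running total whenever its file set changes (flush, compaction), so
// asking "should this store split?" is a cheap volatile read rather than a
// recomputation. Names are illustrative, not HBase's.
class SizeTrackingStore {
  private volatile long storeSize;        // read by the split checker
  private final long desiredMaxFileSize;

  SizeTrackingStore(long desiredMaxFileSize) {
    this.desiredMaxFileSize = desiredMaxFileSize;
  }

  // Called with the store lock held after a flush or compaction rewrites files.
  synchronized void filesChanged(long[] fileSizes) {
    long total = 0;
    for (long s : fileSizes) {
      total += s;
    }
    this.storeSize = total;
  }

  // Cheap check: no recomputation, and no lock needed for the read.
  boolean wantsSplit() {
    return storeSize > desiredMaxFileSize;
  }
}
{code}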

> I think one of my patches yesterday might make it so this part doesn't apply:
{code}
@@ -208,29 +220,18 @@
     if(LOG.isDebugEnabled()) {
       LOG.debug("starting " + storeName +
           ((reconstructionLog == null || !fs.exists(reconstructionLog)) ?
-          " (no reconstruction log)" :
-            " with reconstruction log: " + reconstructionLog.toString()));
+              " (no reconstruction log)" :
+                " with reconstruction log: " + reconstructionLog.toString()));
{code}
Yes, I resolved the conflict.

> What do you think our policy on data member access should be? Sometimes we make
> things default access. Other times we add accessors:
{code}
+
+  HColumnDescriptor getFamily() {
+    return this.family;
   }
{code}

I don't have a clear opinion on this. If access is package scoped, usually I don't go through an accessor. But if you'd like to formalize this, that's cool with me too.

> Should we log debug something here just in case things are not working as
> we'd hope? Just suggesting...
{code}
+      } catch (IOException e) {
+        // If the HSTORE_LOGINFOFILE doesn't contain a number, just ignore it.
+        // That means it was built prior to the previous run of HStore, and so
+        // it cannot contain any updates also contained in the log.
+      }
{code}
Done.

> You want to remove this?
{code}
+//      this.basedir = null;
+//      this.info = null;
+//      this.family = null;
+//      this.fs = null;
{code}
Done.

> Would suggest this log not necessary
{code}
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Not compacting " + this.storeName +
+                  " because no store files to compact.");
+            }
{code}
I think this is useful at debug level. If we ever get our INFO/DEBUG messages right, then this should only show up if we are trying to trace a problem.

> Lets log events and changes in state - not the fact that they did not
> happen (there are a few other logs after this one). Would suggest that as
> part of logging compactions, you add rationale.

Like I said above, if we make INFO level logging provide us with enough information, then this, as a debug level log, should disappear.

> Should the below check be done inside the checkSplit method for the sake of
> better coherence?
{code}
+      if (storeSize > this.desiredMaxFileSize) {
+        return checkSplit();
+      }
{code}
Yes. Done.
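With the check folded in, callers just ask checkSplit() and get null when the store is not big enough. A sketch with illustrative names (the real method would also compute a midkey from the largest store file):
{code}
// Sketch of moving the size check inside checkSplit() for coherence: the
// caller no longer compares storeSize itself, it simply asks for a split
// point and gets null when the store should not split. Names illustrative.
class SplitChecker {
  private volatile long storeSize;
  private final long desiredMaxFileSize;

  SplitChecker(long storeSize, long desiredMaxFileSize) {
    this.storeSize = storeSize;
    this.desiredMaxFileSize = desiredMaxFileSize;
  }

  // Returns a split key, or null when the store should not split.
  String checkSplit() {
    if (storeSize <= desiredMaxFileSize) {
      return null;
    }
    return midKey(); // in HStore this would be the midkey of the largest file
  }

  private String midKey() {
    return "midkey";
  }
}
{code}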

> This is intentional (Moving where reportOpen happens?)?
{code}
-      reportOpen(regionInfo); 
     }   
+    reportOpen(regionInfo);
{code}
Yes. Sometimes the master sends multiple open region requests to the region server; why, I don't know. If it does and the region server already has the region open, it can make the master happy by telling it that the region has been opened.
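In other words, the open is now idempotent from the master's point of view: a re-sent open request still gets a reportOpen back. A sketch of the shape (names illustrative, not the real HRegionServer):
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the idempotent open: whether or not the region was already
// open, the region server acknowledges with reportOpen so a master that
// re-sent the request is satisfied. Names are illustrative.
class RegionOpener {
  final Set<String> openRegions = new HashSet<String>();
  final List<String> reported = new ArrayList<String>();

  void openRegion(String regionName) {
    if (!openRegions.contains(regionName)) {
      openRegions.add(regionName); // actually open it only once
    }
    reported.add(regionName); // always tell the master it is open
  }
}
{code}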

> I don't think the HRegion cleanup you've added helps (See above note on
> how I think it could make memleaks harder to detect).

Agreed. Removed.


> Streamline HStore startup and compactions
> -----------------------------------------
>
>                 Key: HBASE-469
>                 URL: https://issues.apache.org/jira/browse/HBASE-469
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.2.0
>            Reporter: Jim Kellerman
>            Assignee: Jim Kellerman
>             Fix For: 0.2.0
>
>         Attachments: patch.txt, patch.txt, patch.txt
>
>
> Several useful patches that streamline HStore startup and compactions that were a part of the abandoned changes in HBASE-69 should be incorporated.
