Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2007/04/12 17:41:32 UTC

[jira] Created: (SOLR-207) snappuller inefficient finding latest snapshot

snappuller inefficient finding latest snapshot
----------------------------------------------

                 Key: SOLR-207
                 URL: https://issues.apache.org/jira/browse/SOLR-207
             Project: Solr
          Issue Type: Bug
          Components: replication
            Reporter: Yonik Seeley


snapinstaller (and snappuller) do the following to find the latest snapshot:
name=`find ${data_dir} -name snapshot.* -print|grep -v wip|sort -r|head -1`

This recurses into all of the snapshot directories, doing much more disk I/O than is necessary.
I think it is the cause of the bloated kernel memory usage we have seen on some of our Linux boxes, caused
by the kernel dentry and inode caches. Those caches compete with the buffer cache (which caches the actual index data)
and can thus decrease performance.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-207.
-------------------------------

    Resolution: Fixed

Tested and committed.


[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-207:
------------------------------

    Attachment: find_maxdepth.patch

Updated patch:
- switches back to "ls"
- tries to determine whether "-maxdepth" is supported, for the cleanup scripts that need to use find -mtime
- in snappuller, makes the master find the latest snapshot instead of sending the complete "ls" output across the network

This has not yet been tested.
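
For illustration, one way such a check could look is sketched below. This is not the logic of find_maxdepth.patch: the probe, the function names find_supports_maxdepth and old_snapshots, and the 7-day cutoff are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: the probe below is an assumption, not the logic of find_maxdepth.patch.

# Succeeds if this system's find accepts -maxdepth (GNU and BSD find do).
find_supports_maxdepth() {
    find . -maxdepth 0 >/dev/null 2>&1
}

# Print top-level snapshot.* entries of directory $1 last modified more than $2 days ago.
# old_snapshots is a hypothetical name, not a function from the Solr scripts.
old_snapshots() {
    (
        cd "$1" || exit 1
        if find_supports_maxdepth; then
            find . -maxdepth 1 -name 'snapshot.*' -mtime +"$2" -print
        else
            # Fallback: -prune keeps find from descending below the top level.
            find . ! -name . -prune -name 'snapshot.*' -mtime +"$2" -print
        fi
    )
}

# Demo on a throwaway directory: one snapshot backdated to 2000, one fresh.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/snapshot.20000101000000" "$demo_dir/snapshot.20070412114504"
touch -t 200001010000 "$demo_dir/snapshot.20000101000000"
old_snapshots "$demo_dir" 7   # prints ./snapshot.20000101000000
rm -rf "$demo_dir"
```

Both branches print paths relative to the directory, so a caller sees the same output whichever form of find is available.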


[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-207:
------------------------------

    Attachment:     (was: find_maxdepth.patch)


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488420 ] 

Yonik Seeley commented on SOLR-207:
-----------------------------------

Although, another alternative that doesn't have the shell expansion problem would be

ls -r ${data_dir} | grep snapshot\\.  | grep -v wip | head -1



[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488469 ] 

Yonik Seeley commented on SOLR-207:
-----------------------------------

I tried both versions out, and the "find" version was quicker (on Linux at least).
System time was about the same, but "ls" had much higher user time.

$ time find . -maxdepth 1 -name 'snapshot.*' | grep -v wip | head -1
./snapshot.20070411235957

real    0m0.009s
user    0m0.002s
sys     0m0.008s

$ time ls -r . | grep snapshot\\. | grep -v wip | head -1
snapshot.20070412114504

real    0m0.050s
user    0m0.043s
sys     0m0.009s



[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-207:
------------------------------

    Attachment: find_maxdepth.patch

uses "-maxdepth 1" to avoid recursion.

Bill - does this look OK?


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488485 ] 

Yonik Seeley commented on SOLR-207:
-----------------------------------

> I think find -maxdepth is not supported on Solaris

Sigh... back to ls then.


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488418 ] 

Yonik Seeley commented on SOLR-207:
-----------------------------------

That's close to the way it was done in the past, but some people ran into problems because of restrictions on the number or size of the arguments passed to the process (because the shell expands the list).
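
A small sketch of the expansion issue, using made-up snapshot names and a hypothetical count_args helper. The shell turns the glob into one argument per match before the command runs; the size limit itself applies when an external command such as ls is executed, and getconf ARG_MAX reports it on POSIX systems.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Throwaway directory with five made-up snapshot names.
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    mkdir "$demo_dir/snapshot.2007041200000$i"
done

# The shell expands the glob first: one argument per matching entry.
count_args "$demo_dir"/snapshot.*

# Without a glob, only the directory name is passed; filtering happens in the pipe.
count_args "$demo_dir"
ls "$demo_dir" | grep -c '^snapshot\.'

# Limit, in bytes, on the combined size of arguments and environment for exec.
getconf ARG_MAX

rm -rf "$demo_dir"
```

With thousands of snapshots the expanded list can exceed that limit, which is why the later suggestions in this thread pass only the directory to ls.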


[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-207:
------------------------------

    Attachment: find_maxdepth.patch

re-attaching with ASF perms (in the older JIRA version, the "grant license" option was first, and now it is last... hence I keep clicking the incorrect one)


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Bertrand Delacretaz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488413 ] 

Bertrand Delacretaz commented on SOLR-207:
------------------------------------------

IIUC the snapshot directories are named like

  snapshot.YYYYMMDDHHMMSS

and they are all under the same parent directory.

If that's the case, then doing

  ls -rt ${data_dir}/snapshot.* | head -1

will return the name of the most recent directory, efficiently.


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Bertrand Delacretaz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488468 ] 

Bertrand Delacretaz commented on SOLR-207:
------------------------------------------

I think find -maxdepth is not supported on Solaris. And the -t option in my previous example was obviously wrong.

I'm not sure if ls -r sorts by filename everywhere (but I have no evidence that it does not).

The most portable version might be

  ls ${data_dir} | grep snapshot\\. | grep -v wip | sort -r | head -1 
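
Wrapped in a function, that pipeline might look like the sketch below. The name latest_snapshot and the demo directory names are made up for illustration, and the grep pattern is anchored with ^, which the one-liner above does not do.

```shell
#!/bin/sh
# Hypothetical helper; latest_snapshot is not a function from the Solr scripts.
latest_snapshot() {
    # ls gets only the directory name, so the shell expands no glob and
    # nothing below the top level is read.
    ls "$1" | grep '^snapshot\.' | grep -v wip | sort -r | head -1
}

# Demo on a throwaway directory with made-up snapshot names.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/snapshot.20070411235957" \
      "$demo_dir/snapshot.20070412114504" \
      "$demo_dir/snapshot.20070412120000-wip" \
      "$demo_dir/index"
latest_snapshot "$demo_dir"   # prints snapshot.20070412114504
rm -rf "$demo_dir"
```

Because the names end in a YYYYMMDDHHMMSS timestamp, a reverse lexical sort puts the newest snapshot first without relying on how ls orders its output.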


[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

Posted by "Bill Au (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488491 ] 

Bill Au commented on SOLR-207:
------------------------------

I confirmed that find -maxdepth does not work on Solaris.  So it is back to ls.  We should be OK as long as we don't use any wildcard that causes expansion.
