Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/01 17:09:06 UTC

[jira] [Closed] (NUTCH-50) Benchmarks & Performance goals

     [ https://issues.apache.org/jira/browse/NUTCH-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-50.
------------------------------


Bulk close of resolved issues:
http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira

> Benchmarks & Performance goals
> ------------------------------
>
>                 Key: NUTCH-50
>                 URL: https://issues.apache.org/jira/browse/NUTCH-50
>             Project: Nutch
>          Issue Type: Task
>          Components: searcher
>         Environment: Linux, Windows
>            Reporter: byron miller
>            Assignee: Chris A. Mattmann
>             Fix For: 2.0
>
>
> I am interested in developing a strategy and toolset for benchmarking Nutch search.  Please give your feedback on the following approaches, or your recommendations for setting standards and goals.
> Example test case(s).
> JDK 1.4.x 32 bit/Linux Platform
> Single Node/2 gigs of memory
> Single Index/Segment
> 1 million pages  
> -- single node --
> JDK 1.4.x 32 bit/Linux Platform
> Single Node/2 gigs of memory
> Single Index/Segment
> 10 million pages
> -- dual node --
> JDK 1.4.2 32 bit/Linux Platform
> 2 Node/2 gigs of memory
> 2 Indexes/Segments (1 per node)
> 1 million pages
> -- test queries --
> * single term
> * term AND term
> * exact "small phrase"
> * lang:en term
> * term cluster
> --- standards ----
> 10 results per page
> ---------------------
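> As a rough sketch of how the query shapes and the 10-results-per-page standard above could be pinned down as concrete benchmark inputs (the terms themselves are just placeholders), kept to plain Java so it also compiles on the JDK 1.4.x targets listed in the test cases:
>
> public class QueryMix {
>     // One concrete query per shape from the list above (placeholder terms).
>     public static final String[] QUERIES = {
>         "apache",                     // single term
>         "apache AND lucene",          // term AND term
>         "\"open source search\"",     // exact "small phrase"
>         "lang:en crawler",            // lang:en term
>         "web crawler index segment"   // term cluster
>     };
>
>     // Proposed standard from above: results per page for every test query.
>     public static final int HITS_PER_PAGE = 10;
> }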
> For me a test case will help prove scalability, expose bottlenecks, and exercise application environments, settings and such.  Given the number of customizations available, we really need to look at establishing the best baseline for a given number of documents and some kind of scalability scale.  For example, a 10-node system may only scale x percent better for x reasons, with x being the bottleneck in that scenario.
> Test cases would serve multiple purposes, measuring performance, response time and application stability.
> Tools/possibilities:
> * JMX components
> * http://grinder.sourceforge.net/
> * JMeter
> * others???
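> Next to those tools, a hand-rolled baseline could be as small as a loop that fires each query at the search front end over HTTP and records wall-clock time.  A minimal sketch, assuming a search.jsp endpoint on localhost:8080 with query and hitsPerPage parameters (adjust to the actual deployment) and a plain-text file with one query per line:
>
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.InputStreamReader;
> import java.net.URL;
> import java.net.URLEncoder;
>
> public class SearchBench {
>     public static void main(String[] args) throws Exception {
>         // Assumed deployment URL and parameter names -- adjust as needed.
>         String base = args.length > 0 ? args[0] : "http://localhost:8080/search.jsp";
>         String queryFile = args.length > 1 ? args[1] : "queries.txt"; // one query per line
>
>         BufferedReader queries = new BufferedReader(new FileReader(queryFile));
>         String q;
>         while ((q = queries.readLine()) != null) {
>             URL url = new URL(base + "?query=" + URLEncoder.encode(q, "UTF-8")
>                     + "&hitsPerPage=10");
>
>             long start = System.currentTimeMillis();
>             BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
>             while (in.readLine() != null) {
>                 // drain the response so the whole page is actually generated
>             }
>             in.close();
>
>             System.out.println((System.currentTimeMillis() - start) + " ms\t" + q);
>         }
>         queries.close();
>     }
> }
>
> Grinder or JMeter would add concurrency, ramp-up and reporting on top of the same idea; a single-threaded loop like this only gives a response-time floor.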
> ---------------------
> Query "stuffing" - use of dictionary that contains broad & vastly different terms. Something that could be scripted as a "warm up" for production systems as well.  Possibly combine terms from our logs of common search queries to use as a benchmark?
> What feedback/ideas do you have on building a good test case/stress testing system/framework?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira