Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/06/01 14:29:43 UTC

[GitHub] [lucene-solr] dsmiley opened a new pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

dsmiley opened a new pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550


   I switched from "java-library" type of Gradle plugin/module to more plainly "java" because this module isn't just some library, it's closer to an app.  I tried type "application" but I didn't have the same control that the "JavaExec" task gives you.  One consequence of not using "java-library" is that the names of the categories of dependencies are different, and so this appears odd/unusual relative to the other modules.
   
   I did not convert the "collation" and "shingle" Ant targets, but I left the two-line CLI equivalents for both in a comment.  I ran them and they worked... apart from some confusion in one of the Perl scripts, which decided the "darwin" OS was Windows simply because the name contains "win" :-).
   
   Notice the style of "getEnWiki" and "getGeoNames" and "getTop100kWikiWordFiles":  One task that does all it needs to do by adding a final step in doLast.  Now notice a different style: "reuters" (depending on extractReuters depending on getReuters).  This is more verbose, but admittedly for this case it has to do more.  I'm not well versed enough in Gradle to know which style is preferable.  I lean towards short & concise.  The current state is a nocommit IMO; need to harmonize the approaches.
   
   I did not convert https://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz (aka news20) or https://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz or https://kdd.ics.uci.edu/databases/20newsgroups/mini_newsgroups.tar.gz (aka mini-news) because I could not find .alg files that used them.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434727518



##########
File path: lucene/benchmark/build.gradle
##########
@@ -37,5 +37,121 @@ dependencies {
     exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
       ehh; I'd prefer to keep this the way it is.  The code/scripts in the alg files generally don't print tons of output, so I don't think there's a perf interference concern.






[GitHub] [lucene-solr] dsmiley commented on pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dsmiley commented on pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#issuecomment-639848880


   Merged: 89784ad7be45640ad835fe41981becfc91e62349




[GitHub] [lucene-solr] madrob commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
madrob commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r433278450



##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {
+  tempDir = file("temp")
+  workDir = file("work")
+}
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = project.properties['taskAlg'] ?: 'conf/micro-standard.alg'
+  args = [taskAlg]
+
+  maxHeapSize = project.properties['maxHeapSize'] ?: '1G'
+
+  String stdOutStr = project.properties['standardOutput']
+  if (stdOutStr != null) {
+    standardOutput = new File(stdOutStr).newOutputStream()
+  }
+
+  debugOptions {
+    enabled = false
+    port = 5005
+    suspend = true
+  }
+}
+
+/* Old "collation" Ant target:
+gradle getTop100kWikiWordFiles run -PtaskAlg=conf/collation.alg -PstandardOutput=work/collation.benchmark.output.txt
+perl -CSD scripts/collation.bm2jira.pl work/collation.benchmark.output.txt
+ */
+
+/* Old "shingle" Ant target:
+gradle reuters run -PtaskAlg=conf/shingle.alg -PstandardOutput=work/shingle.benchmark.output.txt
+perl -CSD scripts/shingle.bm2jira.pl work/shingle.benchmark.output.txt
+ */
+
+// The remaining tasks just get / extract / prepare data
+
+task getEnWiki(type: Download) {
+  src "https://home.apache.org/~dsmiley/data/enwiki-20070527-pages-articles.xml.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getGeoNames(type: Download) {
+  // note: latest data is at: https://download.geonames.org/export/dump/allCountries.zip
+  //       and then randomize with: gsort -R -S 1500M file.txt > file_random.txt
+  //       and then compress with: bzip2 -9 -k file_random.txt
+  src "https://home.apache.org/~dsmiley/data/geonames_20130921_randomOrder_allCountries.txt.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getReuters(type: Download) {
+  // note: there is no HTTPS url and we don't care because this is merely test/perf data
+  src "http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+}
+task extractReuters(type: Copy) {
+  dependsOn getReuters
+  from(tarTree(getReuters.dest)) { // can expand a .gz on the fly
+    exclude '*.txt'
+  }
+  into file("$workDir/reuters")
+}
+task reuters(type: JavaExec) {
+  dependsOn extractReuters
+  def input = extractReuters.outputs.files[0]
+  def output = "$workDir/reuters-out"
+  inputs.dir(input)
+  outputs.dir(output)
+  main = 'org.apache.lucene.benchmark.utils.ExtractReuters'
+  classpath = sourceSets.main.runtimeClasspath
+  jvmArgs = ['-Xmx1G']

Review comment:
       Use `maxHeapSize`
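   
   A minimal sketch of that suggestion, assuming the rest of the `reuters` task stays as in the diff above. `JavaExec` has a `maxHeapSize` property, so the heap limit does not need to be passed as a raw `-Xmx` JVM argument:

   ```groovy
   task reuters(type: JavaExec) {
     dependsOn extractReuters
     main = 'org.apache.lucene.benchmark.utils.ExtractReuters'
     classpath = sourceSets.main.runtimeClasspath

     // Replaces: jvmArgs = ['-Xmx1G']
     maxHeapSize = '1G'
   }
   ```

   This also matches how the `run` task in the same file sets its heap size.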






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434805283



##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
       From my (seasoned) gradle viewpoint this comment really doesn't make much sense: it's not an "application" in gradle sense - we launch multiple classes, have infrastructure in the build file, not a main class etc. But fine with me.






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r433824063



##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {
+  tempDir = file("temp")
+  workDir = file("work")
+}
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = project.properties['taskAlg'] ?: 'conf/micro-standard.alg'
+  args = [taskAlg]
+
+  maxHeapSize = project.properties['maxHeapSize'] ?: '1G'
+
+  String stdOutStr = project.properties['standardOutput']
+  if (stdOutStr != null) {
+    standardOutput = new File(stdOutStr).newOutputStream()
+  }
+
+  debugOptions {
+    enabled = false
+    port = 5005
+    suspend = true
+  }
+}
+
+/* Old "collation" Ant target:
+gradle getTop100kWikiWordFiles run -PtaskAlg=conf/collation.alg -PstandardOutput=work/collation.benchmark.output.txt
+perl -CSD scripts/collation.bm2jira.pl work/collation.benchmark.output.txt
+ */
+
+/* Old "shingle" Ant target:
+gradle reuters run -PtaskAlg=conf/shingle.alg -PstandardOutput=work/shingle.benchmark.output.txt
+perl -CSD scripts/shingle.bm2jira.pl work/shingle.benchmark.output.txt
+ */
+
+// The remaining tasks just get / extract / prepare data
+
+task getEnWiki(type: Download) {
+  src "https://home.apache.org/~dsmiley/data/enwiki-20070527-pages-articles.xml.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getGeoNames(type: Download) {
+  // note: latest data is at: https://download.geonames.org/export/dump/allCountries.zip
+  //       and then randomize with: gsort -R -S 1500M file.txt > file_random.txt
+  //       and then compress with: bzip2 -9 -k file_random.txt
+  src "https://home.apache.org/~dsmiley/data/geonames_20130921_randomOrder_allCountries.txt.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getReuters(type: Download) {
+  // note: there is no HTTPS url and we don't care because this is merely test/perf data
+  src "http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+}
+task extractReuters(type: Copy) {

Review comment:
       Add newline between tasks for clarity?

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')

Review comment:
       Similar to above.

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')

Review comment:
       This is wrong. You used a deprecated configuration name. Please take a look at this:
   
   https://docs.gradle.org/current/userguide/java_plugin.html#sec:java_plugin_and_dependency_management
   
   All these should be implementation dependencies I think.
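   
   As a rough sketch of what that could look like (only a few of the dependencies from the hunk are repeated here, and the mapping for `runtime` and `testCompile` is my assumption about the intended replacements):

   ```groovy
   apply plugin: 'java'

   dependencies {
     // was: compile
     implementation project(':lucene:core')
     implementation project(':lucene:analysis:common')
     // version is assumed to come from the build's central dependency management
     implementation "org.apache.commons:commons-compress"

     // was: runtime
     runtimeOnly project(':lucene:analysis:icu')

     // was: testCompile
     testImplementation project(':lucene:test-framework')
   }
   ```

   These configuration names are available with the plain `java` plugin, so switching away from `java-library` does not require the deprecated names.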

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {
+  tempDir = file("temp")
+  workDir = file("work")
+}
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = project.properties['taskAlg'] ?: 'conf/micro-standard.alg'

Review comment:
       Use global function propertyOrDefault which accepts more than just project.properties.
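   
   A sketch of the same task using that helper. Note that `propertyOrDefault` is defined elsewhere in this build, not by Gradle itself, and I am assuming it takes a property name and a fallback value:

   ```groovy
   task run(type: JavaExec) {
     main = 'org.apache.lucene.benchmark.byTask.Benchmark'
     classpath = sourceSets.main.runtimeClasspath

     // propertyOrDefault: build-level helper (assumed signature: name, default)
     def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
     args = [taskAlg]

     maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
   }
   ```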

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {
+  tempDir = file("temp")
+  workDir = file("work")
+}
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = project.properties['taskAlg'] ?: 'conf/micro-standard.alg'
+  args = [taskAlg]
+
+  maxHeapSize = project.properties['maxHeapSize'] ?: '1G'
+
+  String stdOutStr = project.properties['standardOutput']
+  if (stdOutStr != null) {
+    standardOutput = new File(stdOutStr).newOutputStream()
+  }
+
+  debugOptions {
+    enabled = false
+    port = 5005
+    suspend = true
+  }
+}
+
+/* Old "collation" Ant target:
+gradle getTop100kWikiWordFiles run -PtaskAlg=conf/collation.alg -PstandardOutput=work/collation.benchmark.output.txt
+perl -CSD scripts/collation.bm2jira.pl work/collation.benchmark.output.txt
+ */
+
+/* Old "shingle" Ant target:
+gradle reuters run -PtaskAlg=conf/shingle.alg -PstandardOutput=work/shingle.benchmark.output.txt
+perl -CSD scripts/shingle.bm2jira.pl work/shingle.benchmark.output.txt
+ */
+
+// The remaining tasks just get / extract / prepare data
+
+task getEnWiki(type: Download) {
+  src "https://home.apache.org/~dsmiley/data/enwiki-20070527-pages-articles.xml.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getGeoNames(type: Download) {
+  // note: latest data is at: https://download.geonames.org/export/dump/allCountries.zip
+  //       and then randomize with: gsort -R -S 1500M file.txt > file_random.txt
+  //       and then compress with: bzip2 -9 -k file_random.txt
+  src "https://home.apache.org/~dsmiley/data/geonames_20130921_randomOrder_allCountries.txt.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getReuters(type: Download) {
+  // note: there is no HTTPS url and we don't care because this is merely test/perf data
+  src "http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+}
+task extractReuters(type: Copy) {
+  dependsOn getReuters
+  from(tarTree(getReuters.dest)) { // can expand a .gz on the fly
+    exclude '*.txt'
+  }
+  into file("$workDir/reuters")
+}
+task reuters(type: JavaExec) {
+  dependsOn extractReuters
+  def input = extractReuters.outputs.files[0]

Review comment:
       A few problems here - this runs at configuration time so you can't take outputs of another task right away (yes, it'll work but it's not right). Besides, inputs and outputs don't need to be declared for these tasks at all so I'd just leave out inputs.dir and outputs.dir entirely. The task will just always execute.
   
   Args should be moved to doFirst or use absolute location (not the depending task's resolved outputs).
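   
   One possible shape, as a sketch only. It assumes `workDir` is declared in the build file as in the diff and that `extractReuters` keeps copying into `work/reuters`, so the paths can be written down directly instead of being read from the other task's outputs:

   ```groovy
   def workDir = file("work")

   task reuters(type: JavaExec) {
     dependsOn extractReuters
     main = 'org.apache.lucene.benchmark.utils.ExtractReuters'
     classpath = sourceSets.main.runtimeClasspath

     // Fixed locations known at configuration time; nothing is resolved
     // from extractReuters.outputs here.
     def input = file("$workDir/reuters")
     def output = file("$workDir/reuters-out")
     args = [input.path, output.path]

     // No inputs.dir / outputs.dir: the task simply runs every time.
     doFirst {
       output.deleteDir()
       println "Extracting reuters to $output"
     }
   }
   ```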

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {
+  tempDir = file("temp")
+  workDir = file("work")
+}
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = project.properties['taskAlg'] ?: 'conf/micro-standard.alg'
+  args = [taskAlg]
+
+  maxHeapSize = project.properties['maxHeapSize'] ?: '1G'
+
+  String stdOutStr = project.properties['standardOutput']
+  if (stdOutStr != null) {
+    standardOutput = new File(stdOutStr).newOutputStream()
+  }
+
+  debugOptions {
+    enabled = false
+    port = 5005
+    suspend = true
+  }
+}
+
+/* Old "collation" Ant target:
+gradle getTop100kWikiWordFiles run -PtaskAlg=conf/collation.alg -PstandardOutput=work/collation.benchmark.output.txt
+perl -CSD scripts/collation.bm2jira.pl work/collation.benchmark.output.txt
+ */
+
+/* Old "shingle" Ant target:
+gradle reuters run -PtaskAlg=conf/shingle.alg -PstandardOutput=work/shingle.benchmark.output.txt
+perl -CSD scripts/shingle.bm2jira.pl work/shingle.benchmark.output.txt
+ */
+
+// The remaining tasks just get / extract / prepare data
+
+task getEnWiki(type: Download) {
+  src "https://home.apache.org/~dsmiley/data/enwiki-20070527-pages-articles.xml.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getGeoNames(type: Download) {
+  // note: latest data is at: https://download.geonames.org/export/dump/allCountries.zip
+  //       and then randomize with: gsort -R -S 1500M file.txt > file_random.txt
+  //       and then compress with: bzip2 -9 -k file_random.txt
+  src "https://home.apache.org/~dsmiley/data/geonames_20130921_randomOrder_allCountries.txt.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    ant.bunzip2(src: dest, dest: tempDir) // will chop off .bz2
+  }
+}
+
+task getReuters(type: Download) {
+  // note: there is no HTTPS url and we don't care because this is merely test/perf data
+  src "http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+}
+task extractReuters(type: Copy) {
+  dependsOn getReuters
+  from(tarTree(getReuters.dest)) { // can expand a .gz on the fly
+    exclude '*.txt'
+  }
+  into file("$workDir/reuters")
+}
+task reuters(type: JavaExec) {
+  dependsOn extractReuters
+  def input = extractReuters.outputs.files[0]
+  def output = "$workDir/reuters-out"
+  inputs.dir(input)
+  outputs.dir(output)
+  main = 'org.apache.lucene.benchmark.utils.ExtractReuters'
+  classpath = sourceSets.main.runtimeClasspath
+  jvmArgs = ['-Xmx1G']
+  args = [input, output]
+
+  doFirst {
+    file(output).deleteDir()
+    println "Extracting reuters to $output"
+  }
+}
+
+task getTop100kWikiWordFiles(type: Download) {
+  src "https://home.apache.org/~rmuir/wikipedia/top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2"
+  dest file("$tempDir/${src.file.split('/').last()}")
+  overwrite false
+  compress false
+
+  doLast {
+    copy {

Review comment:
       sync rather than copy?
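   
   Roughly what I have in mind, as a sketch. The destination directory name below is a placeholder (the real one is in the part of the task not shown above), and `tempDir` / `workDir` are assumed to be declared in the build file as in the diff. Unlike `copy`, `sync` also removes files in the destination that are not in the source, so stale files from an earlier run do not linger:

   ```groovy
   task getTop100kWikiWordFiles(type: Download) {
     src "https://home.apache.org/~rmuir/wikipedia/top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2"
     dest file("$tempDir/${src.file.split('/').last()}")
     overwrite false
     compress false

     doLast {
       sync {
         // unpack the bzip2-compressed tar on the fly
         from tarTree(resources.bzip2(dest))
         // placeholder destination, not taken from the pull request
         into file("$workDir/top100k-words")
       }
     }
   }
   ```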

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')
+
+  compile project(':lucene:analysis:common')
+  compile project(':lucene:facet')
+  compile project(':lucene:highlighter')
+  compile project(':lucene:queries')
+  compile project(':lucene:spatial-extras')
+  compile project(':lucene:queryparser')
+
+  compile "org.apache.commons:commons-compress"
+  compile "com.ibm.icu:icu4j"
+  compile "org.locationtech.spatial4j:spatial4j"
+  compile("net.sourceforge.nekohtml:nekohtml", {
     exclude module: "xml-apis"
   })
 
-  testImplementation project(':lucene:test-framework')
+  runtime project(':lucene:analysis:icu')
+
+  testCompile project(':lucene:test-framework')
+}
+
+ext {

Review comment:
       Only declare externalized properties if they really have to be externalized (read from somewhere outside the script). Here it's fine to just declare variables.
   
   def tempDir = project.file("temp")
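
As a sketch of the distinction being drawn here: a `def` variable is local to the build script, while an `ext` property is attached to the project and can be read from other scripts or projects. The property name `benchmarkTempDir` is invented for this illustration.

```groovy
// Local variables: visible to the tasks declared later in this same script,
// invisible everywhere else. Enough when nothing outside needs the value.
def tempDir = file("temp")
def workDir = file("work")

// Extra property: becomes part of the project's public surface.
// Only worth it if another script or project has to read it.
ext {
  benchmarkTempDir = file("temp")   // hypothetical name
}
```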




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434352523



##########
File path: lucene/benchmark/build.gradle
##########
@@ -37,5 +37,121 @@ dependencies {
     exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')

Review comment:
       I'd just inline taskAlg into the array for brevity, but it's fine as is too.

##########
File path: lucene/benchmark/build.gradle
##########
@@ -37,5 +37,121 @@ dependencies {
     exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
       Just a thought: if you don't redirect to a file, the benchmark's output is piped to Gradle (the parent process), and buffering between the two processes may cause artificial slowdowns. I don't know whether this matters, but an alternative design could create a temporary file (the task class has a method for creating task-relative temporary files), always redirect the output into that file, and only copy it to the console at the end if stdOutStr is not defined. 
   
   I really don't know how these benchmarks are used in practice but I wanted to signal a potential issue here.
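
A rough sketch of that alternative, assuming the `propertyOrDefault` helper from the quoted diff and using the task's `temporaryDir` for the scratch file. The file name `benchmark-stdout.txt` is made up, and whether this actually reduces overhead has not been measured here.

```groovy
task run(type: JavaExec) {
  main 'org.apache.lucene.benchmark.byTask.Benchmark'
  classpath sourceSets.main.runtimeClasspath
  args = [propertyOrDefault('taskAlg', 'conf/micro-standard.alg')]

  String stdOutStr = propertyOrDefault('standardOutput', null)
  // Use the requested file, or a scratch file in the task's temporary directory.
  // 'benchmark-stdout.txt' is a hypothetical name for this sketch.
  File outFile = stdOutStr != null ? file(stdOutStr) : new File(temporaryDir, 'benchmark-stdout.txt')

  doFirst {
    outFile.parentFile.mkdirs()
    // Send the child's stdout into a file instead of Gradle's console.
    standardOutput = new FileOutputStream(outFile)
  }

  doLast {
    standardOutput.close()
    // Echo the captured output only when no explicit file was requested.
    if (stdOutStr == null) {
      println outFile.text
    }
  }
}
```

Note that `doLast` does not run if the benchmark process fails, so in this form the captured output would stay in the scratch file in that case.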

##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
       I think the java plugin is more than fine here, so maybe remove the comment for the final version?






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434727998



##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
       I like that this comment spells out how this module differs from all the others.






[GitHub] [lucene-solr] dweiss commented on pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dweiss commented on pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#issuecomment-637493362


   > I switched from "java-library" type of Gradle plugin/module to more plainly "java" because this module isn't a library (isn't something depended on by anything), it's closer to an app. 
   
   Fine. 
   
   > I tried type "application" but I didn't have the same control that the "JavaExec" task gives you. One consequence of not using "java-library" is that the names of the categories of dependencies are different, and so this appears odd/unusual relative to the other modules.
   
   This isn't accurate. The names of dependency configurations are different (and their setup is different). Gradle plugins (such as java, java-library, etc.) set up defaults for essentially the same underlying infrastructure. So you could just stay with 'java-library' and it'd still be fine.
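
For reference, a sketch of what this means in practice, using dependencies from this PR. Both plugins provide `implementation`, `runtimeOnly` and `testImplementation`; `java-library` additionally provides `api` for dependencies that are exposed to consumers. The older `compile`/`runtime`/`testCompile` names are deprecated legacy configurations, not something the `java` plugin requires.

```groovy
apply plugin: 'java'   // 'java-library' would accept the same declarations below

dependencies {
  // 'api' exists only with 'java-library'; with plain 'java' use 'implementation'
  implementation project(':lucene:core')
  implementation project(':lucene:analysis:common')

  // needed when running benchmarks, not when compiling them
  runtimeOnly project(':lucene:analysis:icu')

  testImplementation project(':lucene:test-framework')
}
```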
   




[GitHub] [lucene-solr] dsmiley closed pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dsmiley closed pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550


   




[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434805356



##########
File path: lucene/benchmark/build.gradle
##########
@@ -37,5 +37,121 @@ dependencies {
     exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
       Sure.






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434119837



##########
File path: lucene/benchmark/build.gradle
##########
@@ -15,27 +15,138 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.
 
 description = 'System for benchmarking Lucene'
 
 dependencies {  
-  api project(':lucene:core')
-
-  implementation project(':lucene:analysis:common')
-  implementation project(':lucene:facet')
-  implementation project(':lucene:highlighter')
-  implementation project(':lucene:queries')
-  implementation project(':lucene:spatial-extras')
-  implementation project(':lucene:queryparser')
-
-  implementation "org.apache.commons:commons-compress"
-  implementation "com.ibm.icu:icu4j"
-  implementation "org.locationtech.spatial4j:spatial4j"
-  implementation("net.sourceforge.nekohtml:nekohtml", {
+  compile project(':lucene:core')

Review comment:
       Right; I was confused in my failed experiment with "application"



