You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2009/01/22 02:00:45 UTC

contrib Benchmark enwiki problem

I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.  Executed ant
enwiki.  I'm not sure what else needs to be done. Received this error:

enwiki:
     [echo] Working Directory:
/Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
     [java] Running algorithm from:
/Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/extractWikipedia.alg
     [java] ------------> config properties:
     [java] doc.maker =
org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
     [java] doc.maker.forever = false
     [java] docs.file = temp/enwiki-20070527-pages-articles.xml
     [java] line.file.out = work/enwiki.txt
     [java] work.dir = work
     [java] -------------------------------
     [java] ------------> algorithm:
     [java]     Seq_Exhaust {
     [java]         WriteLineDoc
     [java]     > * EXHAUST
     [java]
     [java] Error: cannot execute the algorithm! work/enwiki.txt (No such
file or directory)
     [java] ####################
     [java] java.io.FileNotFoundException: work/enwiki.txt (No such file or
directory)
     [java] ###  D O N E !!! ###
     [java]     at java.io.FileOutputStream.open(Native Method)
     [java] ####################
     [java]     at
java.io.FileOutputStream.<init>(FileOutputStream.java:179)
     [java]     at java.io.FileOutputStream.<init>(FileOutputStream.java:70)
     [java]     at
org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
     [java]     at
org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
     [java]     at
org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
     [java]     at
org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
     [java]     at
org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
     [java]     at
org.apache.lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
     [java]     at
org.apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
     [java]     at
org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java:109)

Re: contrib Benchmark enwiki problem

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 23, 2009, at 3:36 PM, Michael McCandless wrote:

>
> I think temp is for downloading X.gz and un-gzipping it, and then X  
> is supposed to get unpacked/moved into work.  I think?
>

Yep.  I'm not married to it, so we can change it.  I think the key  
thing for me is you want to make sure you don't have to download that  
big file very often.


> work also holds the index subdir by default.
>
> I'd suggest moving your massive Wikipedia XML file to somewhere  
> "safe" (ie, not in "temp") and then symlinking from work to that  
> safe place.

+1

>
>
> Mike
>
> Jason Rutherglen wrote:
>
>> I did the symbolic link from work to temp and things worked.  Perhaps
>> benchmark should download directly to work?  What is temp for?
>>
>> On Thu, Jan 22, 2009 at 4:58 AM, Grant Ingersoll  
>> <gs...@apache.org>wrote:
>>
>>> There is a little funkiness in the ant script there in that if the  
>>> original
>>> file exists in temp, but hasn't been processed in work, then it  
>>> doesn't do
>>> the proper thing.  The workaround is to do the second step to get  
>>> into work
>>> by hand.  I believe there is a JIRA issue on it.
>>>
>>> Also, I highly recommend copying the original file off to  
>>> somewhere else on
>>> your system and soft linking to it so that you don't have to  
>>> download it
>>> again in case you accidentally delete that directory.
>>>
>>> -Grant
>>>
>>> On Jan 22, 2009, at 6:01 AM, Michael McCandless wrote:
>>>
>>>
>>>> An "alg" is simply a file (file.alg) that the benchmarking code  
>>>> runs.  You
>>>> run it something like this:
>>>>
>>>> ant run-task -Dtask.alg=/path/to/file.alg -Dtask.mem=1024M
>>>>
>>>> For docs... there's the package.html in contrib/benchmark.  LIA 2  
>>>> (only
>>>> via MEAP right now) also covers benchmark's alg syntax in an  
>>>> appendix.
>>>>
>>>> But you are close... oh, it actually looks like the output file
>>>> (work/enwiki.txt) could not be written.  Does that directory (../ 
>>>> work)
>>>> exist?  (I think build.xml should have created it).
>>>>
>>>> Mike
>>>>
>>>> Jason Rutherglen wrote:
>>>>
>>>> The xml file temp/enwiki-20070527-pages-articles.xml was  
>>>> downloaded by
>>>>> "ant
>>>>> get-enwiki expand-enwiki".  The docs.file in  
>>>>> extractWikipedia.alg and
>>>>> wikipedia.alg points to it.  The error message is regarding
>>>>> work/enwiki.txt.
>>>>>
>>>>> Is there a how to on this stuff?  What is an alg?
>>>>>
>>>>> On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
>>>>> lucene@mikemccandless.com> wrote:
>>>>>
>>>>>
>>>>>> You should download Wikipedia's XML file manually yourself,  
>>>>>> uncompress
>>>>>> it,
>>>>>> and then edit docs.file in that alg to point to it.
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>> Jason Rutherglen wrote:
>>>>>>
>>>>>> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.   
>>>>>> Executed
>>>>>>
>>>>>>> ant
>>>>>>> enwiki.  I'm not sure what else needs to be done. Received  
>>>>>>> this error:
>>>>>>>
>>>>>>> enwiki:
>>>>>>> [echo] Working Directory:
>>>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>>>>>> [java] Running algorithm from:
>>>>>>>
>>>>>>>
>>>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/ 
>>>>>>> extractWikipedia.alg
>>>>>>> [java] ------------> config properties:
>>>>>>> [java] doc.maker =
>>>>>>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>>>>>> [java] doc.maker.forever = false
>>>>>>> [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>>>>>> [java] line.file.out = work/enwiki.txt
>>>>>>> [java] work.dir = work
>>>>>>> [java] -------------------------------
>>>>>>> [java] ------------> algorithm:
>>>>>>> [java]     Seq_Exhaust {
>>>>>>> [java]         WriteLineDoc
>>>>>>> [java]     > * EXHAUST
>>>>>>> [java]
>>>>>>> [java] Error: cannot execute the algorithm! work/enwiki.txt  
>>>>>>> (No such
>>>>>>> file or directory)
>>>>>>> [java] ####################
>>>>>>> [java] java.io.FileNotFoundException: work/enwiki.txt (No such  
>>>>>>> file or
>>>>>>> directory)
>>>>>>> [java] ###  D O N E !!! ###
>>>>>>> [java]     at java.io.FileOutputStream.open(Native Method)
>>>>>>> [java] ####################
>>>>>>> [java]     at
>>>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>>>>>> [java]     at
>>>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark 
>>>>>>> .byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark 
>>>>>>> .byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark 
>>>>>>> .byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java: 
>>>>>>> 122)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark 
>>>>>>> .byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>>>>>>> [java]     at
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
>>>>>>> [java]     at
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
>>>>>>> [java]     at
>>>>>>> org 
>>>>>>> .apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java: 
>>>>>>> 109)
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: contrib Benchmark enwiki problem

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think temp is for downloading X.gz and un-gzipping it, and then X is  
supposed to get unpacked/moved into work.  I think?

work also holds the index subdir by default.

I'd suggest moving your massive Wikipedia XML file to somewhere  
"safe" (ie, not in "temp") and then symlinking from work to that safe  
place.

Mike

Jason Rutherglen wrote:

> I did the symbolic link from work to temp and things worked.  Perhaps
> benchmark should download directly to work?  What is temp for?
>
> On Thu, Jan 22, 2009 at 4:58 AM, Grant Ingersoll  
> <gs...@apache.org>wrote:
>
>> There is a little funkiness in the ant script there in that if the  
>> original
>> file exists in temp, but hasn't been processed in work, then it  
>> doesn't do
>> the proper thing.  The workaround is to do the second step to get  
>> into work
>> by hand.  I believe there is a JIRA issue on it.
>>
>> Also, I highly recommend copying the original file off to somewhere  
>> else on
>> your system and soft linking to it so that you don't have to  
>> download it
>> again in case you accidentally delete that directory.
>>
>> -Grant
>>
>> On Jan 22, 2009, at 6:01 AM, Michael McCandless wrote:
>>
>>
>>> An "alg" is simply a file (file.alg) that the benchmarking code  
>>> runs.  You
>>> run it something like this:
>>>
>>> ant run-task -Dtask.alg=/path/to/file.alg -Dtask.mem=1024M
>>>
>>> For docs... there's the package.html in contrib/benchmark.  LIA 2  
>>> (only
>>> via MEAP right now) also covers benchmark's alg syntax in an  
>>> appendix.
>>>
>>> But you are close... oh, it actually looks like the output file
>>> (work/enwiki.txt) could not be written.  Does that directory (../ 
>>> work)
>>> exist?  (I think build.xml should have created it).
>>>
>>> Mike
>>>
>>> Jason Rutherglen wrote:
>>>
>>> The xml file temp/enwiki-20070527-pages-articles.xml was  
>>> downloaded by
>>>> "ant
>>>> get-enwiki expand-enwiki".  The docs.file in extractWikipedia.alg  
>>>> and
>>>> wikipedia.alg points to it.  The error message is regarding
>>>> work/enwiki.txt.
>>>>
>>>> Is there a how to on this stuff?  What is an alg?
>>>>
>>>> On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
>>>> lucene@mikemccandless.com> wrote:
>>>>
>>>>
>>>>> You should download Wikipedia's XML file manually yourself,  
>>>>> uncompress
>>>>> it,
>>>>> and then edit docs.file in that alg to point to it.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>> Jason Rutherglen wrote:
>>>>>
>>>>> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.   
>>>>> Executed
>>>>>
>>>>>> ant
>>>>>> enwiki.  I'm not sure what else needs to be done. Received this  
>>>>>> error:
>>>>>>
>>>>>> enwiki:
>>>>>> [echo] Working Directory:
>>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>>>>> [java] Running algorithm from:
>>>>>>
>>>>>>
>>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/ 
>>>>>> extractWikipedia.alg
>>>>>> [java] ------------> config properties:
>>>>>> [java] doc.maker =
>>>>>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>>>>> [java] doc.maker.forever = false
>>>>>> [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>>>>> [java] line.file.out = work/enwiki.txt
>>>>>> [java] work.dir = work
>>>>>> [java] -------------------------------
>>>>>> [java] ------------> algorithm:
>>>>>> [java]     Seq_Exhaust {
>>>>>> [java]         WriteLineDoc
>>>>>> [java]     > * EXHAUST
>>>>>> [java]
>>>>>> [java] Error: cannot execute the algorithm! work/enwiki.txt (No  
>>>>>> such
>>>>>> file or directory)
>>>>>> [java] ####################
>>>>>> [java] java.io.FileNotFoundException: work/enwiki.txt (No such  
>>>>>> file or
>>>>>> directory)
>>>>>> [java] ###  D O N E !!! ###
>>>>>> [java]     at java.io.FileOutputStream.open(Native Method)
>>>>>> [java] ####################
>>>>>> [java]     at
>>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>>>>> [java]     at
>>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene 
>>>>>> .benchmark 
>>>>>> .byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene 
>>>>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java: 
>>>>>> 83)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene 
>>>>>> .benchmark 
>>>>>> .byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene 
>>>>>> .benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java: 
>>>>>> 122)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene 
>>>>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java: 
>>>>>> 92)
>>>>>> [java]     at
>>>>>>
>>>>>>
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java: 
>>>>>> 246)
>>>>>> [java]     at
>>>>>> org 
>>>>>> .apache 
>>>>>> .lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
>>>>>> [java]     at
>>>>>> org 
>>>>>> .apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java: 
>>>>>> 109)
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>> --------------------------
>> Grant Ingersoll
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: contrib Benchmark enwiki problem

Posted by Jason Rutherglen <ja...@gmail.com>.
I did the symbolic link from work to temp and things worked.  Perhaps
benchmark should download directly to work?  What is temp for?

On Thu, Jan 22, 2009 at 4:58 AM, Grant Ingersoll <gs...@apache.org>wrote:

> There is a little funkiness in the ant script there in that if the original
> file exists in temp, but hasn't been processed in work, then it doesn't do
> the proper thing.  The workaround is to do the second step to get into work
> by hand.  I believe there is a JIRA issue on it.
>
> Also, I highly recommend copying the original file off to somewhere else on
> your system and soft linking to it so that you don't have to download it
> again in case you accidentally delete that directory.
>
> -Grant
>
> On Jan 22, 2009, at 6:01 AM, Michael McCandless wrote:
>
>
>> An "alg" is simply a file (file.alg) that the benchmarking code runs.  You
>> run it something like this:
>>
>>  ant run-task -Dtask.alg=/path/to/file.alg -Dtask.mem=1024M
>>
>> For docs... there's the package.html in contrib/benchmark.  LIA 2 (only
>> via MEAP right now) also covers benchmark's alg syntax in an appendix.
>>
>> But you are close... oh, it actually looks like the output file
>> (work/enwiki.txt) could not be written.  Does that directory (../work)
>> exist?  (I think build.xml should have created it).
>>
>> Mike
>>
>> Jason Rutherglen wrote:
>>
>>  The xml file temp/enwiki-20070527-pages-articles.xml was downloaded by
>>> "ant
>>> get-enwiki expand-enwiki".  The docs.file in extractWikipedia.alg and
>>> wikipedia.alg points to it.  The error message is regarding
>>> work/enwiki.txt.
>>>
>>> Is there a how to on this stuff?  What is an alg?
>>>
>>> On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
>>> lucene@mikemccandless.com> wrote:
>>>
>>>
>>>> You should download Wikipedia's XML file manually yourself, uncompress
>>>> it,
>>>> and then edit docs.file in that alg to point to it.
>>>>
>>>> Mike
>>>>
>>>>
>>>> Jason Rutherglen wrote:
>>>>
>>>> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.  Executed
>>>>
>>>>> ant
>>>>> enwiki.  I'm not sure what else needs to be done. Received this error:
>>>>>
>>>>> enwiki:
>>>>>  [echo] Working Directory:
>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>>>>  [java] Running algorithm from:
>>>>>
>>>>>
>>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/extractWikipedia.alg
>>>>>  [java] ------------> config properties:
>>>>>  [java] doc.maker =
>>>>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>>>>  [java] doc.maker.forever = false
>>>>>  [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>>>>  [java] line.file.out = work/enwiki.txt
>>>>>  [java] work.dir = work
>>>>>  [java] -------------------------------
>>>>>  [java] ------------> algorithm:
>>>>>  [java]     Seq_Exhaust {
>>>>>  [java]         WriteLineDoc
>>>>>  [java]     > * EXHAUST
>>>>>  [java]
>>>>>  [java] Error: cannot execute the algorithm! work/enwiki.txt (No such
>>>>> file or directory)
>>>>>  [java] ####################
>>>>>  [java] java.io.FileNotFoundException: work/enwiki.txt (No such file or
>>>>> directory)
>>>>>  [java] ###  D O N E !!! ###
>>>>>  [java]     at java.io.FileOutputStream.open(Native Method)
>>>>>  [java] ####################
>>>>>  [java]     at
>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>>>>  [java]     at
>>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>>>>>  [java]     at
>>>>>
>>>>>
>>>>> org.apache.lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
>>>>>  [java]     at
>>>>> org.apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
>>>>>  [java]     at
>>>>> org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java:109)
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: contrib Benchmark enwiki problem

Posted by Grant Ingersoll <gs...@apache.org>.
There is a little funkiness in the ant script there in that if the  
original file exists in temp, but hasn't been processed in work, then  
it doesn't do the proper thing.  The workaround is to do the second  
step to get into work by hand.  I believe there is a JIRA issue on it.

Also, I highly recommend copying the original file off to somewhere  
else on your system and soft linking to it so that you don't have to  
download it again in case you accidentally delete that directory.

-Grant
On Jan 22, 2009, at 6:01 AM, Michael McCandless wrote:

>
> An "alg" is simply a file (file.alg) that the benchmarking code  
> runs.  You run it something like this:
>
>  ant run-task -Dtask.alg=/path/to/file.alg -Dtask.mem=1024M
>
> For docs... there's the package.html in contrib/benchmark.  LIA 2  
> (only via MEAP right now) also covers benchmark's alg syntax in an  
> appendix.
>
> But you are close... oh, it actually looks like the output file  
> (work/enwiki.txt) could not be written.  Does that directory (../ 
> work) exist?  (I think build.xml should have created it).
>
> Mike
>
> Jason Rutherglen wrote:
>
>> The xml file temp/enwiki-20070527-pages-articles.xml was downloaded  
>> by "ant
>> get-enwiki expand-enwiki".  The docs.file in extractWikipedia.alg and
>> wikipedia.alg points to it.  The error message is regarding
>> work/enwiki.txt.
>>
>> Is there a how to on this stuff?  What is an alg?
>>
>> On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>>
>>> You should download Wikipedia's XML file manually yourself,  
>>> uncompress it,
>>> and then edit docs.file in that alg to point to it.
>>>
>>> Mike
>>>
>>>
>>> Jason Rutherglen wrote:
>>>
>>> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.   
>>> Executed
>>>> ant
>>>> enwiki.  I'm not sure what else needs to be done. Received this  
>>>> error:
>>>>
>>>> enwiki:
>>>>  [echo] Working Directory:
>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>>>  [java] Running algorithm from:
>>>>
>>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/ 
>>>> extractWikipedia.alg
>>>>  [java] ------------> config properties:
>>>>  [java] doc.maker =
>>>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>>>  [java] doc.maker.forever = false
>>>>  [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>>>  [java] line.file.out = work/enwiki.txt
>>>>  [java] work.dir = work
>>>>  [java] -------------------------------
>>>>  [java] ------------> algorithm:
>>>>  [java]     Seq_Exhaust {
>>>>  [java]         WriteLineDoc
>>>>  [java]     > * EXHAUST
>>>>  [java]
>>>>  [java] Error: cannot execute the algorithm! work/enwiki.txt (No  
>>>> such
>>>> file or directory)
>>>>  [java] ####################
>>>>  [java] java.io.FileNotFoundException: work/enwiki.txt (No such  
>>>> file or
>>>> directory)
>>>>  [java] ###  D O N E !!! ###
>>>>  [java]     at java.io.FileOutputStream.open(Native Method)
>>>>  [java] ####################
>>>>  [java]     at
>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>>>  [java]     at  
>>>> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene 
>>>> .benchmark 
>>>> .byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene 
>>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene 
>>>> .benchmark 
>>>> .byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene 
>>>> .benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene 
>>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>>>>  [java]     at
>>>>
>>>> org 
>>>> .apache 
>>>> .lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java: 
>>>> 246)
>>>>  [java]     at
>>>> org 
>>>> .apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java: 
>>>> 73)
>>>>  [java]     at
>>>> org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java: 
>>>> 109)
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: contrib Benchmark enwiki problem

Posted by Michael McCandless <lu...@mikemccandless.com>.
An "alg" is simply a file (file.alg) that the benchmarking code runs.   
You run it something like this:

   ant run-task -Dtask.alg=/path/to/file.alg -Dtask.mem=1024M

For docs... there's the package.html in contrib/benchmark.  LIA 2  
(only via MEAP right now) also covers benchmark's alg syntax in an  
appendix.

But you are close... oh, it actually looks like the output file (work/ 
enwiki.txt) could not be written.  Does that directory (../work)  
exist?  (I think build.xml should have created it).

Mike

Jason Rutherglen wrote:

> The xml file temp/enwiki-20070527-pages-articles.xml was downloaded  
> by "ant
> get-enwiki expand-enwiki".  The docs.file in extractWikipedia.alg and
> wikipedia.alg points to it.  The error message is regarding
> work/enwiki.txt.
>
> Is there a how to on this stuff?  What is an alg?
>
> On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>>
>> You should download Wikipedia's XML file manually yourself,  
>> uncompress it,
>> and then edit docs.file in that alg to point to it.
>>
>> Mike
>>
>>
>> Jason Rutherglen wrote:
>>
>> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.   
>> Executed
>>> ant
>>> enwiki.  I'm not sure what else needs to be done. Received this  
>>> error:
>>>
>>> enwiki:
>>>   [echo] Working Directory:
>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>>   [java] Running algorithm from:
>>>
>>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/ 
>>> extractWikipedia.alg
>>>   [java] ------------> config properties:
>>>   [java] doc.maker =
>>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>>   [java] doc.maker.forever = false
>>>   [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>>   [java] line.file.out = work/enwiki.txt
>>>   [java] work.dir = work
>>>   [java] -------------------------------
>>>   [java] ------------> algorithm:
>>>   [java]     Seq_Exhaust {
>>>   [java]         WriteLineDoc
>>>   [java]     > * EXHAUST
>>>   [java]
>>>   [java] Error: cannot execute the algorithm! work/enwiki.txt (No  
>>> such
>>> file or directory)
>>>   [java] ####################
>>>   [java] java.io.FileNotFoundException: work/enwiki.txt (No such  
>>> file or
>>> directory)
>>>   [java] ###  D O N E !!! ###
>>>   [java]     at java.io.FileOutputStream.open(Native Method)
>>>   [java] ####################
>>>   [java]     at
>>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>>   [java]     at  
>>> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene 
>>> .benchmark 
>>> .byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene 
>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene 
>>> .benchmark 
>>> .byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene 
>>> .benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene 
>>> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>>>   [java]     at
>>>
>>> org 
>>> .apache 
>>> .lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
>>>   [java]     at
>>> org 
>>> .apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
>>>   [java]     at
>>> org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java: 
>>> 109)
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: contrib Benchmark enwiki problem

Posted by Jason Rutherglen <ja...@gmail.com>.
The xml file temp/enwiki-20070527-pages-articles.xml was downloaded by "ant
get-enwiki expand-enwiki".  The docs.file in extractWikipedia.alg and
wikipedia.alg points to it.  The error message is regarding
work/enwiki.txt.

Is there a how to on this stuff?  What is an alg?

On Wed, Jan 21, 2009 at 5:03 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> You should download Wikipedia's XML file manually yourself, uncompress it,
> and then edit docs.file in that alg to point to it.
>
> Mike
>
>
> Jason Rutherglen wrote:
>
>  I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.  Executed
>> ant
>> enwiki.  I'm not sure what else needs to be done. Received this error:
>>
>> enwiki:
>>    [echo] Working Directory:
>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>>    [java] Running algorithm from:
>>
>> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/extractWikipedia.alg
>>    [java] ------------> config properties:
>>    [java] doc.maker =
>> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>>    [java] doc.maker.forever = false
>>    [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>>    [java] line.file.out = work/enwiki.txt
>>    [java] work.dir = work
>>    [java] -------------------------------
>>    [java] ------------> algorithm:
>>    [java]     Seq_Exhaust {
>>    [java]         WriteLineDoc
>>    [java]     > * EXHAUST
>>    [java]
>>    [java] Error: cannot execute the algorithm! work/enwiki.txt (No such
>> file or directory)
>>    [java] ####################
>>    [java] java.io.FileNotFoundException: work/enwiki.txt (No such file or
>> directory)
>>    [java] ###  D O N E !!! ###
>>    [java]     at java.io.FileOutputStream.open(Native Method)
>>    [java] ####################
>>    [java]     at
>> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>>    [java]     at java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java:63)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>>    [java]     at
>>
>> org.apache.lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
>>    [java]     at
>> org.apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
>>    [java]     at
>> org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java:109)
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: contrib Benchmark enwiki problem

Posted by Michael McCandless <lu...@mikemccandless.com>.
You should download Wikipedia's XML file manually yourself, uncompress  
it, and then edit docs.file in that alg to point to it.

Mike

Jason Rutherglen wrote:

> I downloaded trunk via SVN.  Went to trunk/contrib/benchmark.   
> Executed ant
> enwiki.  I'm not sure what else needs to be done. Received this error:
>
> enwiki:
>     [echo] Working Directory:
> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
>     [java] Running algorithm from:
> /Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/conf/ 
> extractWikipedia.alg
>     [java] ------------> config properties:
>     [java] doc.maker =
> org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker
>     [java] doc.maker.forever = false
>     [java] docs.file = temp/enwiki-20070527-pages-articles.xml
>     [java] line.file.out = work/enwiki.txt
>     [java] work.dir = work
>     [java] -------------------------------
>     [java] ------------> algorithm:
>     [java]     Seq_Exhaust {
>     [java]         WriteLineDoc
>     [java]     > * EXHAUST
>     [java]
>     [java] Error: cannot execute the algorithm! work/enwiki.txt (No  
> such
> file or directory)
>     [java] ####################
>     [java] java.io.FileNotFoundException: work/enwiki.txt (No such  
> file or
> directory)
>     [java] ###  D O N E !!! ###
>     [java]     at java.io.FileOutputStream.open(Native Method)
>     [java] ####################
>     [java]     at
> java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>     [java]     at  
> java.io.FileOutputStream.<init>(FileOutputStream.java:70)
>     [java]     at
> org 
> .apache 
> .lucene 
> .benchmark.byTask.tasks.WriteLineDocTask.setup(WriteLineDocTask.java: 
> 63)
>     [java]     at
> org 
> .apache 
> .lucene 
> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:83)
>     [java]     at
> org 
> .apache 
> .lucene 
> .benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java: 
> 141)
>     [java]     at
> org 
> .apache 
> .lucene 
> .benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
>     [java]     at
> org 
> .apache 
> .lucene 
> .benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
>     [java]     at
> org 
> .apache 
> .lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
>     [java]     at
> org.apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java: 
> 73)
>     [java]     at
> org.apache.lucene.benchmark.byTask.Benchmark.main(Benchmark.java:109)


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org