Posted to dev@nutch.apache.org by Sami Siren <ss...@gmail.com> on 2009/02/28 09:26:10 UTC
planning for nutch-1.0-rc1
I am planning to build the first RC for Nutch 1.0 on the morning of Tue
3.3.2009 (EET). There are still some issues marked as fix-for-1.0 in
Jira. Neither of the two remaining _bugs_ seems too important to me. In
fact, I only count the issues assigned to developers as real candidates
for inclusion in 1.0:
NUTCH-578 (kubes)
NUTCH-477 (ab)
NUTCH-669 (siren)
I am also volunteering to push all open issues to 1.1 before starting
the RC build on Tuesday. Any objections to the proposed procedure or timing?
--
Sami Siren
Re: planning for nutch-1.0-rc1
Posted by Bartosz Gadzimski <ba...@o2.pl>.
Hello,
It's on two Linux boxes, one with CentOS and one with Ubuntu. Both run
the "old" bin/nutch crawl properly.
The problem is that it doesn't throw an exception on the command line or
in Eclipse; it just writes to the logs, so it's hard to debug.
One is running Nutch trunk from 07 March, and one is running today's rc1.
Any hints? Maybe some log properties or something?
In hadoop.log it looks exactly the same:
2009-03-09 12:12:09,452 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2009-03-09 12:12:09,452 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2009-03-09 12:12:09,560 INFO field.FieldIndexer - IFD [Thread-11]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@6210fb
2009-03-09 12:12:09,560 INFO field.FieldIndexer - IW 0 [Thread-11]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-agniesia441/mapred/local/index/_-174719952 autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@48edb5 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@1ee2c2c ramBufferSizeMB=16.0 maxBufferedDocs=50 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
2009-03-09 12:12:09,585 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
at org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139)
at org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:1)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
at org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:239)
at org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:1)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
2009-03-09 12:12:10,021 FATAL field.FieldIndexer - FieldIndexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.indexer.field.FieldIndexer.index(FieldIndexer.java:267)
at org.apache.nutch.indexer.field.FieldIndexer.run(FieldIndexer.java:312)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275)
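Since the failure only shows up in hadoop.log, one way to make it easier to spot is to pull the WARN/FATAL entries and their stack frames out of the log. The following is a minimal sketch, not part of Nutch: the function name find_failures and the regular expression are assumptions based only on the line format visible in the excerpt above. It also re-joins frames that a mail client has wrapped (a lone "at" followed by the method on the next line).

```python
import re

# Start of a log entry as it appears in the excerpt above, e.g.
# "2009-03-09 12:12:09,585 WARN mapred.LocalJobRunner - job_local_0001".
# The pattern is an assumption derived from that excerpt only.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+-\s*(.*)$"
)


def find_failures(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return one dict per log entry whose level is in `levels`.

    Each dict holds the timestamp, level, logger, message, the exception
    line that follows the entry (if any) and the stack frames after it.
    """
    failures = []
    current = None       # entry currently collecting frames, or None
    pending_at = False   # True after a lone "at" (wrapped frame)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            pending_at = False
            time, level, logger, message = match.groups()
            current = None
            if level in levels:
                current = {"time": time, "level": level, "logger": logger,
                           "message": message, "exception": None, "frames": []}
                failures.append(current)
            continue
        if current is None:
            continue  # continuation of an entry we are not interested in
        if line == "at":
            pending_at = True
        elif pending_at:
            current["frames"].append(line)
            pending_at = False
        elif line.startswith("at "):
            current["frames"].append(line[3:])
        elif current["exception"] is None and not current["frames"]:
            current["exception"] = line
    return failures
```

It could be called as find_failures(open("logs/hadoop.log")), where the log path is whatever the local installation uses; the first frame of each result points at the line that threw.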
Thanks,
Bartosz
Dennis Kubes wrote:
> Sorry about the docs being sparse on this. I will write more about
> the process as time permits. Don't know about the problem below.
> What platform are you running on, windows, linux?
>
> Dennis
Re: planning for nutch-1.0-rc1
Posted by Dennis Kubes <ku...@apache.org>.
The output of FieldIndexer, like that of the current indexer, is an
indexes folder. This output can be named whatever you want, but
NutchBean, in order to search it, expects a base folder containing
folders named either index and segments or indexes and segments.
The index folder would be a single index, as opposed to segment indexes.
It is a little confusing here, as "segment indexes" does NOT refer to
the Nutch segments content but to the output under the indexes folder,
such as part-xxxxx folders, each with a Lucene index: one part (index)
per reduce task output.
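As a rough illustration of the naming rule described above, the following sketch inspects a crawl folder and reports which of the expected folders are present. It is not taken from NutchBean; the function name describe_layout and the returned keys are hypothetical, and the only facts it relies on are the folder names index, indexes, segments and part-xxxxx mentioned in this message.

```python
from pathlib import Path


def describe_layout(base):
    """Report which folders named in the message exist under `base`.

    Hypothetical helper: it only checks folder names and does not open
    or validate any Lucene index.
    """
    base = Path(base)
    indexes = base / "indexes"
    has_index = (base / "index").is_dir()      # single (merged) index
    has_indexes = indexes.is_dir()             # segment indexes
    has_segments = (base / "segments").is_dir()
    # One part-xxxxx folder per reduce task output, if any exist.
    parts = []
    if has_indexes:
        parts = sorted(p.name for p in indexes.glob("part-*") if p.is_dir())
    return {
        "has_index": has_index,
        "has_indexes": has_indexes,
        "has_segments": has_segments,
        "parts": parts,
        # Layout described as searchable: segments plus index or indexes.
        "searchable_layout": has_segments and (has_index or has_indexes),
    }
```

For a folder holding only indexes/part-00000 and segments, this would report a searchable layout with a single part and no merged index.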
You can use the bin/nutch merge command, with an output similar to
/yourfolder/crawl/index, to merge (possibly multiple) segment indexes
into a single index. Nutch should be able to handle both segment and
single indexes, though; it just handles them based on how they are named.
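A merge invocation along those lines could be assembled as in the sketch below. The helper name merge_command is hypothetical, and the argument order (output index first, then the folder holding the segment indexes) is an assumption that should be checked against the usage message printed by running bin/nutch merge without arguments.

```python
import shlex


def merge_command(crawl_dir, nutch="bin/nutch"):
    """Build the argument list for merging <crawl_dir>/indexes into
    <crawl_dir>/index. Argument order is an assumption; verify locally."""
    crawl_dir = crawl_dir.rstrip("/")
    return [nutch, "merge", f"{crawl_dir}/index", f"{crawl_dir}/indexes"]


# Show the command line rather than running it.
print(shlex.join(merge_command("/yourfolder/crawl")))
```

The list form can be passed to subprocess.run unchanged once the argument order has been confirmed.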
Beyond that, Lucene can create a compound index file from a single
index, where everything is in a single file. I don't think Nutch
currently supports that in terms of creation, but it should in terms of
querying.
Dennis
Bartosz Gadzimski wrote:
> Hello Dennis,
>
> We've been trying your new framework and indexer and everything looks
> better now. But we can't understand what the output of the last
> command (FieldIndexer) should be.
>
> We have:
> user@kubuntu:~/nutch-1.0$ ls crawl/indexes/part-00000/
> index.done segments_1 segments.gen
> .index.done.crc .segments_1.crc .segments.gen.crc
>
> How do we generate a "normal" index from those indexes?
>
> Thanks in advance,
> Bartosz
Re: planning for nutch-1.0-rc1
Posted by Bartosz Gadzimski <ba...@o2.pl>.
Hello Dennis,
We've been trying your new framework and indexer and everything looks
better now. But we can't understand what the output of the last
command (FieldIndexer) should be.
We have:
user@kubuntu:~/nutch-1.0$ ls crawl/indexes/part-00000/
index.done segments_1 segments.gen
.index.done.crc .segments_1.crc .segments.gen.crc
How do we generate a "normal" index from those indexes?
Thanks in advance,
Bartosz
Dennis Kubes wrote:
> Sorry about the docs being sparse on this. I will write more about
> the process as time permits. Don't know about the problem below.
> What platform are you running on, windows, linux?
>
> Dennis
Re: planning for nutch-1.0-rc1
Posted by Dennis Kubes <ku...@apache.org>.
Sorry about the docs being sparse on this. I will write more about the
process as time permits. I don't know about the problem below. What
platform are you running on, Windows or Linux?
Dennis
Bartosz Gadzimski wrote:
> Hello,
>
> Thanks Dennis for updating the wiki, it helped a lot.
>
> You gave an example with indexing but you didn't say much about it. Can
> you write some more? :)
>
> Anyway, I have problems at the last step (Nutch from 07 March):
>
> bin/nutch org.apache.nutch.indexer.field.FieldIndexer
>
> It simply stops somewhere
>
> 2009-03-07 16:09:04,432 INFO field.FieldIndexer - FieldIndexer: starting
> 2009-03-07 16:09:04,436 INFO field.FieldIndexer - FieldIndexer: adding
> fields db: crawl/fields/basicfields
> 2009-03-07 16:09:04,498 INFO field.FieldIndexer - FieldIndexer: adding
> fields db: crawl/fields/anchorfields
> 2009-03-07 16:09:05,636 INFO plugin.PluginRepository - Plugins: looking
> in: /usr/local/nutch/plugins
> 2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Registered Plugins:
> 2009-03-07 16:09:06,437 INFO plugin.PluginRepository - the
> nutch core extension points (nutch-extensionpoints)
> 2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Basic
> Query Filter (query-basic)
> .... plugins....
>
> 2009-03-07 16:09:07,769 INFO field.FieldIndexer - IFD [Thread-11]:
> setInfoStream
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1b4a74b
>
> 2009-03-07 16:09:07,769 INFO field.FieldIndexer - IW 0 [Thread-11]:
> setInfoStream:
> dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-root/mapred/local/index/_-884655313
> autoCommit=true
> mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@15356d5
> mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@69d02b
> ramBufferSizeMB=16.0 maxBufferedDocs=50 maxBuffereDeleteTerms=-1
> maxFieldLength=10000 index=
> 2009-03-07 16:09:07,781 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.NullPointerException
> at
> org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139)
>
> at
> org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:131)
>
> at
> org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
> at
> org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:239)
> at
> org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:69)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
> 2009-03-07 16:09:08,197 FATAL field.FieldIndexer - FieldIndexer:
> java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> at
> org.apache.nutch.indexer.field.FieldIndexer.index(FieldIndexer.java:267)
> at
> org.apache.nutch.indexer.field.FieldIndexer.run(FieldIndexer.java:312)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275)
>
>
>
>
> The crawl/indexes folder contains only a _temporary folder.
>
> I will try to debug this, but I have problems running Nutch in Eclipse.
>
> Thanks,
> Bartosz
>
>
>
> Dennis Kubes wrote:
>> I don't know if I would make this primary yet. I need to check what
>> is causing this as it worked fine for me, in fact we currently have it
>> in production. Also we would need to update the shell scripts to
>> integrate this more tightly.
>>
>> Dennis
>>
>> Bartosz Gadzimski wrote:
>>> Sami Siren wrote:
>>>> Andrzej Bialecki wrote:
>>>>> Sami Siren wrote:
>>>>>> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
>>>>>> morning (EET). There are still some issues marked as fix for 1.0
>>>>>> in Jira. Neither of the two remaining _bugs_ seems too important
>>>>>> to me, actually I only count the issues assigned to developers as
>>>>>> real candidates to be included in 1.0:
>>>>>>
>>>>>> NUTCH-578 (kubes)
>>>>>> NUTCH-477 (ab)
>>>>>> NUTCH-669 (siren)
>>>>>
>>>>> There's one Critical issue reported, related to NekoHTML
>>>>> (NUTCH-700). I'm not sure what are the feature differences
>>>>> (pertinent to Nutch) between 0.9.4 and 1.9.11 - perhaps downgrading
>>>>> is the safest course of action.
>>>> I will take care of that.
>>>>>
>>>>>
>>>>>> I am also volunteering to push all open issues to 1.1 before
>>>>>> starting the RC build on Tuesday. Any objections on the proposed
>>>>>> procedure or timing?
>>>>>
>>>>> Sounds good.
>>>> great!
>>>>
>>>> --
>>>> Sami Siren
>>>>
>>>>
>>>>
>>> What about the new scoring and new indexing? Will it be integrated as
>>> the primary scoring algorithm? I have a problem with it in LinkRank:
>>>
>>> 2009-03-02 20:43:45,708 INFO webgraph.LinkRank - Starting link
>>> counter job
>>> 2009-03-02 20:43:47,838 INFO webgraph.LinkRank - Finished link
>>> counter job
>>> 2009-03-02 20:43:47,839 INFO webgraph.LinkRank - Reading numlinks
>>> temp file
>>> 2009-03-02 20:43:47,840 INFO webgraph.LinkRank - Deleting numlinks
>>> temp file
>>> 2009-03-02 20:43:47,842 FATAL webgraph.LinkRank - LinkAnalysis:
>>> java.lang.NullPointerException
>>> at
>>> org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
>>> at
>>> org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
>>> at
>>> org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> at
>>> org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)
>>>
>>> Another question: what about the indexing framework mentioned here:
>>> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11764.html
>>>
>>>
>>> Having all this new scoring and indexing would be a real step forward.
>>>
>>> Thanks,
>>> Bartosz
>>>
>>
>
Re: planning for nutch-1.0-rc1
Posted by Bartosz Gadzimski <ba...@o2.pl>.
Hello,
Thanks Dennis for updating the wiki, it helped a lot.
You gave an example with indexing but you didn't say much about it. Can
you write some more? :)
Anyway, I have problems at the last step (Nutch from 07 March):
bin/nutch org.apache.nutch.indexer.field.FieldIndexer
It simply stops somewhere:
2009-03-07 16:09:04,432 INFO field.FieldIndexer - FieldIndexer: starting
2009-03-07 16:09:04,436 INFO field.FieldIndexer - FieldIndexer: adding fields db: crawl/fields/basicfields
2009-03-07 16:09:04,498 INFO field.FieldIndexer - FieldIndexer: adding fields db: crawl/fields/anchorfields
2009-03-07 16:09:05,636 INFO plugin.PluginRepository - Plugins: looking in: /usr/local/nutch/plugins
2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Registered Plugins:
2009-03-07 16:09:06,437 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Basic Query Filter (query-basic)
.... plugins....
2009-03-07 16:09:07,769 INFO field.FieldIndexer - IFD [Thread-11]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@1b4a74b
2009-03-07 16:09:07,769 INFO field.FieldIndexer - IW 0 [Thread-11]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-root/mapred/local/index/_-884655313 autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@15356d5 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@69d02b ramBufferSizeMB=16.0 maxBufferedDocs=50 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
2009-03-07 16:09:07,781 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
at org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139)
at org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:131)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
at org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:239)
at org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:69)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
2009-03-07 16:09:08,197 FATAL field.FieldIndexer - FieldIndexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.indexer.field.FieldIndexer.index(FieldIndexer.java:267)
at org.apache.nutch.indexer.field.FieldIndexer.run(FieldIndexer.java:312)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275)
crawl/indexes contains only a _temporary folder.
I will try to debug this, but I am having problems running Nutch in Eclipse.
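Since the failure only shows up in hadoop.log and not on the command line, one way to narrow it down is to pull out just the WARN/ERROR/FATAL entries together with the stack trace lines that follow them. The following is a minimal sketch, assuming log lines shaped like the excerpt above (timestamp, level, logger, " - ", message); the names ENTRY, SAMPLE and find_failures are made up for illustration and are not part of Nutch or Hadoop.

```python
import re

# Matches log4j-style entries such as
# "2009-03-07 16:09:07,781 WARN mapred.LocalJobRunner - job_local_0001".
# The layout is an assumption taken from the excerpt in this thread.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(\S+) - (.*)$"
)

# Shortened sample built from lines quoted in this thread.
SAMPLE = """\
2009-03-07 16:09:06,437 INFO plugin.PluginRepository - Registered Plugins:
2009-03-07 16:09:07,781 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
at org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
2009-03-07 16:09:08,197 FATAL field.FieldIndexer - FieldIndexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
"""


def find_failures(lines, levels=("WARN", "ERROR", "FATAL")):
    """Collect entries at the given levels plus their continuation lines.

    Returns a list of dicts with the keys timestamp, level, logger,
    message and trace (the non-timestamped lines that follow the entry).
    """
    failures = []
    current = None  # entry that is currently collecting trace lines
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            timestamp, level, logger, message = match.groups()
            if level in levels:
                current = {
                    "timestamp": timestamp,
                    "level": level,
                    "logger": logger,
                    "message": message,
                    "trace": [],
                }
                failures.append(current)
            else:
                # An INFO/DEBUG entry ends any trace being collected.
                current = None
        elif current is not None and line.strip():
            # Line without a timestamp: exception name or "at ..." frame.
            current["trace"].append(line.strip())
    return failures


if __name__ == "__main__":
    for failure in find_failures(SAMPLE.splitlines()):
        print(failure["timestamp"], failure["level"], failure["logger"])
        for frame in failure["trace"]:
            print("    " + frame)
```

To run it against a real log, pass an open file instead of the sample, for example find_failures(open("logs/hadoop.log")); that path is only a guess at where the log lives in a given installation.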
Thanks,
Bartosz
Dennis Kubes wrote:
> I don't know if I would make this primary yet. I need to check what
> is causing this as it worked fine for me, in fact we currently have it
> in production. Also we would need to update the shell scripts to
> integrate this more tightly.
>
> Dennis
>
> Bartosz Gadzimski wrote:
>> Sami Siren wrote:
>>> Andrzej Bialecki wrote:
>>>> Sami Siren wrote:
>>>>> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
>>>>> morning (EET). There are still some issues marked as fix for 1.0
>>>>> in Jira. Neither of the two remaining _bugs_ seems too important
>>>>> to me, actually I only count the issues assigned to developers as
>>>>> real candidates to be included in 1.0:
>>>>>
>>>>> NUTCH-578 (kubes)
>>>>> NUTCH-477 (ab)
>>>>> NUTCH-669 (siren)
>>>>
>>>> There's one Critical issue reported, related to NekoHTML
>>>> (NUTCH-700). I'm not sure what are the feature differences
>>>> (pertinent to Nutch) between 0.9.4 and 1.9.11 - perhaps downgrading
>>>> is the safest course of action.
>>> I will take care of that.
>>>>
>>>>
>>>>> I am also volunteering to push all open issues to 1.1 before
>>>>> starting the RC build on Tuesday. Any objections on the proposed
>>>>> procedure or timing?
>>>>
>>>> Sounds good.
>>> great!
>>>
>>> --
>>> Sami Siren
>>>
>>>
>>>
>> What about the new scoring and new indexing? Will it be integrated as
>> the primary scoring algorithm? I have a problem with it in LinkRank:
>>
>> 2009-03-02 20:43:45,708 INFO webgraph.LinkRank - Starting link
>> counter job
>> 2009-03-02 20:43:47,838 INFO webgraph.LinkRank - Finished link
>> counter job
>> 2009-03-02 20:43:47,839 INFO webgraph.LinkRank - Reading numlinks
>> temp file
>> 2009-03-02 20:43:47,840 INFO webgraph.LinkRank - Deleting numlinks
>> temp file
>> 2009-03-02 20:43:47,842 FATAL webgraph.LinkRank - LinkAnalysis:
>> java.lang.NullPointerException
>> at
>> org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
>> at
>> org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
>> at
>> org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at
>> org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)
>>
>> Another question: what about the indexing framework mentioned here:
>> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11764.html
>>
>>
>> Having all of the new scoring and indexing would be a real step forward.
>>
>> Thanks,
>> Bartosz
>>
>
Re: planning for nutch-1.0-rc1
Posted by Andrzej Bialecki <ab...@getopt.org>.
Dennis Kubes wrote:
> I don't know if I would make this primary yet.
Not before 1.0 ... ;) After that, we need to discuss what to do with the
new and the current framework.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: planning for nutch-1.0-rc1
Posted by Dennis Kubes <ku...@apache.org>.
I don't know if I would make this primary yet. I need to check what is
causing this, as it worked fine for me; in fact, we currently have it in
production. Also, we would need to update the shell scripts to integrate
this more tightly.
Dennis
Bartosz Gadzimski wrote:
> Sami Siren wrote:
>> Andrzej Bialecki wrote:
>>> Sami Siren wrote:
>>>> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
>>>> morning (EET). There are still some issues marked as fix for 1.0 in
>>>> Jira. Neither of the two remaining _bugs_ seems too important to me,
>>>> actually I only count the issues assigned to developers as real
>>>> candidates to be included in 1.0:
>>>>
>>>> NUTCH-578 (kubes)
>>>> NUTCH-477 (ab)
>>>> NUTCH-669 (siren)
>>>
>>> There's one Critical issue reported, related to NekoHTML (NUTCH-700).
>>> I'm not sure what are the feature differences (pertinent to Nutch)
>>> between 0.9.4 and 1.9.11 - perhaps downgrading is the safest course
>>> of action.
>> I will take care of that.
>>>
>>>
>>>> I am also volunteering to push all open issues to 1.1 before
>>>> starting the RC build on Tuesday. Any objections on the proposed
>>>> procedure or timing?
>>>
>>> Sounds good.
>> great!
>>
>> --
>> Sami Siren
>>
>>
>>
> What about the new scoring and new indexing? Will it be integrated as
> the primary scoring algorithm? I have a problem with it in LinkRank:
>
> 2009-03-02 20:43:45,708 INFO webgraph.LinkRank - Starting link counter job
> 2009-03-02 20:43:47,838 INFO webgraph.LinkRank - Finished link counter job
> 2009-03-02 20:43:47,839 INFO webgraph.LinkRank - Reading numlinks temp
> file
> 2009-03-02 20:43:47,840 INFO webgraph.LinkRank - Deleting numlinks temp
> file
> 2009-03-02 20:43:47,842 FATAL webgraph.LinkRank - LinkAnalysis:
> java.lang.NullPointerException
> at
> org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
> at
> org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
> at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)
>
> Another question: what about the indexing framework mentioned here:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11764.html
>
>
> Having all of the new scoring and indexing would be a real step forward.
>
> Thanks,
> Bartosz
>
Re: planning for nutch-1.0-rc1
Posted by Bartosz Gadzimski <ba...@o2.pl>.
Sami Siren wrote:
> Andrzej Bialecki wrote:
>> Sami Siren wrote:
>>> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
>>> morning (EET). There are still some issues marked as fix for 1.0 in
>>> Jira. Neither of the two remaining _bugs_ seems too important to me,
>>> actually I only count the issues assigned to developers as real
>>> candidates to be included in 1.0:
>>>
>>> NUTCH-578 (kubes)
>>> NUTCH-477 (ab)
>>> NUTCH-669 (siren)
>>
>> There's one Critical issue reported, related to NekoHTML (NUTCH-700).
>> I'm not sure what are the feature differences (pertinent to Nutch)
>> between 0.9.4 and 1.9.11 - perhaps downgrading is the safest course
>> of action.
> I will take care of that.
>>
>>
>>> I am also volunteering to push all open issues to 1.1 before
>>> starting the RC build on Tuesday. Any objections on the proposed
>>> procedure or timing?
>>
>> Sounds good.
> great!
>
> --
> Sami Siren
>
>
>
What about the new scoring and new indexing? Will it be integrated as
the primary scoring algorithm? I have a problem with it in LinkRank:
2009-03-02 20:43:45,708 INFO webgraph.LinkRank - Starting link counter job
2009-03-02 20:43:47,838 INFO webgraph.LinkRank - Finished link counter job
2009-03-02 20:43:47,839 INFO webgraph.LinkRank - Reading numlinks temp file
2009-03-02 20:43:47,840 INFO webgraph.LinkRank - Deleting numlinks temp file
2009-03-02 20:43:47,842 FATAL webgraph.LinkRank - LinkAnalysis: java.lang.NullPointerException
at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)
Another question: what about the indexing framework mentioned here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11764.html
Having all of the new scoring and indexing would be a real step forward.
Thanks,
Bartosz
Re: planning for nutch-1.0-rc1
Posted by Sami Siren <ss...@gmail.com>.
Andrzej Bialecki wrote:
> Sami Siren wrote:
>> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
>> morning (EET). There are still some issues marked as fix for 1.0 in
>> Jira. Neither of the two remaining _bugs_ seems too important to me,
>> actually I only count the issues assigned to developers as real
>> candidates to be included in 1.0:
>>
>> NUTCH-578 (kubes)
>> NUTCH-477 (ab)
>> NUTCH-669 (siren)
>
> There's one Critical issue reported, related to NekoHTML (NUTCH-700).
> I'm not sure what are the feature differences (pertinent to Nutch)
> between 0.9.4 and 1.9.11 - perhaps downgrading is the safest course of
> action.
I will take care of that.
>
>
>> I am also volunteering to push all open issues to 1.1 before starting
>> the RC build on Tuesday. Any objections on the proposed procedure or
>> timing?
>
> Sounds good.
great!
--
Sami Siren
Re: planning for nutch-1.0-rc1
Posted by Andrzej Bialecki <ab...@getopt.org>.
Sami Siren wrote:
> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
> morning (EET). There are still some issues marked as fix for 1.0 in
> Jira. Neither of the two remaining _bugs_ seems too important to me,
> actually I only count the issues assigned to developers as real
> candidates to be included in 1.0:
>
> NUTCH-578 (kubes)
> NUTCH-477 (ab)
> NUTCH-669 (siren)
There's one Critical issue reported, related to NekoHTML (NUTCH-700).
I'm not sure what are the feature differences (pertinent to Nutch)
between 0.9.4 and 1.9.11 - perhaps downgrading is the safest course of
action.
> I am also volunteering to push all open issues to 1.1 before starting
> the RC build on Tuesday. Any objections on the proposed procedure or
> timing?
Sounds good.
--
Best regards,
Andrzej Bialecki <><
Re: planning for nutch-1.0-rc1
Posted by Andrzej Bialecki <ab...@getopt.org>.
Sami Siren wrote:
> NUTCH-477 (ab)
I decided to postpone this - the patch brings a lot of complexity, and
it seems that it would be useful to few users.
--
Best regards,
Andrzej Bialecki <><
Re: planning for nutch-1.0-rc1
Posted by Sami Siren <ss...@gmail.com>.
As I am sure all of you noticed, the release planned for this week was
delayed because of an issue discovered right before the deadline
(NUTCH-711). That has now been fixed, so it's time to move on. I am now
going to build the first RC during the weekend.
--
Sami Siren
Sami Siren wrote:
> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
> morning (EET). There are still some issues marked as fix for 1.0 in
> Jira. Neither of the two remaining _bugs_ seems too important to me,
> actually I only count the issues assigned to developers as real
> candidates to be included in 1.0:
>
> NUTCH-578 (kubes)
> NUTCH-477 (ab)
> NUTCH-669 (siren)
>
> I am also volunteering to push all open issues to 1.1 before starting
> the RC build on Tuesday. Any objections on the proposed procedure or
> timing?
>
> --
> Sami Siren
>
Re: planning for nutch-1.0-rc1
Posted by Dennis Kubes <ku...@apache.org>.
NUTCH-578 was a while back, but as I remember, it worked fine. No
objections to either including it or pushing it to 1.1.
Dennis
Sami Siren wrote:
> I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
> morning (EET). There are still some issues marked as fix for 1.0 in
> Jira. Neither of the two remaining _bugs_ seems too important to me,
> actually I only count the issues assigned to developers as real
> candidates to be included in 1.0:
>
> NUTCH-578 (kubes)
> NUTCH-477 (ab)
> NUTCH-669 (siren)
>
> I am also volunteering to push all open issues to 1.1 before starting
> the RC build on Tuesday. Any objections on the proposed procedure or
> timing?
>
> --
> Sami Siren
>