Posted to dev@nutch.apache.org by lewis john mcgibbney <le...@apache.org> on 2012/06/08 16:49:19 UTC

VOTE Apache Nutch 2.0 RC1

Good Evening Everyone,

A candidate for the Apache Nutch 2.0 RC1 is available at:

http://people.apache.org/~lewismc/nutch-2.0

The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
archive of the sources in:

http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1

Further, a staged Maven repository of the 2.0 jar, sources.jar and
javadoc.jar is available here:

https://repository.apache.org/content/repositories/orgapachenutch-215

Please vote on releasing this package as Apache Nutch 2.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Nutch PMC votes are cast.

 [ ] +1 Release this package as Apache Nutch 2.0
 [ ] -1 Do not release this package because...
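
To make the passing rule above concrete, here is a minimal Python sketch. The function name and the list-of-integers vote format are hypothetical, and it assumes only binding PMC votes are passed in; it is an illustration, not part of any Apache tooling.

```python
def vote_passes(pmc_votes):
    """Return True if the release vote passes.

    pmc_votes is a list of binding PMC votes, each +1, 0 or -1
    (a hypothetical representation chosen for this sketch).
    The vote passes with at least three +1 votes and more +1 than -1.
    """
    plus = sum(1 for v in pmc_votes if v == 1)
    minus = sum(1 for v in pmc_votes if v == -1)
    return plus >= 3 and plus > minus


if __name__ == "__main__":
    print(vote_passes([1, 1, 1]))      # three +1, no -1
    print(vote_passes([1, 1, 1, -1]))  # still a majority of +1
    print(vote_passes([1, 1]))         # too few +1 votes
```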

Many thanks, and here's to plenty more.

Have a great weekend, Kind Regards,
Lewis

P.S. Here's my +1.

Re: VOTE Apache Nutch 2.0 RC1

Posted by Ferdy Galema <fe...@kalooga.com>.
Maybe just 1392? I went ahead and made a patch that should fix this. Feel
free to commit or ignore prior to RC2.

On Thu, Jun 14, 2012 at 1:44 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Sebastian,
>
> On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel
> <wa...@googlemail.com> wrote:
> > I've managed to perform a crawl with 2.0 and HBase: it rocks, indeed.
> > Much simpler than 1.x (no segments!).
>
> :0)
>
> > % ./bin/nutch readdb -stats
> > WebTable statistics start
> > WebTableReader: java.io.EOFException
> >        at java.io.DataInputStream.readFully(DataInputStream.java:197)
> >        at java.io.DataInputStream.readFully(DataInputStream.java:169)
> >        at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
> >        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
> >        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
> >        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
> >        at
> >
> org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
> >        at
> org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:537)
> >        at
> org.apache.nutch.crawl.WebTableReader.processStatJob(WebTableReader.java:218)
> >        at
> org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:479)
> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >        at
> org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
> > --> readdb -dump works.
>
> Confirmed and ticket opened as NUTCH-1391
>
> > % ./bin/nutch fetch 1339621550-203073321 -threads 1 -parse
> > Exception in thread "main" java.lang.IllegalArgumentException: arg
> -parse not recognized
>
> The -parse argument was removed in Nutch 2.0 and now throws an
> IllegalArgumentException. This is expected behaviour. To enable parsing
> during fetching, please set the relevant config in nutch-site.xml. The
> incorrect -parse argument is still in the Usage message because I
> was not diligent enough when patching the fetcher CLI aesthetics. I'll
> address this within the issue below as well.
>
> >
> >
> > % ./bin/nutch parse -all -force -resume
> > ParserJob: starting
> > ParserJob: resuming:    false           <<< -resume and
> > ParserJob: forced reparse:      false   <<< -force obviously ignored ?
> > ParserJob: parsing all
>
> Yes confirmed and ticket opened as NUTCH-1392
>
>
> > % ./bin/nutch generate
> > --> generates batchid, but should show help as in 1.x ?
> > --> is there an option -topN ?
>
> Yes this is opened in NUTCH-1393. Users may not necessarily wish to
> generate at all, instead wishing to merely find out the GeneratorJob
> CLI options... I will open this just now and fix for 2.1.
>
> > The 2.0 Solr schema and mappings still contain the field "site"
> > which has been removed in 1.x (NUTCH-1232).
> > Should be done also in 2.0: it's easier to maintain only one Solr
> installation
> > for all Nutch versions.
>
> Logged in NUTCH-1394
>
> Thanks Seb for your contributions here... this is exactly what we are
> after.
>
> Does anyone have issues with running another RC and addressing these
> issues in 2.1?
>
> --
> Lewis
>

Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Sebastian,

On Wed, Jun 13, 2012 at 11:30 PM, Sebastian Nagel
<wa...@googlemail.com> wrote:
> I've managed to perform a crawl with 2.0 and HBase: it rocks, indeed.
> Much simpler than 1.x (no segments!).

:0)

> % ./bin/nutch readdb -stats
> WebTable statistics start
> WebTableReader: java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:197)
>        at java.io.DataInputStream.readFully(DataInputStream.java:169)
>        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>        at
> org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
>        at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:537)
>        at org.apache.nutch.crawl.WebTableReader.processStatJob(WebTableReader.java:218)
>        at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:479)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
> --> readdb -dump works.

Confirmed and ticket opened as NUTCH-1391

> % ./bin/nutch fetch 1339621550-203073321 -threads 1 -parse
> Exception in thread "main" java.lang.IllegalArgumentException: arg -parse not recognized

The -parse argument was removed in Nutch 2.0 and now throws an
IllegalArgumentException. This is expected behaviour. To enable parsing
during fetching, please set the relevant config in nutch-site.xml. The
incorrect -parse argument is still in the Usage message because I
was not diligent enough when patching the fetcher CLI aesthetics. I'll
address this within the issue below as well.
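
As a hedged illustration of the nutch-site.xml override mentioned above, the Python sketch below prints a property element that could be pasted inside the <configuration> element. It assumes the property is named fetcher.parse; that name is not stated in this thread and should be verified against nutch-default.xml for your version. The helper name make_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def make_property(name, value):
    """Build a Hadoop-style <property> element with <name> and <value> children."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Assumption: "fetcher.parse" is the switch for parsing while fetching.
snippet = ET.tostring(make_property("fetcher.parse", "true"), encoding="unicode")

if __name__ == "__main__":
    print(snippet)
```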

>
>
> % ./bin/nutch parse -all -force -resume
> ParserJob: starting
> ParserJob: resuming:    false           <<< -resume and
> ParserJob: forced reparse:      false   <<< -force obviously ignored ?
> ParserJob: parsing all

Yes confirmed and ticket opened as NUTCH-1392
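
For illustration, the behaviour Sebastian expected can be sketched in Python as below. This is not the ParserJob code (which is Java) and says nothing about where the flags are actually lost; the function name and the returned tuple are hypothetical. It only shows an option loop in which -force and -resume are recorded regardless of their position relative to -all.

```python
def parse_parser_args(args):
    """Return (batch_id, force, resume) from a parse-style argument list.

    Flags may appear in any position; the first non-flag argument (or -all)
    is taken as the batch selector.
    """
    batch_id = None
    force = False
    resume = False
    for arg in args:
        if arg == "-force":
            force = True
        elif arg == "-resume":
            resume = True
        elif arg == "-all":
            batch_id = "-all"
        elif batch_id is None:
            batch_id = arg
        else:
            raise ValueError(f"arg {arg} not recognized")
    return batch_id, force, resume


if __name__ == "__main__":
    print(parse_parser_args(["-all", "-force", "-resume"]))
```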


> % ./bin/nutch generate
> --> generates batchid, but should show help as in 1.x ?
> --> is there an option -topN ?

Yes this is opened in NUTCH-1393. Users may not necessarily wish to
generate at all, instead wishing to merely find out the GeneratorJob
CLI options... I will open this just now and fix for 2.1.
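
One possible shape for this, sketched in Python purely as an illustration: check for a help flag before doing any work, so that asking for the options never starts a job. The flag names, the return code and the run_cli helper are all hypothetical and are not taken from GeneratorJob.

```python
def run_cli(args, usage, job):
    """Show usage and skip the job when help is requested; otherwise run it."""
    help_flags = {"-h", "-help", "--help"}  # hypothetical flag names
    if help_flags.intersection(args):
        print(usage)
        return 2  # non-zero: nothing was generated
    return job(args)


if __name__ == "__main__":
    # With a help flag the job callable is never invoked.
    run_cli(["-help"], "Usage: <hypothetical usage text>", lambda a: 0)
```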

> The 2.0 Solr schema and mappings still contain the field "site"
> which has been removed in 1.x (NUTCH-1232).
> Should be done also in 2.0: it's easier to maintain only one Solr installation
> for all Nutch versions.

Logged in NUTCH-1394

Thanks Seb for your contributions here... this is exactly what we are after.

Does anyone have issues with running another RC and addressing these
issues in 2.1?

-- 
Lewis

Re: VOTE Apache Nutch 2.0 RC1

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Lewis,

> Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an
> update of Julien's (I think) page on GORA_HBase. This will get you
> rocking with HBase. The changes between Cassandra, Accumulo and the
> other data stores are fairly trivial.

I've managed to perform a crawl with 2.0 and HBase: it rocks, indeed.
Much simpler than 1.x (no segments!).

Below are a couple of problems I've run into (possible issues to be addressed in 2.1).

Cheers,
Sebastian



% ./bin/nutch readdb -stats
WebTable statistics start
WebTableReader: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
        at
org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:89)
        at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:537)
        at org.apache.nutch.crawl.WebTableReader.processStatJob(WebTableReader.java:218)
        at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:479)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
--> readdb -dump works.



% ./bin/nutch fetch 1339621550-203073321 -threads 1 -parse
Exception in thread "main" java.lang.IllegalArgumentException: arg -parse not recognized



% ./bin/nutch parse -all -force -resume
ParserJob: starting
ParserJob: resuming:    false           <<< -resume and
ParserJob: forced reparse:      false   <<< -force obviously ignored ?
ParserJob: parsing all



% ./bin/nutch generate
--> generates batchid, but should show help as in 1.x ?
--> is there an option -topN ?



The 2.0 Solr schema and mappings still contain the field "site"
which has been removed in 1.x (NUTCH-1232).
Should be done also in 2.0: it's easier to maintain only one Solr installation
for all Nutch versions.


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Seb,

Quick update

On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel
<wa...@googlemail.com> wrote:
>1 some guidance would be nice. README.txt points
> to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x

Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an
update of Julien's (I think) page on GORA_HBase. This will get you
rocking with HBase. The changes between Cassandra, Accumulo and the
other data stores are fairly trivial.

> 2 the package contains your nutch-site.xml:
>    <name>http.agent.email</name>
>    <value>lewismc@apache.org</value>
> I guess that's not intended :)

I'll deal with this when I spin RC2. Thanks

Lewis

Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
This is what is currently done and what I was essentially proposing.

I really don't know about the size of the bin artifact if we enable all
gora-* dependencies before packaging it for distribution... thanks to input
from yourselves we recently sorted out some size issues with 1.5, it would
be good to have 2.0 shadow this.

I am +1 for shipping just src distributions for 2.0, this would keep the
default (gora-sql 0.1.1-incubating) ivy configuration.

If users can't do 'ant runtime' then you kinda got to wonder how they're
using Nutch at all...

On Thu, Jun 14, 2012 at 9:56 PM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> yep, remember that you can't build from the bin package so inevitably
> someone will wonder why only such or such backend is available etc...
>
> another option is to NOT have a binary release at all, in which case it is
> acceptable I think not to include the deps in ivy. Maybe we should at least
> add them but comment them out
>
> Ju
>
>
> On 14 June 2012 21:51, Lewis John Mcgibbney <le...@gmail.com>wrote:
>
>> Hi Julien,
>>
>> Do you suggest with the binary release that we simply open up all gora-*
>> deps and ship it with every jar available?
>>
>> Lewis
>>
>>
>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>>
>>> I disagree. You'd expect a binary release to work out of the box - which
>>> is not the case. Plus we'd have to spend more time explaining the
>>> workaround, answering the same questions over and over on the ML etc...
>>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>>> backends to the ivy deps file).
>>>
>>> Julien
>>>
>>>
>>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>
>>>> Hey Guys,
>>>>
>>>> I think the annoyance is probably something folks can live with as they
>>>> have been
>>>> waiting for an "official" release of 2.x for years :)
>>>>
>>>> My +1 to roll RC #2 with or without a solution to this and mark it as a
>>>> TODO. "release
>>>> early", "release often" :)
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>>
>>>> > Aye this is no good at all. Depending on which backend you wish to
>>>> use with Gora, you will need to go and manually fetch the correct .jar's
>>>> from maven central.
>>>> >
>>>> > Does anyone else have either solution or a workaround before I push
>>>> RC2 with just src dists?
>>>> >
>>>> > Thanks
>>>> >
>>>> > Lewis
>>>> >
>>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>>> wastl.nagel@googlemail.com> wrote:
>>>> > > We only supply src distributions...
>>>> > > Does this principle apply to Nutch 2 as well?
>>>> > Maybe, yes.
>>>> > The situation with the current binary package is uncomfortable:
>>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>>> running.
>>>> >
>>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>>> > Hi Guys,
>>>> >
>>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>>>> don't supply binary distributions of the code, this is because when using
>>>> Gora a user may wish/require to recompile the code to accommodate config
>>>> changes etc. We only supply src distributions...
>>>> >
>>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>>> using the gora-sql dependency, then you wish to switch to HBase and
>>>> recompile, is this possible within the binary distribution?
>>>> >
>>>> > Best
>>>> >
>>>> > Lewis
>>>> >
>>>> >
>>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>>> lists.digitalpebble@gmail.com> wrote:
>>>> > Ferdy
>>>> >
>>>> > The Nutch job jar is not present in the binary archive. This means
>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>> problem (since users can always build one themselves), merely pointing it
>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>> difference there.
>>>> >
>>>> > The binary distrib corresponds to runtime/local and as such should
>>>> NOT have the job file there. This is now the norm since 1.5
>>>> >
>>>> > Will try and do some testing of the RC
>>>> >
>>>> > Thanks
>>>> >
>>>> > Julien
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> >
>>>> > Open Source Solutions for Text Engineering
>>>> >
>>>> > http://digitalpebble.blogspot.com/
>>>> > http://www.digitalpebble.com
>>>> > http://twitter.com/digitalpebble
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Lewis
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Lewis
>>>> >
>>>>
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattmann@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>
>>>
>>>
>>> --
>>> *
>>> *
>>> Open Source Solutions for Text Engineering
>>>
>>> http://digitalpebble.blogspot.com/
>>> http://www.digitalpebble.com
>>> http://twitter.com/digitalpebble
>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>


-- 
*Lewis*

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
yep, remember that you can't build from the bin package so inevitably
someone will wonder why only such or such backend is available etc...

another option is to NOT have a binary release at all, in which case it is
acceptable I think not to include the deps in ivy. Maybe we should at least
add them but comment them out

Ju

On 14 June 2012 21:51, Lewis John Mcgibbney <le...@gmail.com>wrote:

> Hi Julien,
>
> Do you suggest with the binary release that we simply open up all gora-*
> deps and ship it with every jar available?
>
> Lewis
>
>
> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
>
>> I disagree. You'd expect a binary release to work out of the box - which
>> is not the case. Plus we'd have to spend more time explaining the
>> workaround, answering the same questions over and over on the ML etc...
>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>> backends to the ivy deps file).
>>
>> Julien
>>
>>
>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey Guys,
>>>
>>> I think the annoyance is probably something folks can live with as they
>>> have been
>>> waiting for an "official" release of 2.x for years :)
>>>
>>> My +1 to roll RC #2 with or without a solution to this and mark it as a
>>> TODO. "release
>>> early", "release often" :)
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>
>>> > Aye this is no good at all. Depending on which backend you wish to use
>>> with Gora, you will need to go and manually fetch the correct .jar's from
>>> maven central.
>>> >
>>> > Does anyone else have either solution or a workaround before I push
>>> RC2 with just src dists?
>>> >
>>> > Thanks
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>> wastl.nagel@googlemail.com> wrote:
>>> > > We only supply src distributions...
>>> > > Does this principle apply to Nutch 2 as well?
>>> > Maybe, yes.
>>> > The situation with the current binary package is uncomfortable:
>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>> running.
>>> >
>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>> > Hi Guys,
>>> >
>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>>> don't supply binary distributions of the code, this is because when using
>>> Gora a user may wish/require to recompile the code to accommodate config
>>> changes etc. We only supply src distributions...
>>> >
>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>> using the gora-sql dependency, then you wish to switch to HBase and
>>> recompile, is this possible within the binary distribution?
>>> >
>>> > Best
>>> >
>>> > Lewis
>>> >
>>> >
>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>> lists.digitalpebble@gmail.com> wrote:
>>> > Ferdy
>>> >
>>> > The Nutch job jar is not present in the binary archive. This means
>>> distributed running of jobs is not supported. I'm not sure if this is a
>>> problem (since users can always build one themselves), merely pointing it
>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>> difference there.
>>> >
>>> > The binary distrib corresponds to runtime/local and as such should NOT
>>> have the job file there. This is now the norm since 1.5
>>> >
>>> > Will try and do some testing of the RC
>>> >
>>> > Thanks
>>> >
>>> > Julien
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Open Source Solutions for Text Engineering
>>> >
>>> > http://digitalpebble.blogspot.com/
>>> > http://www.digitalpebble.com
>>> > http://twitter.com/digitalpebble
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Lewis
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Lewis
>>> >
>>>
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattmann@nasa.gov
>>> WWW:   http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>
>>
>> --
>> *
>> *
>> Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>>
>
>
> --
> *Lewis*
>
>


-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
That was not intended. It's just that I am on holidays, it's raining and the
children were either asleep or playing nicely :-)

On 15 June 2012 18:19, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> OK you are just making us all look bad now Juls ;)
>
> Super fast!
>
> Cheers,
> Chris
>
>
> On Jun 15, 2012, at 2:54 AM, Julien Nioche wrote:
>
> > see https://issues.apache.org/jira/browse/NUTCH-1396
> >
> > On 15 June 2012 10:43, Julien Nioche <li...@gmail.com>
> wrote:
> > Before you do, could you check that NutchGora passes ant test
> successfully. I just tried and got an error related to the parse-tika
> tests. Am about to open a JIRA to update to the latest version of Tika for
> NutchGora which should fix the problem and put it at the same level as trunk
> >
> > J
> >
> > On 15 June 2012 10:01, Lewis John Mcgibbney <le...@gmail.com>
> wrote:
> >
> > I'll push this in an hour or so guys.
> >
> > Thanks for the input.
> >
> > Lewis
> >
> >
> > On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
> > +1
> >
> >
> > On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:
> > Agree with only releasing src.
> >
> >
> > On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
> > Or just not ship a bin release at all. Src is the only thing we really
> VOTE on legally though bin is provided for convenience purposes. Will type
> more on this later...
> >
> > Sent from my iPhone
> >
> > On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
> lewis.mcgibbney@gmail.com> wrote:
> >
> >> Hi Julien,
> >>
> >> Do you suggest with the binary release that we simply open up all
> gora-* deps and ship it with every jar available?
> >>
> >> Lewis
> >>
> >> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
> >> I disagree. You'd expect a binary release to work out of the box -
> which is not the case. Plus we'd have to spend more time explaining the
> workaround, answering the same questions over and over on the ML etc...
> Fixing this should not be a big deal (i.e. add the gora-x modules for the
> backends to the ivy deps file).
> >>
> >> Julien
> >>
> >>
> >> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
> >> Hey Guys,
> >>
> >> I think the annoyance is probably something folks can live with as they
> have been
> >> waiting for an "official" release of 2.x for years :)
> >>
> >> My +1 to roll RC #2 with or without a solution to this and mark it as a
> TODO. "release
> >> early", "release often" :)
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
> >>
> >> > Aye this is no good at all. Depending on which backend you wish to
> use with Gora, you will need to go and manually fetch the correct .jar's
> from maven central.
> >> >
> >> > Does anyone else have either solution or a workaround before I push
> RC2 with just src dists?
> >> >
> >> > Thanks
> >> >
> >> > Lewis
> >> >
> >> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
> wastl.nagel@googlemail.com> wrote:
> >> > > We only supply src distributions...
> >> > > Does this principle apply to Nutch 2 as well?
> >> > Maybe, yes.
> >> > The situation with the current binary package is uncomfortable:
> >> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
> running.
> >> >
> >> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
> >> > Hi Guys,
> >> >
> >> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
> don't supply binary distributions of the code, this is because when using
> Gora a user may wish/require to recompile the code to accommodate config
> changes etc. We only supply src distributions...
> >> >
> >> > Does this principle apply to Nutch 2 as well? I mean, what if you're
> using the gora-sql dependency, then you wish to switch to HBase and
> recompile, is this possible within the binary distribution?
> >> >
> >> > Best
> >> >
> >> > Lewis
> >> >
> >> >
> >> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
> >> > Ferdy
> >> >
> >> > The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
> >> >
> >> > The binary distrib corresponds to runtime/local and as such should
> NOT have the job file there. This is now the norm since 1.5
> >> >
> >> > Will try and do some testing of the RC
> >> >
> >> > Thanks
> >> >
> >> > Julien
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Open Source Solutions for Text Engineering
> >> >
> >> > http://digitalpebble.blogspot.com/
> >> > http://www.digitalpebble.com
> >> > http://twitter.com/digitalpebble
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Lewis
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Lewis
> >> >
> >>
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: chris.a.mattmann@nasa.gov
> >> WWW:   http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >> http://twitter.com/digitalpebble
> >>
> >>
> >>
> >>
> >> --
> >> Lewis
> >>
> >
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
> >
> >
> >
> > --
> > Lewis
> >
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
OK you are just making us all look bad now Juls ;)

Super fast!

Cheers,
Chris


On Jun 15, 2012, at 2:54 AM, Julien Nioche wrote:

> see https://issues.apache.org/jira/browse/NUTCH-1396
> 
> On 15 June 2012 10:43, Julien Nioche <li...@gmail.com> wrote:
> Before you do, could you check that NutchGora passes ant test successfully. I just tried and got an error related to the parse-tika tests. Am about to open a JIRA to update to the latest version of Tika for NutchGora which should fix the problem and put it at the same level as trunk
> 
> J
> 
> On 15 June 2012 10:01, Lewis John Mcgibbney <le...@gmail.com> wrote:
> 
> I'll push this in an hour or so guys.
> 
> Thanks for the input.
> 
> Lewis
> 
> 
> On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche <li...@gmail.com> wrote:
> +1
> 
> 
> On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:
> Agree with only releasing src.
> 
> 
> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
> Or just not ship a bin release at all. Src is the only thing we really VOTE on legally though bin is provided for convenience purposes. Will type more on this later...
> 
> Sent from my iPhone
> 
> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <le...@gmail.com> wrote:
> 
>> Hi Julien,
>> 
>> Do you suggest with the binary release that we simply open up all gora-* deps and ship it with every jar available?
>> 
>> Lewis
>> 
>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <li...@gmail.com> wrote:
>> I disagree. You'd expect a binary release to work out of the box - which is not the case. Plus we'd have to spend more time explaining the workaround, answering the same questions over and over on the ML etc... Fixing this should not be a big deal (i.e. add the gora-x modules for the backends to the ivy deps file).
>> 
>> Julien
>> 
>> 
>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
>> Hey Guys,
>> 
>> I think the annoyance is probably something folks can live with as they have been
>> waiting for an "official" release of 2.x for years :)
>> 
>> My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. "release
>> early", "release often" :)
>> 
>> Cheers,
>> Chris
>> 
>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>> 
>> > Aye this is no good at all. Depending on which backend you wish to use with Gora, you will need to go and manually fetch the correct .jar's from maven central.
>> >
>> > Does anyone else have either solution or a workaround before I push RC2 with just src dists?
>> >
>> > Thanks
>> >
>> > Lewis
>> >
>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
>> > > We only supply src distributions...
>> > > Does this principle apply to Nutch 2 as well?
>> > Maybe, yes.
>> > The situation with the current binary package is uncomfortable:
>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
>> >
>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>> > Hi Guys,
>> >
>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't supply binary distributions of the code, this is because when using Gora a user may wish/require to recompile the code to accommodate config changes etc. We only supply src distributions...
>> >
>> > Does this principle apply to Nutch 2 as well? I mean, what if you're using the gora-sql dependency, then you wish to switch to HBase and recompile, is this possible within the binary distribution?
>> >
>> > Best
>> >
>> > Lewis
>> >
>> >
>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <li...@gmail.com> wrote:
>> > Ferdy
>> >
>> > The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing it out. The recently released 1.5 also lacks this job jar, so at least no difference there.
>> >
>> > The binary distrib corresponds to runtime/local and as such should NOT have the job file there. This is now the norm since 1.5
>> >
>> > Will try and do some testing of the RC
>> >
>> > Thanks
>> >
>> > Julien
>> >
>> >
>> >
>> > --
>> >
>> > Open Source Solutions for Text Engineering
>> >
>> > http://digitalpebble.blogspot.com/
>> > http://www.digitalpebble.com
>> > http://twitter.com/digitalpebble
>> >
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
see https://issues.apache.org/jira/browse/NUTCH-1396

On 15 June 2012 10:43, Julien Nioche <li...@gmail.com> wrote:

> Before you do, could you check that NutchGora passes ant test
> successfully? I just tried and got an error related to the parse-tika
> tests. I am about to open a JIRA to update to the latest version of Tika for
> NutchGora, which should fix the problem and put it at the same level as trunk.
>
> J
>
> On 15 June 2012 10:01, Lewis John Mcgibbney <le...@gmail.com> wrote:
>
>> I'll push this in an hour or so guys.
>>
>> Thanks for the input.
>>
>> Lewis
>>
>>
>> On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>>
>>> +1
>>>
>>>
>>> On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:
>>>
>>>> Agree with only releasing src.
>>>>
>>>>
>>>> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>
>>>>>  Or just not ship a bin release at all. Src is the only thing we
>>>>> really VOTE on legally though bin is provided for convenience purposes.
>>>>> Will type more on this later...
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
>>>>> lewis.mcgibbney@gmail.com> wrote:
>>>>>
>>>>>   Hi Julien,
>>>>>
>>>>> Do you suggest with the binary release that we simply open up all
>>>>> gora-* deps and ship it with every jar available?
>>>>>
>>>>> Lewis
>>>>>
>>>>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
>>>>> lists.digitalpebble@gmail.com> wrote:
>>>>>
>>>>>> I disagree. You'd expect a binary release to work out of the box -
>>>>>> which is not the case. Plus we'd have to spend more time explaining the
>>>>>> workaround, answering the same questions over and over on the ML etc...
>>>>>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>>>>>> backends to the ivy deps file).
>>>>>>
>>>>>> Julien
>>>>>>
>>>>>>
>>>>>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>>>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>>
>>>>>>> Hey Guys,
>>>>>>>
>>>>>>> I think the annoyance is probably something folks can live with as
>>>>>>> they have been
>>>>>>> waiting for an "official" release of 2.x for years :)
>>>>>>>
>>>>>>> My +1 to roll RC #2 with or without a solution to this and mark it
>>>>>>> as a TODO. "release
>>>>>>> early", "release often" :)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>>>>>
>>>>>>> > Aye this is no good at all. Depending on which backend you wish to
>>>>>>> use with Gora, you will need to go and manually fetch the correct .jars
>>>>>>> from Maven Central.
>>>>>>> >
>>>>>>> > Does anyone else have either a solution or a workaround before I
>>>>>>> push RC2 with just src dists?
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Lewis
>>>>>>> >
>>>>>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>>>>>> wastl.nagel@googlemail.com> wrote:
>>>>>>> > > We only supply src distributions...
>>>>>>> > > Does this principle apply to Nutch 2 as well?
>>>>>>> > Maybe, yes.
>>>>>>> > The situation with the current binary package is uncomfortable:
>>>>>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get
>>>>>>> nutch running.
>>>>>>> >
>>>>>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>>>>>> > Hi Guys,
>>>>>>> >
>>>>>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora
>>>>>>> we don't supply binary distributions of the code, this is because when
>>>>>>> using Gora a user may wish/require to recompile the code to accommodate
>>>>>>> config changes etc. We only supply src distributions...
>>>>>>> >
>>>>>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>>>>>> using the gora-sql dependency, then you wish to switch to HBase and
>>>>>>> recompile, is this possible within the binary distribution?
>>>>>>> >
>>>>>>> > Best
>>>>>>> >
>>>>>>> > Lewis
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>>>>>> lists.digitalpebble@gmail.com> wrote:
>>>>>>> > Ferdy
>>>>>>> >
>>>>>>> > The Nutch job jar is not present in the binary archive. This means
>>>>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>>>>> problem (since users can always build one themselves), merely pointing it
>>>>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>>>>> difference there.
>>>>>>> >
>>>>>>> > The binary distrib corresponds to runtime/local and as such should
>>>>>>> NOT have the job file there. This is now the norm since 1.5
>>>>>>> >
>>>>>>> > Will try and do some testing of the RC
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Julien
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> >
>>>>>>>  > Open Source Solutions for Text Engineering
>>>>>>> >
>>>>>>> > http://digitalpebble.blogspot.com/
>>>>>>> > http://www.digitalpebble.com
>>>>>>> > http://twitter.com/digitalpebble
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Lewis
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Lewis
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> *
>>>>>> *
>>>>>> Open Source Solutions for Text Engineering
>>>>>>
>>>>>>   http://digitalpebble.blogspot.com/
>>>>>> http://www.digitalpebble.com
>>>>>> http://twitter.com/digitalpebble
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Lewis*
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *
>>> *Open Source Solutions for Text Engineering
>>>
>>> http://digitalpebble.blogspot.com/
>>> http://www.digitalpebble.com
>>> http://twitter.com/digitalpebble
>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>


-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
Before you do, could you check that NutchGora passes ant test successfully?
I just tried and got an error related to the parse-tika tests. I am about to
open a JIRA to update to the latest version of Tika for NutchGora, which
should fix the problem and put it at the same level as trunk.

J

On 15 June 2012 10:01, Lewis John Mcgibbney <le...@gmail.com> wrote:

> I'll push this in an hour or so guys.
>
> Thanks for the input.
>
> Lewis
>
>
> On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
>
>> +1
>>
>>
>> On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:
>>
>>> Agree with only releasing src.
>>>
>>>
>>> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>
>>>>  Or just not ship a bin release at all. Src is the only thing we
>>>> really VOTE on legally though bin is provided for convenience purposes.
>>>> Will type more on this later...
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
>>>> lewis.mcgibbney@gmail.com> wrote:
>>>>
>>>>   Hi Julien,
>>>>
>>>> Do you suggest with the binary release that we simply open up all
>>>> gora-* deps and ship it with every jar available?
>>>>
>>>> Lewis
>>>>
>>>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
>>>> lists.digitalpebble@gmail.com> wrote:
>>>>
>>>>> I disagree. You'd expect a binary release to work out of the box -
>>>>> which is not the case. Plus we'd have to spend more time explaining the
>>>>> workaround, answering the same questions over and over on the ML etc...
>>>>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>>>>> backends to the ivy deps file).
>>>>>
>>>>> Julien
>>>>>
>>>>>
>>>>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>
>>>>>> Hey Guys,
>>>>>>
>>>>>> I think the annoyance is probably something folks can live with as
>>>>>> they have been
>>>>>> waiting for an "official" release of 2.x for years :)
>>>>>>
>>>>>> My +1 to roll RC #2 with or without a solution to this and mark it as
>>>>>> a TODO. "release
>>>>>> early", "release often" :)
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>>>>
>>>>>> > Aye this is no good at all. Depending on which backend you wish to
>>>>>> use with Gora, you will need to go and manually fetch the correct .jars
>>>>>> from Maven Central.
>>>>>> >
>>>>>> > Does anyone else have either a solution or a workaround before I push
>>>>>> RC2 with just src dists?
>>>>>> >
>>>>>> > Thanks
>>>>>> >
>>>>>> > Lewis
>>>>>> >
>>>>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>>>>> wastl.nagel@googlemail.com> wrote:
>>>>>> > > We only supply src distributions...
>>>>>> > > Does this principle apply to Nutch 2 as well?
>>>>>> > Maybe, yes.
>>>>>> > The situation with the current binary package is uncomfortable:
>>>>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>>>>> running.
>>>>>> >
>>>>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>>>>> > Hi Guys,
>>>>>> >
>>>>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora
>>>>>> we don't supply binary distributions of the code, this is because when
>>>>>> using Gora a user may wish/require to recompile the code to accommodate
>>>>>> config changes etc. We only supply src distributions...
>>>>>> >
>>>>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>>>>> using the gora-sql dependency, then you wish to switch to HBase and
>>>>>> recompile, is this possible within the binary distribution?
>>>>>> >
>>>>>> > Best
>>>>>> >
>>>>>> > Lewis
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>>>>> lists.digitalpebble@gmail.com> wrote:
>>>>>> > Ferdy
>>>>>> >
>>>>>> > The Nutch job jar is not present in the binary archive. This means
>>>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>>>> problem (since users can always build one themselves), merely pointing it
>>>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>>>> difference there.
>>>>>> >
>>>>>> > The binary distrib corresponds to runtime/local and as such should
>>>>>> NOT have the job file there. This is now the norm since 1.5
>>>>>> >
>>>>>> > Will try and do some testing of the RC
>>>>>> >
>>>>>> > Thanks
>>>>>> >
>>>>>> > Julien
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>>  > Open Source Solutions for Text Engineering
>>>>>> >
>>>>>> > http://digitalpebble.blogspot.com/
>>>>>> > http://www.digitalpebble.com
>>>>>> > http://twitter.com/digitalpebble
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Lewis
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Lewis
>>>>>> >
>>>>>>
>>>>>>
>>>>>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> Chris Mattmann, Ph.D.
>>>>>> Senior Computer Scientist
>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> *
>>>>> *
>>>>> Open Source Solutions for Text Engineering
>>>>>
>>>>>   http://digitalpebble.blogspot.com/
>>>>> http://www.digitalpebble.com
>>>>> http://twitter.com/digitalpebble
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>>
>>>
>>
>>
>> --
>> *
>> *Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>>
>
>
> --
> *Lewis*
>
>


-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
I'll push this in an hour or so guys.

Thanks for the input.

Lewis

On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> +1
>
>
> On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:
>
>> Agree with only releasing src.
>>
>>
>> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>>  Or just not ship a bin release at all. Src is the only thing we really
>>> VOTE on legally though bin is provided for convenience purposes. Will type
>>> more on this later...
>>>
>>> Sent from my iPhone
>>>
>>> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
>>> lewis.mcgibbney@gmail.com> wrote:
>>>
>>>   Hi Julien,
>>>
>>> Do you suggest with the binary release that we simply open up all gora-*
>>> deps and ship it with every jar available?
>>>
>>> Lewis
>>>
>>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
>>> lists.digitalpebble@gmail.com> wrote:
>>>
>>>> I disagree. You'd expect a binary release to work out of the box -
>>>> which is not the case. Plus we'd have to spend more time explaining the
>>>> workaround, answering the same questions over and over on the ML etc...
>>>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>>>> backends to the ivy deps file).
>>>>
>>>> Julien
>>>>
>>>>
>>>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>
>>>>> Hey Guys,
>>>>>
>>>>> I think the annoyance is probably something folks can live with as
>>>>> they have been
>>>>> waiting for an "official" release of 2.x for years :)
>>>>>
>>>>> My +1 to roll RC #2 with or without a solution to this and mark it as
>>>>> a TODO. "release
>>>>> early", "release often" :)
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>>>
>>>>> > Aye this is no good at all. Depending on which backend you wish to
>>>>> use with Gora, you will need to go and manually fetch the correct .jars
>>>>> from Maven Central.
>>>>> >
>>>>> > Does anyone else have either a solution or a workaround before I push
>>>>> RC2 with just src dists?
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> > Lewis
>>>>> >
>>>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>>>> wastl.nagel@googlemail.com> wrote:
>>>>> > > We only supply src distributions...
>>>>> > > Does this principle apply to Nutch 2 as well?
>>>>> > Maybe, yes.
>>>>> > The situation with the current binary package is uncomfortable:
>>>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>>>> running.
>>>>> >
>>>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>>>> > Hi Guys,
>>>>> >
>>>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora
>>>>> we don't supply binary distributions of the code, this is because when
>>>>> using Gora a user may wish/require to recompile the code to accommodate
>>>>> config changes etc. We only supply src distributions...
>>>>> >
>>>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>>>> using the gora-sql dependency, then you wish to switch to HBase and
>>>>> recompile, is this possible within the binary distribution?
>>>>> >
>>>>> > Best
>>>>> >
>>>>> > Lewis
>>>>> >
>>>>> >
>>>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>>>> lists.digitalpebble@gmail.com> wrote:
>>>>> > Ferdy
>>>>> >
>>>>> > The Nutch job jar is not present in the binary archive. This means
>>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>>> problem (since users can always build one themselves), merely pointing it
>>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>>> difference there.
>>>>> >
>>>>> > The binary distrib corresponds to runtime/local and as such should
>>>>> NOT have the job file there. This is now the norm since 1.5
>>>>> >
>>>>> > Will try and do some testing of the RC
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> > Julien
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>>  > Open Source Solutions for Text Engineering
>>>>> >
>>>>> > http://digitalpebble.blogspot.com/
>>>>> > http://www.digitalpebble.com
>>>>> > http://twitter.com/digitalpebble
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Lewis
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Lewis
>>>>> >
>>>>>
>>>>>
>>>>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> Chris Mattmann, Ph.D.
>>>>> Senior Computer Scientist
>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>> Office: 171-266B, Mailstop: 171-246
>>>>> Email: chris.a.mattmann@nasa.gov
>>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> *
>>>> *
>>>> Open Source Solutions for Text Engineering
>>>>
>>>>   http://digitalpebble.blogspot.com/
>>>> http://www.digitalpebble.com
>>>> http://twitter.com/digitalpebble
>>>>
>>>>
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>>
>>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>


-- 
*Lewis*

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
+1

On 15 June 2012 09:00, Ferdy Galema <fe...@kalooga.com> wrote:

> Agree with only releasing src.
>
>
> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>>  Or just not ship a bin release at all. Src is the only thing we really
>> VOTE on legally though bin is provided for convenience purposes. Will type
>> more on this later...
>>
>> Sent from my iPhone
>>
>> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
>> lewis.mcgibbney@gmail.com> wrote:
>>
>>   Hi Julien,
>>
>> Do you suggest with the binary release that we simply open up all gora-*
>> deps and ship it with every jar available?
>>
>> Lewis
>>
>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>>
>>> I disagree. You'd expect a binary release to work out of the box - which
>>> is not the case. Plus we'd have to spend more time explaining the
>>> workaround, answering the same questions over and over on the ML etc...
>>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>>> backends to the ivy deps file).
>>>
>>> Julien
>>>
>>>
>>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>
>>>> Hey Guys,
>>>>
>>>> I think the annoyance is probably something folks can live with as they
>>>> have been
>>>> waiting for an "official" release of 2.x for years :)
>>>>
>>>> My +1 to roll RC #2 with or without a solution to this and mark it as a
>>>> TODO. "release
>>>> early", "release often" :)
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>>
>>>> > Aye this is no good at all. Depending on which backend you wish to
>>>> use with Gora, you will need to go and manually fetch the correct .jars
>>>> from Maven Central.
>>>> >
>>>> > Does anyone else have either a solution or a workaround before I push
>>>> RC2 with just src dists?
>>>> >
>>>> > Thanks
>>>> >
>>>> > Lewis
>>>> >
>>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>>> wastl.nagel@googlemail.com> wrote:
>>>> > > We only supply src distributions...
>>>> > > Does this principle apply to Nutch 2 as well?
>>>> > Maybe, yes.
>>>> > The situation with the current binary package is uncomfortable:
>>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>>> running.
>>>> >
>>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>>> > Hi Guys,
>>>> >
>>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>>>> don't supply binary distributions of the code, this is because when using
>>>> Gora a user may wish/require to recompile the code to accommodate config
>>>> changes etc. We only supply src distributions...
>>>> >
>>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>>> using the gora-sql dependency, then you wish to switch to HBase and
>>>> recompile, is this possible within the binary distribution?
>>>> >
>>>> > Best
>>>> >
>>>> > Lewis
>>>> >
>>>> >
>>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>>> lists.digitalpebble@gmail.com> wrote:
>>>> > Ferdy
>>>> >
>>>> > The Nutch job jar is not present in the binary archive. This means
>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>> problem (since users can always build one themselves), merely pointing it
>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>> difference there.
>>>> >
>>>> > The binary distrib corresponds to runtime/local and as such should
>>>> NOT have the job file there. This is now the norm since 1.5
>>>> >
>>>> > Will try and do some testing of the RC
>>>> >
>>>> > Thanks
>>>> >
>>>> > Julien
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> >
>>>>  > Open Source Solutions for Text Engineering
>>>> >
>>>> > http://digitalpebble.blogspot.com/
>>>> > http://www.digitalpebble.com
>>>> > http://twitter.com/digitalpebble
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Lewis
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Lewis
>>>> >
>>>>
>>>>
>>>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattmann@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>
>>>
>>>
>>>  --
>>> *
>>> *
>>> Open Source Solutions for Text Engineering
>>>
>>>   http://digitalpebble.blogspot.com/
>>> http://www.digitalpebble.com
>>> http://twitter.com/digitalpebble
>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
>


-- 
*
*Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by Ferdy Galema <fe...@kalooga.com>.
Agree with only releasing src.

On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

>  Or just not ship a bin release at all. Src is the only thing we really
> VOTE on legally though bin is provided for convenience purposes. Will type
> more on this later...
>
> Sent from my iPhone
>
> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <
> lewis.mcgibbney@gmail.com> wrote:
>
>   Hi Julien,
>
> Do you suggest with the binary release that we simply open up all gora-*
> deps and ship it with every jar available?
>
> Lewis
>
> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
>
>> I disagree. You'd expect a binary release to work out of the box - which
>> is not the case. Plus we'd have to spend more time explaining the
>> workaround, answering the same questions over and over on the ML etc...
>> Fixing this should not be a big deal (i.e. add the gora-x modules for the
>> backends to the ivy deps file).
>>
>> Julien
>>
>>
>> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey Guys,
>>>
>>> I think the annoyance is probably something folks can live with as they
>>> have been
>>> waiting for an "official" release of 2.x for years :)
>>>
>>> My +1 to roll RC #2 with or without a solution to this and mark it as a
>>> TODO. "release
>>> early", "release often" :)
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>>
>>> > Aye this is no good at all. Depending on which backend you wish to use
>>> with Gora, you will need to go and manually fetch the correct .jars from
>>> Maven Central.
>>> >
>>> > Does anyone else have either a solution or a workaround before I push
>>> RC2 with just src dists?
>>> >
>>> > Thanks
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>>> wastl.nagel@googlemail.com> wrote:
>>> > > We only supply src distributions...
>>> > > Does this principle apply to Nutch 2 as well?
>>> > Maybe, yes.
>>> > The situation with the current binary package is uncomfortable:
>>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>>> running.
>>> >
>>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>>> > Hi Guys,
>>> >
>>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>>> don't supply binary distributions of the code, this is because when using
>>> Gora a user may wish/require to recompile the code to accommodate config
>>> changes etc. We only supply src distributions...
>>> >
>>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>>> using the gora-sql dependency, then you wish to switch to HBase and
>>> recompile, is this possible within the binary distribution?
>>> >
>>> > Best
>>> >
>>> > Lewis
>>> >
>>> >
>>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>>> lists.digitalpebble@gmail.com> wrote:
>>> > Ferdy
>>> >
>>> > The Nutch job jar is not present in the binary archive. This means
>>> distributed running of jobs is not supported. I'm not sure if this is a
>>> problem (since users can always build one themselves), merely pointing it
>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>> difference there.
>>> >
>>> > The binary distrib corresponds to runtime/local and as such should NOT
>>> have the job file there. This is now the norm since 1.5
>>> >
>>> > Will try and do some testing of the RC
>>> >
>>> > Thanks
>>> >
>>> > Julien
>>> >
>>> >
>>> >
>>> > --
>>> >
>>>  > Open Source Solutions for Text Engineering
>>> >
>>> > http://digitalpebble.blogspot.com/
>>> > http://www.digitalpebble.com
>>> > http://twitter.com/digitalpebble
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Lewis
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Lewis
>>> >
>>>
>>>
>>>  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattmann@nasa.gov
>>> WWW:   http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>
>>
>>  --
>> *
>> *
>> Open Source Solutions for Text Engineering
>>
>>   http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>>
>
>
> --
> *Lewis*
>
>

Re: VOTE Apache Nutch 2.0 RC1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Or just not ship a bin release at all. Src is the only thing we really VOTE on legally though bin is provided for convenience purposes. Will type more on this later...

Sent from my iPhone

On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" <le...@gmail.com> wrote:

Hi Julien,

Do you suggest with the binary release that we simply open up all gora-* deps and ship it with every jar available?

Lewis

On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <li...@gmail.com> wrote:
I disagree. You'd expect a binary release to work out of the box - which is not the case. Plus we'd have to spend more time explaining the workaround, answering the same questions over and over on the ML etc... Fixing this should not be a big deal (i.e. add the gora-x modules for the backends to the ivy deps file).

Julien


On 14 June 2012 20:27, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
Hey Guys,

I think the annoyance is probably something folks can live with as they have been
waiting for an "official" release of 2.x for years :)

My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. "release
early", "release often" :)

Cheers,
Chris

On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:

> Aye this is no good at all. Depending on which backend you wish to use with Gora, you will need to go and manually fetch the correct .jars from Maven Central.
>
> Does anyone else have either a solution or a workaround before I push RC2 with just src dists?
>
> Thanks
>
> Lewis
>
> On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
> > We only supply src distributions...
> > Does this principle apply to Nutch 2 as well?
> Maybe, yes.
> The situation with the current binary package is uncomfortable:
> I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
>
> 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
> Hi Guys,
>
> Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't supply binary distributions of the code, this is because when using Gora a user may wish/require to recompile the code to accommodate config changes etc. We only supply src distributions...
>
> Does this principle apply to Nutch 2 as well? I mean, what if you're using the gora-sql dependency, then you wish to switch to HBase and recompile, is this possible within the binary distribution?
>
> Best
>
> Lewis
>
>
> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <li...@gmail.com> wrote:
> Ferdy
>
> The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing it out. The recently released 1.5 also lacks this job jar, so at least no difference there.
>
> The binary distrib corresponds to runtime/local and as such should NOT have the job file there. This is now the norm since 1.5
>
> Will try and do some testing of the RC
>
> Thanks
>
> Julien
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>
>
>
> --
> Lewis
>
>
>
>
>
> --
> Lewis
>


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble




--
Lewis


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Julien,

Do you suggest with the binary release that we simply open up all gora-*
deps and ship it with every jar available?

Lewis

On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> I disagree. You'd expect a binary release to work out of the box - which
> is not the case. Plus we'd have to spend more time explaining the
> workaround, answering the same questions over and over on the ML etc...
> Fixing this should not be a big deal (i.e. add the gora-x modules for the
> backends to the ivy deps file).
>
> Julien
>
>
> On 14 June 2012 20:27, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Guys,
>>
>> I think the annoyance is probably something folks can live with as they
>> have been
>> waiting for an "official" release of 2.x for years :)
>>
>> My +1 to roll RC #2 with or without a solution to this and mark it as a
>> TODO. "release
>> early", "release often" :)
>>
>> Cheers,
>> Chris
>>
>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>>
>> > Aye this is no good at all. Depending on which backend you wish to use
>> with Gora, you will need to go and manually fetch the correct .jar's from
>> maven central.
>> >
>> > Does anyone else have either solution or a workaround before I push RC2
>> with just src dists?
>> >
>> > Thanks
>> >
>> > Lewis
>> >
>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
>> wastl.nagel@googlemail.com> wrote:
>> > > We only supply src distributions...
>> > > Does this principle apply to Nutch 2 as well?
>> > Maybe, yes.
>> > The situation with the current binary package is uncomfortable:
>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
>> running.
>> >
>> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>> > Hi Guys,
>> >
>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>> don't supply binary distributions of the code, this is because when using
>> Gora a user may wish/require to recompile the code to accommodate config
>> changes etc. We only supply src distributions...
>> >
>> > Does this principle apply to Nutch 2 as well? I mean, what if you're
>> using the gora-sql dependency, then you wish to switch to HBase and
>> recompile, is this possible within the binary distribution?
>> >
>> > Best
>> >
>> > Lewis
>> >
>> >
>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>> > Ferdy
>> >
>> > The Nutch job jar is not present in the binary archive. This means
>> distributed running of jobs is not supported. I'm not sure if this is a
>> problem (since users can always build one themselves), merely pointing it
>> out. The recently released 1.5 also lacks this job jar, so at least no
>> difference there.
>> >
>> > The binary distrib corresponds to runtime/local and as such should NOT
>> have the job file there. This is now the norm since 1.5
>> >
>> > Will try and do some testing of the RC
>> >
>> > Thanks
>> >
>> > Julien
>> >
>> >
>> >
>> > --
>> >
>> > Open Source Solutions for Text Engineering
>> >
>> > http://digitalpebble.blogspot.com/
>> > http://www.digitalpebble.com
>> > http://twitter.com/digitalpebble
>> >
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Lewis
>> >
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
>
> --
> *
> *
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>


-- 
*Lewis*

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
I disagree. You'd expect a binary release to work out of the box - which is
not the case. Plus we'd have to spend more time explaining the workaround,
answering the same questions over and over on the ML etc... Fixing this
should not be a big deal (i.e. add the gora-x modules for the backends to
the ivy deps file).
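
To see which gora-* modules the ivy deps file names at the moment, something like the following might help. It is only a sketch: the function name list_gora_modules is made up for illustration, the path ivy/ivy.xml is an assumption about the source tree layout, and because it is a plain text match it also lists dependencies that are commented out.

```shell
#!/bin/sh
# Sketch: list the distinct gora-* module names mentioned in an Ivy file.
# list_gora_modules is a hypothetical helper, not part of Nutch or Ivy.
# Plain text match only, so commented-out dependencies are listed as well.
list_gora_modules() {
  grep -o 'name="gora-[a-z]*"' "$1" | sed 's/^name="//; s/"$//' | sort -u
}

# Example call (the path is an assumption about the source tree):
# list_gora_modules ivy/ivy.xml
```

Any backend module missing from that list would be a candidate for adding to the deps file before building the binary package.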

Julien

On 14 June 2012 20:27, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Guys,
>
> I think the annoyance is probably something folks can live with as they
> have been
> waiting for an "official" release of 2.x for years :)
>
> My +1 to roll RC #2 with or without a solution to this and mark it as a
> TODO. "release
> early", "release often" :)
>
> Cheers,
> Chris
>
> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>
> > Aye this is no good at all. Depending on which backend you wish to use
> with Gora, you will need to go and manually fetch the correct .jar's from
> maven central.
> >
> > Does anyone else have either solution or a workaround before I push RC2
> with just src dists?
> >
> > Thanks
> >
> > Lewis
> >
> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <
> wastl.nagel@googlemail.com> wrote:
> > > We only supply src distributions...
> > > Does this principle apply to Nutch 2 as well?
> > Maybe, yes.
> > The situation with the current binary package is uncomfortable:
> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
> running.
> >
> > 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
> > Hi Guys,
> >
> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we
> don't supply binary distributions of the code, this is because when using
> Gora a user may wish/require to recompile the code to accommodate config
> changes etc. We only supply src distributions...
> >
> > Does this principle apply to Nutch 2 as well? I mean, what if you're using
> the gora-sql dependency, then you wish to switch to HBase and recompile, is
> this possible within the binary distribution?
> >
> > Best
> >
> > Lewis
> >
> >
> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
> > Ferdy
> >
> > The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
> >
> > The binary distrib corresponds to runtime/local and as such should NOT
> have the job file there. This is now the norm since 1.5
> >
> > Will try and do some testing of the RC
> >
> > Thanks
> >
> > Julien
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
> >
> >
> >
> > --
> > Lewis
> >
> >
> >
> >
> >
> > --
> > Lewis
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Guys,

I think the annoyance is probably something folks can live with as they have been
waiting for an "official" release of 2.x for years :)

My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. "release
early", "release often" :)

Cheers,
Chris

On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:

> Aye this is no good at all. Depending on which backend you wish to use with Gora, you will need to go and manually fetch the correct .jar's from maven central.
> 
> Does anyone else have either solution or a workaround before I push RC2 with just src dists?
> 
> Thanks
> 
> Lewis
> 
> On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <wa...@googlemail.com> wrote:
> > We only supply src distributions... 
> > Does this principle apply to Nutch 2 as well?
> Maybe, yes.
> The situation with the current binary package is uncomfortable:
> I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
> 
> 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
> Hi Guys,
> 
> Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't supply binary distributions of the code, this is because when using Gora a user may wish/require to recompile the code to accommodate config changes etc. We only supply src distributions...
>
> Does this principle apply to Nutch 2 as well? I mean, what if you're using the gora-sql dependency, then you wish to switch to HBase and recompile, is this possible within the binary distribution?
> 
> Best
> 
> Lewis
> 
> 
> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <li...@gmail.com> wrote:
> Ferdy
> 
> The Nutch job jar is not present in the binary archive. This means distributed running of jobs is not supported. I'm not sure if this is a problem (since users can always build one themselves), merely pointing it out. The recently released 1.5 also lacks this job jar, so at least no difference there.
> 
> The binary distrib corresponds to runtime/local and as such should NOT have the job file there. This is now the norm since 1.5
> 
> Will try and do some testing of the RC
> 
> Thanks
> 
> Julien
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 
> 
> 
> 
> -- 
> Lewis 
> 
> 
> 
> 
> 
> -- 
> Lewis 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Aye, this is no good at all. Depending on which backend you wish to use with
Gora, you will need to go and manually fetch the correct jars from Maven
Central.

Does anyone else have either a solution or a workaround before I push RC2
with just src dists?

Thanks

Lewis

On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel <wastl.nagel@googlemail.com
> wrote:

> > We only supply src distributions...
> > Does this principle apply to Nutch 2 as well?
> Maybe, yes.
> The situation with the current binary package is uncomfortable:
> I had to copy/link gora-hbase and hbase jars into lib/ to get nutch
> running.
>
> 2012/6/13 Lewis John Mcgibbney <le...@gmail.com>
>
>> Hi Guys,
>>
>> Whilst updating the Nutch2Tutorial I got thinking that within Gora we
>> don't supply binary distributions of the code, this is because when using
>> Gora a user may wish/require to recompile the code to accommodate config
>> changes etc. We only supply src distributions...
>>
>> Does this principle apply to Nutch 2 as well? I mean, what if you're using
>> the gora-sql dependency, then you wish to switch to HBase and recompile, is
>> this possible within the binary distribution?
>>
>> Best
>>
>> Lewis
>>
>>
>> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
>> lists.digitalpebble@gmail.com> wrote:
>>
>>> Ferdy
>>>
>>>>
>>>> The Nutch job jar is not present in the binary archive. This means
>>>> distributed running of jobs is not supported. I'm not sure if this is a
>>>> problem (since users can always build one themselves), merely pointing it
>>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>>> difference there.
>>>>
>>>
>>> The binary distrib corresponds to runtime/local and as such should NOT
>>> have the job file there. This is now the norm since 1.5
>>>
>>> Will try and do some testing of the RC
>>>
>>> Thanks
>>>
>>> Julien
>>>
>>>
>>>
>>> --
>>> *
>>> *Open Source Solutions for Text Engineering
>>>
>>> http://digitalpebble.blogspot.com/
>>> http://www.digitalpebble.com
>>> http://twitter.com/digitalpebble
>>>
>>>
>>
>>
>> --
>> *Lewis*
>>
>>
>


-- 
*Lewis*

Re: VOTE Apache Nutch 2.0 RC1

Posted by Sebastian Nagel <wa...@googlemail.com>.
> We only supply src distributions...
> Does this principle apply to Nutch 2 as well?
Maybe, yes.
The situation with the current binary package is uncomfortable:
I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
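
The copy/link step can be sketched roughly as below. This is not an official procedure: the function name link_backend_jars is invented for illustration, and the directories and jar name patterns in the example call are assumptions that depend on where HBase and the Gora jars live locally.

```shell
#!/bin/sh
# Sketch: symlink backend jars into the lib/ directory of an unpacked runtime.
# link_backend_jars is a hypothetical helper, not part of Nutch.
# Usage: link_backend_jars SRC_DIR LIB_DIR PATTERN...
link_backend_jars() {
  # Resolve SRC_DIR to an absolute path so the symlinks do not dangle.
  src_dir=$(cd "$1" && pwd) || return 1
  lib_dir=$2
  shift 2
  for pattern in "$@"; do
    for jar in "$src_dir"/$pattern; do
      # A pattern that matched nothing stays literal; skip it.
      [ -e "$jar" ] || continue
      ln -sf "$jar" "$lib_dir/"
      echo "linked $(basename "$jar")"
    done
  done
}

# Example call (paths and patterns are assumptions):
# link_backend_jars "$HBASE_HOME" lib 'hbase-*.jar'
```

Symlinks keep the package untouched apart from lib/, which makes it easy to swap to another backend later by removing the links.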

2012/6/13 Lewis John Mcgibbney <le...@gmail.com>

> Hi Guys,
>
> Whilst updating the Nutch2Tutorial I got thinking that within Gora we
> don't supply binary distributions of the code, this is because when using
> Gora a user may wish/require to recompile the code to accommodate config
> changes etc. We only supply src distributions...
>
> Does this principle apply to Nutch 2 as well? I mean, what if you're using
> the gora-sql dependency, then you wish to switch to HBase and recompile, is
> this possible within the binary distribution?
>
> Best
>
> Lewis
>
>
> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
> lists.digitalpebble@gmail.com> wrote:
>
>> Ferdy
>>
>>>
>>> The Nutch job jar is not present in the binary archive. This means
>>> distributed running of jobs is not supported. I'm not sure if this is a
>>> problem (since users can always build one themselves), merely pointing it
>>> out. The recently released 1.5 also lacks this job jar, so at least no
>>> difference there.
>>>
>>
>> The binary distrib corresponds to runtime/local and as such should NOT
>> have the job file there. This is now the norm since 1.5
>>
>> Will try and do some testing of the RC
>>
>> Thanks
>>
>> Julien
>>
>>
>>
>> --
>> *
>> *Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>>
>>
>
>
> --
> *Lewis*
>
>

Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Guys,

Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't
supply binary distributions of the code. This is because when using Gora a
user may wish/require to recompile the code to accommodate config changes
etc. We only supply src distributions...

Does this principle apply to Nutch 2 as well? I mean, what if you're using
the gora-sql dependency and then wish to switch to HBase and recompile? Is
this possible within the binary distribution?

Best

Lewis

On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> Ferdy
>
>>
>> The Nutch job jar is not present in the binary archive. This means
>> distributed running of jobs is not supported. I'm not sure if this is a
>> problem (since users can always build one themselves), merely pointing it
>> out. The recently released 1.5 also lacks this job jar, so at least no
>> difference there.
>>
>
> The binary distrib corresponds to runtime/local and as such should NOT
> have the job file there. This is now the norm since 1.5
>
> Will try and do some testing of the RC
>
> Thanks
>
> Julien
>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>


-- 
*Lewis*

Re: VOTE Apache Nutch 2.0 RC1

Posted by Julien Nioche <li...@gmail.com>.
Ferdy

>
> The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
>

The binary distrib corresponds to runtime/local and as such should NOT have
the job file there. This is now the norm since 1.5

Will try and do some testing of the RC

Thanks

Julien



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: VOTE Apache Nutch 2.0 RC1

Posted by Ferdy Galema <fe...@kalooga.com>.
Hmm, please ignore "the parse text limited to 100 chars"; this is actually
not the case. (It only occurs in our branch, which has a fix for limiting
anchor texts that is not yet present in the nutchgora branch because it still
needs polishing.) So no need to wait for commits on my part.

On Wed, Jun 13, 2012 at 11:00 AM, Ferdy Galema <fe...@kalooga.com> wrote:

> Findings about Nutch-2.0 RC 1.
>
> The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
>
> Parse text is limited to 100 characters for html. We noticed this when our
> index wasn't showing enough terms for some documents. This is a pretty
> severe bug that I will commit a fix for right away.
>
> Building runtime with the default SqlStore and HBaseStore works fine. Will
> perform some more functionality tests when there is a new RC.
>
> Ferdy.
>
> On Wed, Jun 13, 2012 at 4:24 AM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Guys,
>>
>> #2 is probably reason enough for a respin.
>>
>> Lewis if you don't have time to do it before Thursday, I could probably
>> give it a whack. Let me know.
>>
>> Cheers,
>> Chris
>>
>> On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:
>>
>> > Hi Lewis,
>> >
>> > my first steps with 2.0 (to be continued, still struggling).
>> >
>> > Two points (I'll try to give a final vote tomorrow):
>> >
>> > 1 some guidance would be nice. README.txt points
>> > to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
>> > (I'm using
>> http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html
>> )
>> >
>> > 2 the package contains your nutch-site.xml:
>> >    <name>http.agent.email</name>
>> >    <value>lewismc@apache.org</value>
>> > I guess that's not intended :)
>> >
>> > Cheers,
>> > Sebastian
>> >
>> > On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> >> Hi Everyone,
>> >>
>> >> I appreciate that most of the core dev's are using trunk, however I
>> >> would appeal to you guys to at least check out the artifacts and check
>> >> sigs, tests, license headers if possible. Although this does not fully
>> >> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> >> thorough stuff can be undertaken by those directly using the artifacts
>> >> and code in development/production.
>> >>
>> >> Thanks very much in advance
>> >>
>> >> Best
>> >>
>> >> Lewis
>> >>
>> >> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <
>> lewismc@apache.org> wrote:
>> >>> Good Evening Everyone,
>> >>>
>> >>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>> >>>
>> >>> http://people.apache.org/~lewismc/nutch-2.0
>> >>>
>> >>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> >>> archive of the sources in:
>> >>>
>> >>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>> >>>
>> >>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> >>> javadoc.jar is available here:
>> >>>
>> >>> https://repository.apache.org/content/repositories/orgapachenutch-215
>> >>>
>> >>> Please vote on releasing this package as Apache Nutch 2.0.
>> >>> The vote is open for the next 72 hours and passes if a majority of at
>> >>> least three +1 Nutch PMC votes are cast.
>> >>>
>> >>> [ ] +1 Release this package as Apache Nutch 2.0
>> >>> [ ] -1 Do not release this package because...
>> >>>
>> >>> Many Thanks and here's to plenty more.
>> >>>
>> >>> Have a great weekend, Kind Regards,
>> >>> Lewis
>> >>>
>> >>> P.S. Here's my +1.
>> >>
>> >>
>> >>
>> >
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>

Re: VOTE Apache Nutch 2.0 RC1

Posted by Ferdy Galema <fe...@kalooga.com>.
Findings about Nutch-2.0 RC 1.

The Nutch job jar is not present in the binary archive. This means
distributed running of jobs is not supported. I'm not sure if this is a
problem (since users can always build one themselves), merely pointing it
out. The recently released 1.5 also lacks this job jar, so at least no
difference there.

Parse text is limited to 100 characters for html. We noticed this when our
index wasn't showing enough terms for some documents. This is a pretty
severe bug that I will commit a fix for right away.

Building runtime with the default SqlStore and HBaseStore works fine. Will
perform some more functionality tests when there is a new RC.

Ferdy.

On Wed, Jun 13, 2012 at 4:24 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Guys,
>
> #2 is probably reason enough for a respin.
>
> Lewis if you don't have time to do it before Thursday, I could probably
> give it a whack. Let me know.
>
> Cheers,
> Chris
>
> On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:
>
> > Hi Lewis,
> >
> > my first steps with 2.0 (to be continued, still struggling).
> >
> > Two points (I'll try to give a final vote tomorrow):
> >
> > 1 some guidance would be nice. README.txt points
> > to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
> > (I'm using
> http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html
> )
> >
> > 2 the package contains your nutch-site.xml:
> >    <name>http.agent.email</name>
> >    <value>lewismc@apache.org</value>
> > I guess that's not intended :)
> >
> > Cheers,
> > Sebastian
> >
> > On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
> >> Hi Everyone,
> >>
> >> I appreciate that most of the core dev's are using trunk, however I
> >> would appeal to you guys to at least check out the artifacts and check
> >> sigs, tests, license headers if possible. Although this does not fully
> >> satisfy the requirements of a thoroughly reviewed RC, hopefully the
> >> thorough stuff can be undertaken by those directly using the artifacts
> >> and code in development/production.
> >>
> >> Thanks very much in advance
> >>
> >> Best
> >>
> >> Lewis
> >>
> >> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <
> lewismc@apache.org> wrote:
> >>> Good Evening Everyone,
> >>>
> >>> A candidate for the Apache Nutch 2.0 RC1 is available at:
> >>>
> >>> http://people.apache.org/~lewismc/nutch-2.0
> >>>
> >>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
> >>> archive of the sources in:
> >>>
> >>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
> >>>
> >>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
> >>> javadoc.jar is available here:
> >>>
> >>> https://repository.apache.org/content/repositories/orgapachenutch-215
> >>>
> >>> Please vote on releasing this package as Apache Nutch 2.0.
> >>> The vote is open for the next 72 hours and passes if a majority of at
> >>> least three +1 Nutch PMC votes are cast.
> >>>
> >>> [ ] +1 Release this package as Apache Nutch 2.0
> >>> [ ] -1 Do not release this package because...
> >>>
> >>> Many Thanks and here's to plenty more.
> >>>
> >>> Have a great weekend, Kind Regards,
> >>> Lewis
> >>>
> >>> P.S. Here's my +1.
> >>
> >>
> >>
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>

Re: VOTE Apache Nutch 2.0 RC1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Guys,

#2 is probably reason enough for a respin. 

Lewis if you don't have time to do it before Thursday, I could probably
give it a whack. Let me know.

Cheers,
Chris

On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:

> Hi Lewis,
> 
> my first steps with 2.0 (to be continued, still struggling).
> 
> Two points (I'll try to give a final vote tomorrow):
> 
> 1 some guidance would be nice. README.txt points
> to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
> (I'm using http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)
> 
> 2 the package contains your nutch-site.xml:
>    <name>http.agent.email</name>
>    <value>lewismc@apache.org</value>
> I guess that's not intended :)
> 
> Cheers,
> Sebastian
> 
> On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> Hi Everyone,
>> 
>> I appreciate that most of the core dev's are using trunk, however I
>> would appeal to you guys to at least check out the artifacts and check
>> sigs, tests, license headers if possible. Although this does not fully
>> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> thorough stuff can be undertaken by those directly using the artifacts
>> and code in development/production.
>> 
>> Thanks very much in advance
>> 
>> Best
>> 
>> Lewis
>> 
>> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
>>> Good Evening Everyone,
>>> 
>>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>>> 
>>> http://people.apache.org/~lewismc/nutch-2.0
>>> 
>>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>>> archive of the sources in:
>>> 
>>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>>> 
>>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>>> javadoc.jar is available here:
>>> 
>>> https://repository.apache.org/content/repositories/orgapachenutch-215
>>> 
>>> Please vote on releasing this package as Apache Nutch 2.0.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Nutch PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Nutch 2.0
>>> [ ] -1 Do not release this package because...
>>> 
>>> Many Thanks and here's to plenty more.
>>> 
>>> Have a great weekend, Kind Regards,
>>> Lewis
>>> 
>>> P.S. Here's my +1.
>> 
>> 
>> 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Seb,

As Chris said, the issues you highlight well justify another RC.

I can shift it by the end of play today.

Thanks very much for having a look through guys

Lewis

On Tue, Jun 12, 2012 at 11:33 PM, Sebastian Nagel
<wa...@googlemail.com> wrote:
> Hi Lewis,
>
> my first steps with 2.0 (to be continued, still struggling).
>
> Two points (I'll try to give a final vote tomorrow):
>
> 1 some guidance would be nice. README.txt points
> to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
> (I'm using http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)
>
> 2 the package contains your nutch-site.xml:
>    <name>http.agent.email</name>
>    <value>lewismc@apache.org</value>
> I guess that's not intended :)
>
> Cheers,
> Sebastian
>
> On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> Hi Everyone,
>>
>> I appreciate that most of the core dev's are using trunk, however I
>> would appeal to you guys to at least check out the artifacts and check
>> sigs, tests, license headers if possible. Although this does not fully
>> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> thorough stuff can be undertaken by those directly using the artifacts
>> and code in development/production.
>>
>> Thanks very much in advance
>>
>> Best
>>
>> Lewis
>>
>> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
>>> Good Evening Everyone,
>>>
>>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>>>
>>> http://people.apache.org/~lewismc/nutch-2.0
>>>
>>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>>> archive of the sources in:
>>>
>>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>>>
>>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>>> javadoc.jar is available here:
>>>
>>> https://repository.apache.org/content/repositories/orgapachenutch-215
>>>
>>> Please vote on releasing this package as Apache Nutch 2.0.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Nutch PMC votes are cast.
>>>
>>>  [ ] +1 Release this package as Apache Nutch 2.0
>>>  [ ] -1 Do not release this package because...
>>>
>>> Many thanks, and here's to plenty more.
>>>
>>> Have a great weekend, Kind Regards,
>>> Lewis
>>>
>>> P.S. Here's my +1.
>>
>>
>>
>



-- 
Lewis

Re: VOTE Apache Nutch 2.0 RC1

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Lewis,

my first steps with 2.0 (to be continued, still struggling).

Two points (I'll try to give a final vote tomorrow):

1 some guidance would be nice. README.txt points
to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
(I'm using http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)

2 the package contains your nutch-site.xml:
    <name>http.agent.email</name>
    <value>lewismc@apache.org</value>
I guess that's not intended :)
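
A minimal sketch of how a leftover value like this could be caught before
packaging. It assumes the usual Hadoop-style layout (a <configuration> root
holding <property> elements with <name> and <value> children) and that a
shipped nutch-site.xml is meant to carry no site-specific values. The
function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def non_empty_properties(xml_text):
    """Return {name: value} for every property that has a non-empty value."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name and value:
            found[name] = value
    return found


# Hypothetical sample; only the http.agent.email name/value pair comes from
# this thread, the surrounding structure is assumed.
SAMPLE = """<configuration>
  <property>
    <name>http.agent.email</name>
    <value>lewismc@apache.org</value>
  </property>
  <property>
    <name>http.agent.name</name>
    <value></value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in non_empty_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

Any property reported by such a check would be a candidate for blanking out
before the next RC is rolled.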

Cheers,
Sebastian

On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
> Hi Everyone,
> 
> I appreciate that most of the core devs are using trunk; however, I
> would appeal to you guys to at least check out the artifacts and check
> sigs, tests, license headers if possible. Although this does not fully
> satisfy the requirements of a thoroughly reviewed RC, hopefully the
> thorough stuff can be undertaken by those directly using the artifacts
> and code in development/production.
> 
> Thanks very much in advance
> 
> Best
> 
> Lewis
> 
> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
>> Good Evening Everyone,
>>
>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>>
>> http://people.apache.org/~lewismc/nutch-2.0
>>
>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> archive of the sources in:
>>
>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>>
>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> javadoc.jar is available here:
>>
>> https://repository.apache.org/content/repositories/orgapachenutch-215
>>
>> Please vote on releasing this package as Apache Nutch 2.0.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Nutch PMC votes are cast.
>>
>>  [ ] +1 Release this package as Apache Nutch 2.0
>>  [ ] -1 Do not release this package because...
>>
>> Many thanks, and here's to plenty more.
>>
>> Have a great weekend, Kind Regards,
>> Lewis
>>
>> P.S. Here's my +1.
> 
> 
> 


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Thank you

On Tue, Jun 12, 2012 at 9:19 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hey Lewis,
>
> I will get to this tonight, for sure.
>
> Thanks!
>
> Cheers,
> Chris
>
> On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote:
>
>> Hi Everyone,
>>
>> I appreciate that most of the core devs are using trunk; however, I
>> would appeal to you guys to at least check out the artifacts and check
>> sigs, tests, license headers if possible. Although this does not fully
>> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> thorough stuff can be undertaken by those directly using the artifacts
>> and code in development/production.
>>
>> Thanks very much in advance
>>
>> Best
>>
>> Lewis
>>
>> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
>>> Good Evening Everyone,
>>>
>>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>>>
>>> http://people.apache.org/~lewismc/nutch-2.0
>>>
>>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>>> archive of the sources in:
>>>
>>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>>>
>>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>>> javadoc.jar is available here:
>>>
>>> https://repository.apache.org/content/repositories/orgapachenutch-215
>>>
>>> Please vote on releasing this package as Apache Nutch 2.0.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Nutch PMC votes are cast.
>>>
>>>  [ ] +1 Release this package as Apache Nutch 2.0
>>>  [ ] -1 Do not release this package because...
>>>
>>> Many thanks, and here's to plenty more.
>>>
>>> Have a great weekend, Kind Regards,
>>> Lewis
>>>
>>> P.S. Here's my +1.
>>
>>
>>
>> --
>> Lewis
>



-- 
Lewis

Re: VOTE Apache Nutch 2.0 RC1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Lewis,

I will get to this tonight, for sure.

Thanks!

Cheers,
Chris

On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> I appreciate that most of the core devs are using trunk; however, I
> would appeal to you guys to at least check out the artifacts and check
> sigs, tests, license headers if possible. Although this does not fully
> satisfy the requirements of a thoroughly reviewed RC, hopefully the
> thorough stuff can be undertaken by those directly using the artifacts
> and code in development/production.
> 
> Thanks very much in advance
> 
> Best
> 
> Lewis
> 
> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
>> Good Evening Everyone,
>> 
>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>> 
>> http://people.apache.org/~lewismc/nutch-2.0
>> 
>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> archive of the sources in:
>> 
>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>> 
>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> javadoc.jar is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-215
>> 
>> Please vote on releasing this package as Apache Nutch 2.0.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Nutch PMC votes are cast.
>> 
>>  [ ] +1 Release this package as Apache Nutch 2.0
>>  [ ] -1 Do not release this package because...
>> 
>> Many thanks, and here's to plenty more.
>> 
>> Have a great weekend, Kind Regards,
>> Lewis
>> 
>> P.S. Here's my +1.
> 
> 
> 
> -- 
> Lewis


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: VOTE Apache Nutch 2.0 RC1

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Everyone,

I appreciate that most of the core devs are using trunk; however, I
would appeal to you guys to at least check out the artifacts and check
sigs, tests, license headers if possible. Although this does not fully
satisfy the requirements of a thoroughly reviewed RC, hopefully the
thorough stuff can be undertaken by those directly using the artifacts
and code in development/production.
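
As a rough illustration of the checksum part of such a check, here is a
minimal sketch that hashes a downloaded artifact and compares the result
with a published digest. The helper names, the choice of SHA-512 and any
file names are assumptions made for illustration, not details of this RC.
It does not verify the PGP signature, which would need a separate step
with a tool such as gpg.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="sha512"):
    """Compare a file's digest with a published hex string.

    Surrounding whitespace and letter case in the published value are
    ignored; other digest-file layouts are not handled here.
    """
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A reviewer could call matches_published() with the path of a downloaded
archive and the hex digest copied from the corresponding digest file; a
False result means the download should not be trusted.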

Thanks very much in advance

Best

Lewis

On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <le...@apache.org> wrote:
> Good Evening Everyone,
>
> A candidate for the Apache Nutch 2.0 RC1 is available at:
>
> http://people.apache.org/~lewismc/nutch-2.0
>
> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
> archive of the sources in:
>
> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>
> Further, a staged Maven repository of the 2.0 jar, sources.jar and
> javadoc.jar is available here:
>
> https://repository.apache.org/content/repositories/orgapachenutch-215
>
> Please vote on releasing this package as Apache Nutch 2.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
>
>  [ ] +1 Release this package as Apache Nutch 2.0
>  [ ] -1 Do not release this package because...
>
> Many thanks, and here's to plenty more.
>
> Have a great weekend, Kind Regards,
> Lewis
>
> P.S. Here's my +1.



-- 
Lewis