Posted to user@nutch.apache.org by Sebastian Nagel <sn...@apache.org> on 2022/08/22 15:30:36 UTC

[VOTE] Release Apache Nutch 1.19 RC#1

Hi Folks,

A first candidate for the Nutch 1.19 release is available at:

   https://dist.apache.org/repos/dist/dev/nutch/1.19/

The release candidate is a zip and tar.gz archive of the binary and sources in:
   https://github.com/apache/nutch/tree/release-1.19

In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachenutch-1020

We addressed 87 issues:
   https://s.apache.org/lf6li


Please vote on releasing this package as Apache Nutch 1.19.
The vote is open for the next 72 hours and passes if a majority
of at least three +1 Nutch PMC votes are cast.

[ ] +1 Release this package as Apache Nutch 1.19.
[ ] -1 Do not release this package because…
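
The pass condition above can be sketched as a small check. This is only an illustration: it assumes the rule means at least three +1 votes from PMC members and more +1 than -1 votes among those binding votes. The function name and the (value, is_pmc_member) tuple format are hypothetical.

```python
def vote_passes(votes):
    """Decide a release vote from (value, is_pmc_member) pairs.

    Assumption (not spelled out in the mail): only PMC votes are binding,
    at least three binding +1 are needed, and they must outnumber the
    binding -1 votes.
    """
    plus = sum(1 for value, binding in votes if binding and value == 1)
    minus = sum(1 for value, binding in votes if binding and value == -1)
    return plus >= 3 and plus > minus
```

Non-PMC votes are simply ignored by this sketch, although in practice they are still welcome as feedback.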

Cheers,
Sebastian
(On behalf of the Nutch PMC)

P.S.
Here is my +1.
- tested most of the Nutch tools and ran a test crawl on a single-node cluster
  running Hadoop 3.3.4 (see
  https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Markus,

Thanks! What's your (final) decision?


>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;

During the build, the class should be provided by
  build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar
Could you verify that this jar is there and that it contains the class
file? See also:
  https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
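
One possible way to run this check is the following minimal sketch. The helper name is hypothetical, and it relies only on the fact that a jar is a zip archive whose class files are stored under their package path.

```python
import zipfile
from pathlib import Path


def jar_contains_class(jar_path, class_name):
    """Return True if the jar exists and holds an entry for the class."""
    # org.example.Foo is stored in the archive as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    path = Path(jar_path)
    if not path.is_file():
        return False
    with zipfile.ZipFile(path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Path and class name are taken from the mail above; run from the Nutch source root.
    jar = "build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar"
    cls = "org.apache.http.impl.nio.client.HttpAsyncClientBuilder"
    print(jar, "contains", cls, ":", jar_contains_class(jar, cls))
```

A result of False covers both cases asked about: the jar is missing, or it is there but does not contain the class.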

> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.

To fix it, we need to make the error reproducible, or at least figure out
what causes it.


Regarding the logging: we switched to log4j 2.x (NUTCH-2915) while Hadoop now
uses reload4j (HADOOP-18088 [1]). The logging configuration should be improved
to avoid the warnings in local mode. In distributed mode, the logging
configuration of the provided Hadoop takes over.
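
To see which jars trigger the "multiple SLF4J bindings" warning in local mode, a sketch like the one below could be used. It assumes the bindings are the SLF4J 1.x style ones named in the warning (jars containing org/slf4j/impl/StaticLoggerBinder.class). The function name is hypothetical.

```python
import zipfile
from pathlib import Path

# Entry that SLF4J 1.x looks for, as shown in the warning text.
BINDER_ENTRY = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return the names of all jars in lib_dir that ship an SLF4J binding."""
    bindings = []
    for jar_path in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if BINDER_ENTRY in jar.namelist():
                    bindings.append(jar_path.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return bindings
```

Called on the lib/ directory of the binary distribution, it should list the same jars that the warning reports. More than one entry in the result means the warning will appear.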


Best,
Sebastian

[1] https://issues.apache.org/jira/browse/HADOOP-18088


On 8/24/22 13:28, Markus Jelsma wrote:
> Hi,
> 
> Everything seems fine, the crawler seems fine when trying the binary
> distribution. The source won't work because this computer still cannot
> compile it. Clearing the local Ivy cache did not do much. This is the known
> compiler error with the elastic-indexer plugin:
> compile:
>     [echo] Compiling plugin: indexer-elastic
>    [javac] Compiling 3 source files to /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>    [javac] /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39: error: package org.apache.http.impl.nio.client does not exist
>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>    [javac]                                       ^
>    [javac] 1 error
> 
> 
> The binary distribution works fine though. I do see a lot of new messages
> when fetching:
> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
> 
> This is also new at start of each task:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> 
> And this one at the end of fetcher:
> log4j:WARN No appenders could be found for logger (org.apache.commons.httpclient.params.DefaultHttpParams).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> 
> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.
> 
> Markus
> 
> 

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
OK,
I compiled Nutch under JDK 11.
Did some basic fetching, parsing, link inversion and subsequent indexing to Solr 9.
[+1]

Great work!
RRK

On Tue, Aug 30, 2022 at 12:22 PM BlackIce <bl...@gmail.com> wrote:
>
> Tried some indexing... but when manually doing "Invertilinks" it says
> something about input path does not exist.
> Has invertilinks changed since 1.18?
>
> Greetz
> RRK
>
> On Mon, Aug 29, 2022 at 3:38 PM BlackIce <bl...@gmail.com> wrote:
> >
> > Haven't indexed anything to solr.. gonna give it a shot in a few hours
> >
> > On Mon, Aug 29, 2022 at 2:17 PM Markus Jelsma
> > <ma...@openindex.io> wrote:
> > >
> > > Hello Sebastian,
> > >
> > > No, the JAR isn't present. Multiple JARs are missing, probably because they
> > > are loaded after httpasyncclient. I checked the previously emptied Ivy
> > > cache. The Ivy files are there, but the JAR is missing there too.
> > >
> > > markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
> > > ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties
> > >
> > > I manually downloaded the JAR from [1] and added it to the jars/ directory
> > > in the Ivy cache. It still cannot find the JAR, perhaps the Ivy cache needs
> > > some more things than just adding the JAR manually.
> > >
> > > The odd thing is, that i got the URL below FROM the ivydata-4.1.4.properties
> > > file in the cache.
> > >
> > > Since Ralf can compile it without problems, it seems to be an issue on my
> > > machine only. So Nutch seems fine, therefore +1.
> > >
> > > Regards,
> > > Markus
> > >
> > > [1]
> > > https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
> > >
> > >
> > > > On Sun, Aug 28, 2022 at 12:05, Sebastian Nagel
> > > > <wa...@googlemail.com.invalid> wrote:
> > >
> > > > Hi Ralf,
> > > >
> > > > > It fetches it parses
> > > >
> > > > So a +1 ?
> > > >
> > > > Best,
> > > > Sebastian
> > > >
> > > > On 8/25/22 05:22, BlackIce wrote:
> > > > > nevermind I made a typo...
> > > > >
> > > > > It fetches it parses
> > > > >
> > > > > On Thu, Aug 25, 2022 at 3:42 AM BlackIce <bl...@gmail.com> wrote:
> > > > >>
> > > > >> so far... it doesn't select anything when creating segments:
> > > > >> 0 records selected for fetching, exiting
> > > > >>
> > > > >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce <bl...@gmail.com> wrote:
> > > > >>>
> > > > >>> I have been able to compile under OpenJDK 11
> > > > >>> Have not done anything further so far
> > > > >>> I'm gonna try to get to it this evening
> > > > >>>
> > > > >>> Greetz
> > > > >>> Ralf
> > > > >>>
> > > > >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> > > > >>> <ma...@openindex.io> wrote:
> > > > >>>>
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> Everything seems fine, the crawler seems fine when trying the binary
> > > > >>>> distribution. The source won't work because this computer still cannot
> > > > >>>> compile it. Clearing the local Ivy cache did not do much. This is the
> > > > known
> > > > >>>> compiler error with the elastic-indexer plugin:
> > > > >>>> compile:
> > > > >>>>     [echo] Compiling plugin: indexer-elastic
> > > > >>>>    [javac] Compiling 3 source files to
> > > > >>>> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> > > > >>>>    [javac]
> > > > >>>>
> > > > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> > > > >>>> error: package org.apache.http.impl.nio.client does not exist
> > > > >>>>    [javac] import
> > > > org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> > > > >>>>    [javac]                                       ^
> > > > >>>>    [javac] 1 error
> > > > >>>>
> > > > >>>>
> > > > >>>> The binary distribution works fine though. I do see a lot of new
> > > > messages
> > > > >>>> when fetching:
> > > > >>>> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters
> > > > [LocalJobRunner
> > > > >>>> Map Task Executor #0] Found 0 extensions at
> > > > >>>> point:'org.apache.nutch.net.URLExemptionFilter'
> > > > >>>>
> > > > >>>> This is also new at start of each task:
> > > > >>>> SLF4J: Class path contains multiple SLF4J bindings.
> > > > >>>> SLF4J: Found binding in
> > > > >>>>
> > > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > >>>>
> > > > >>>> SLF4J: Found binding in
> > > > >>>>
> > > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > >>>>
> > > > >>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > > >>>> explanation.
> > > > >>>> SLF4J: Actual binding is of type
> > > > >>>> [org.apache.logging.slf4j.Log4jLoggerFactory]
> > > > >>>>
> > > > >>>> And this one at the end of fetcher:
> > > > >>>> log4j:WARN No appenders could be found for logger
> > > > >>>> (org.apache.commons.httpclient.params.DefaultHttpParams).
> > > > >>>> log4j:WARN Please initialize the log4j system properly.
> > > > >>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> > > > for
> > > > >>>> more info.
> > > > >>>>
> > > > >>>> I am worried about the indexer-elastic plugin, maybe others have that
> > > > >>>> problem too? Otherwise everything seems fine.
> > > > >>>>
> > > > >>>> Markus
> > > > >>>>
> > > > >>>> Op ma 22 aug. 2022 om 17:30 schreef Sebastian Nagel <
> > > > snagel@apache.org>:
> > > > >>>>
> > > > >>>>> Hi Folks,
> > > > >>>>>
> > > > >>>>> A first candidate for the Nutch 1.19 release is available at:
> > > > >>>>>
> > > > >>>>>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> > > > >>>>>
> > > > >>>>> The release candidate is a zip and tar.gz archive of the binary and
> > > > >>>>> sources in:
> > > > >>>>>    https://github.com/apache/nutch/tree/release-1.19
> > > > >>>>>
> > > > >>>>> In addition, a staged maven repository is available here:
> > > > >>>>>
> > > > https://repository.apache.org/content/repositories/orgapachenutch-1020
> > > > >>>>>
> > > > >>>>> We addressed 87 issues:
> > > > >>>>>    https://s.apache.org/lf6li
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Please vote on releasing this package as Apache Nutch 1.19.
> > > > >>>>> The vote is open for the next 72 hours and passes if a majority
> > > > >>>>> of at least three +1 Nutch PMC votes are cast.
> > > > >>>>>
> > > > >>>>> [ ] +1 Release this package as Apache Nutch 1.19.
> > > > >>>>> [ ] -1 Do not release this package because…
> > > > >>>>>
> > > > >>>>> Cheers,
> > > > >>>>> Sebastian
> > > > >>>>> (On behalf of the Nutch PMC)
> > > > >>>>>
> > > > >>>>> P.S.
> > > > >>>>> Here is my +1.
> > > > >>>>> - tested most of Nutch tools and run a test crawl on a single-node
> > > > cluster
> > > > >>>>>   running Hadoop 3.3.4, see
> > > > >>>>>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/
> > > > )
> > > > >>>>>
> > > >

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
Tried some indexing... but when manually running "invertlinks" it reports
that the input path does not exist.
Has invertlinks changed since 1.18?

Greetz
RRK


Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
Haven't indexed anything to Solr yet... gonna give it a shot in a few hours


Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Markus,

thanks!

Could you share the files in

  .ivy2/cache/org.apache.httpcomponents/httpasyncclient/

and maybe also the logs of a Nutch build starting with an empty ~/.ivy2/cache ?
I'll have a look and compare it with what I find on my system. Maybe use a new
thread on user@ or a Jira issue: I plan to close the vote over the weekend,
so let's keep this thread for the release vote alone.

Best,
Sebastian


Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Markus Jelsma <ma...@openindex.io>.
Hello Sebastian,

No, the JAR isn't present. Multiple JARs are missing, probably because they
are loaded after httpasyncclient. I checked the previously emptied Ivy
cache. The Ivy files are there, but the JAR is missing there too.

markus@midas:~$ ls .ivy2/cache/org.apache.httpcomponents/httpasyncclient/
ivy-4.1.4.xml  ivy-4.1.4.xml.original  ivydata-4.1.4.properties

I manually downloaded the JAR from [1] and added it to the jars/ directory
in the Ivy cache. It still cannot find the JAR; perhaps the Ivy cache needs
more than just the JAR added manually.

The odd thing is that I got the URL below from the ivydata-4.1.4.properties
file in the cache.

Since Ralf can compile it without problems, it seems to be an issue on my
machine only. So Nutch seems fine, therefore +1.
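
The cache state described above (Ivy metadata present, JAR absent) can be checked with a rough sketch like the following. It assumes the cache keeps artifacts in a jars/ subdirectory of the module directory, named module-revision.jar, which matches the jars/ directory mentioned above but may differ with custom Ivy settings. The function name and the returned keys are hypothetical.

```python
from pathlib import Path


def inspect_ivy_module(cache_root, organisation, module, revision):
    """Summarise what the Ivy cache holds for one module revision."""
    module_dir = Path(cache_root) / organisation / module
    # Assumed artifact location: <module_dir>/jars/<module>-<revision>.jar
    jar = module_dir / "jars" / f"{module}-{revision}.jar"
    metadata = []
    if module_dir.is_dir():
        metadata = sorted(
            p.name for p in module_dir.glob(f"*{revision}*") if p.is_file()
        )
    return {
        "module_dir_exists": module_dir.is_dir(),
        "metadata_files": metadata,
        "jar_present": jar.is_file(),
    }


if __name__ == "__main__":
    report = inspect_ivy_module(
        Path.home() / ".ivy2" / "cache",
        "org.apache.httpcomponents",
        "httpasyncclient",
        "4.1.4",
    )
    print(report)
```

On a machine in the state shown above, the report would list the three ivy files under metadata_files while jar_present stays False.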

Regards,
Markus

[1]
https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/


On Sun, Aug 28, 2022 at 12:05, Sebastian Nagel
<wa...@googlemail.com.invalid> wrote:

> Hi Ralf,
>
> > It fetches it parses
>
> So a +1 ?
>
> Best,
> Sebastian
>
> On 8/25/22 05:22, BlackIce wrote:
> > nevermind I made a typo...
> >
> > It fetches it parses
> >
> > On Thu, Aug 25, 2022 at 3:42 AM BlackIce <bl...@gmail.com> wrote:
> >>
> >> so far... it doesn't select anything when creating segments:
> >> 0 records selected for fetching, exiting
> >>
> >> On Wed, Aug 24, 2022 at 3:02 PM BlackIce <bl...@gmail.com> wrote:
> >>>
> >>> I have been able to compile under OpenJDK 11
> >>> Have not done anything further so far
> >>> I'm gonna try to get to it this evening
> >>>
> >>> Greetz
> >>> Ralf
> >>>
> >>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> >>> <ma...@openindex.io> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Everything seems fine, the crawler seems fine when trying the binary
> >>>> distribution. The source won't work because this computer still cannot
> >>>> compile it. Clearing the local Ivy cache did not do much. This is the known
> >>>> compiler error with the elastic-indexer plugin:
> >>>> compile:
> >>>>     [echo] Compiling plugin: indexer-elastic
> >>>>    [javac] Compiling 3 source files to
> >>>> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> >>>>    [javac]
> >>>> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> >>>> error: package org.apache.http.impl.nio.client does not exist
> >>>>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> >>>>    [javac]                                       ^
> >>>>    [javac] 1 error
> >>>>
> >>>>
> >>>> The binary distribution works fine though. I do see a lot of new messages
> >>>> when fetching:
> >>>> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> >>>> Map Task Executor #0] Found 0 extensions at
> >>>> point:'org.apache.nutch.net.URLExemptionFilter'
> >>>>
> >>>> This is also new at start of each task:
> >>>> SLF4J: Class path contains multiple SLF4J bindings.
> >>>> SLF4J: Found binding in
> >>>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>
> >>>> SLF4J: Found binding in
> >>>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >>>>
> >>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> >>>> explanation.
> >>>> SLF4J: Actual binding is of type
> >>>> [org.apache.logging.slf4j.Log4jLoggerFactory]
> >>>>
> >>>> And this one at the end of fetcher:
> >>>> log4j:WARN No appenders could be found for logger
> >>>> (org.apache.commons.httpclient.params.DefaultHttpParams).
> >>>> log4j:WARN Please initialize the log4j system properly.
> >>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> >>>> more info.
> >>>>
> >>>> I am worried about the indexer-elastic plugin, maybe others have that
> >>>> problem too? Otherwise everything seems fine.
> >>>>
> >>>> Markus
> >>>>
> >>>> On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <snagel@apache.org> wrote:
> >>>>
> >>>>> Hi Folks,
> >>>>>
> >>>>> A first candidate for the Nutch 1.19 release is available at:
> >>>>>
> >>>>>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> >>>>>
> >>>>> The release candidate is a zip and tar.gz archive of the binary and
> >>>>> sources in:
> >>>>>    https://github.com/apache/nutch/tree/release-1.19
> >>>>>
> >>>>> In addition, a staged maven repository is available here:
> >>>>>    https://repository.apache.org/content/repositories/orgapachenutch-1020
> >>>>>
> >>>>> We addressed 87 issues:
> >>>>>    https://s.apache.org/lf6li
> >>>>>
> >>>>>
> >>>>> Please vote on releasing this package as Apache Nutch 1.19.
> >>>>> The vote is open for the next 72 hours and passes if a majority
> >>>>> of at least three +1 Nutch PMC votes are cast.
> >>>>>
> >>>>> [ ] +1 Release this package as Apache Nutch 1.19.
> >>>>> [ ] -1 Do not release this package because…
> >>>>>
> >>>>> Cheers,
> >>>>> Sebastian
> >>>>> (On behalf of the Nutch PMC)
> >>>>>
> >>>>> P.S.
> >>>>> Here is my +1.
> >>>>> - tested most of Nutch tools and run a test crawl on a single-node cluster
> >>>>>   running Hadoop 3.3.4, see
> >>>>>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
> >>>>>
>

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Ralf,

> It fetches it parses

So a +1 ?

Best,
Sebastian

On 8/25/22 05:22, BlackIce wrote:
> nevermind I made a typo...
> 
> It fetches it parses
> 
> On Thu, Aug 25, 2022 at 3:42 AM BlackIce <bl...@gmail.com> wrote:
>>
>> so far... it doesn't select anything when creating segments:
>> 0 records selected for fetching, exiting
>>
>> On Wed, Aug 24, 2022 at 3:02 PM BlackIce <bl...@gmail.com> wrote:
>>>
>>> I have been able to compile under OpenJDK 11
>>> Have not done anything further so far
>>> I'm gonna try to get to it this evening
>>>
>>> Greetz
>>> Ralf
>>>
>>> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
>>> <ma...@openindex.io> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Everything seems fine, the crawler seems fine when trying the binary
>>>> distribution. The source won't work because this computer still cannot
>>>> compile it. Clearing the local Ivy cache did not do much. This is the known
>>>> compiler error with the elastic-indexer plugin:
>>>> compile:
>>>>     [echo] Compiling plugin: indexer-elastic
>>>>    [javac] Compiling 3 source files to
>>>> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>>>>    [javac]
>>>> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
>>>> error: package org.apache.http.impl.nio.client does not exist
>>>>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>>>>    [javac]                                       ^
>>>>    [javac] 1 error
>>>>
>>>>
>>>> The binary distribution works fine though. I do see a lot of new messages
>>>> when fetching:
>>>> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
>>>> Map Task Executor #0] Found 0 extensions at
>>>> point:'org.apache.nutch.net.URLExemptionFilter'
>>>>
>>>> This is also new at start of each task:
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in
>>>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>
>>>> SLF4J: Found binding in
>>>> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>>> explanation.
>>>> SLF4J: Actual binding is of type
>>>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>>>>
>>>> And this one at the end of fetcher:
>>>> log4j:WARN No appenders could be found for logger
>>>> (org.apache.commons.httpclient.params.DefaultHttpParams).
>>>> log4j:WARN Please initialize the log4j system properly.
>>>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
>>>> more info.
>>>>
>>>> I am worried about the indexer-elastic plugin, maybe others have that
>>>> problem too? Otherwise everything seems fine.
>>>>
>>>> Markus
>>>>
>>>> On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> A first candidate for the Nutch 1.19 release is available at:
>>>>>
>>>>>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
>>>>>
>>>>> The release candidate is a zip and tar.gz archive of the binary and
>>>>> sources in:
>>>>>    https://github.com/apache/nutch/tree/release-1.19
>>>>>
>>>>> In addition, a staged maven repository is available here:
>>>>>    https://repository.apache.org/content/repositories/orgapachenutch-1020
>>>>>
>>>>> We addressed 87 issues:
>>>>>    https://s.apache.org/lf6li
>>>>>
>>>>>
>>>>> Please vote on releasing this package as Apache Nutch 1.19.
>>>>> The vote is open for the next 72 hours and passes if a majority
>>>>> of at least three +1 Nutch PMC votes are cast.
>>>>>
>>>>> [ ] +1 Release this package as Apache Nutch 1.19.
>>>>> [ ] -1 Do not release this package because…
>>>>>
>>>>> Cheers,
>>>>> Sebastian
>>>>> (On behalf of the Nutch PMC)
>>>>>
>>>>> P.S.
>>>>> Here is my +1.
>>>>> - tested most of Nutch tools and run a test crawl on a single-node cluster
>>>>>   running Hadoop 3.3.4, see
>>>>>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
>>>>>

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
Never mind, I made a typo...

It fetches it parses

On Thu, Aug 25, 2022 at 3:42 AM BlackIce <bl...@gmail.com> wrote:
>
> so far... it doesn't select anything when creating segments:
> 0 records selected for fetching, exiting
>
> On Wed, Aug 24, 2022 at 3:02 PM BlackIce <bl...@gmail.com> wrote:
> >
> > I have been able to compile under OpenJDK 11
> > Have not done anything further so far
> > I'm gonna try to get to it this evening
> >
> > Greetz
> > Ralf
> >
> > On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> > <ma...@openindex.io> wrote:
> > >
> > > Hi,
> > >
> > > Everything seems fine, the crawler seems fine when trying the binary
> > > distribution. The source won't work because this computer still cannot
> > > compile it. Clearing the local Ivy cache did not do much. This is the known
> > > compiler error with the elastic-indexer plugin:
> > > compile:
> > >     [echo] Compiling plugin: indexer-elastic
> > >    [javac] Compiling 3 source files to
> > > /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> > >    [javac]
> > > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> > > error: package org.apache.http.impl.nio.client does not exist
> > >    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> > >    [javac]                                       ^
> > >    [javac] 1 error
> > >
> > >
> > > The binary distribution works fine though. I do see a lot of new messages
> > > when fetching:
> > > 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> > > Map Task Executor #0] Found 0 extensions at
> > > point:'org.apache.nutch.net.URLExemptionFilter'
> > >
> > > This is also new at start of each task:
> > > SLF4J: Class path contains multiple SLF4J bindings.
> > > SLF4J: Found binding in
> > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > >
> > > SLF4J: Found binding in
> > > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > >
> > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > explanation.
> > > SLF4J: Actual binding is of type
> > > [org.apache.logging.slf4j.Log4jLoggerFactory]
> > >
> > > And this one at the end of fetcher:
> > > log4j:WARN No appenders could be found for logger
> > > (org.apache.commons.httpclient.params.DefaultHttpParams).
> > > log4j:WARN Please initialize the log4j system properly.
> > > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> > > more info.
> > >
> > > I am worried about the indexer-elastic plugin, maybe others have that
> > > problem too? Otherwise everything seems fine.
> > >
> > > Markus
> > >
> > > On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > A first candidate for the Nutch 1.19 release is available at:
> > > >
> > > >    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> > > >
> > > > The release candidate is a zip and tar.gz archive of the binary and
> > > > sources in:
> > > >    https://github.com/apache/nutch/tree/release-1.19
> > > >
> > > > In addition, a staged maven repository is available here:
> > > >    https://repository.apache.org/content/repositories/orgapachenutch-1020
> > > >
> > > > We addressed 87 issues:
> > > >    https://s.apache.org/lf6li
> > > >
> > > >
> > > > Please vote on releasing this package as Apache Nutch 1.19.
> > > > The vote is open for the next 72 hours and passes if a majority
> > > > of at least three +1 Nutch PMC votes are cast.
> > > >
> > > > [ ] +1 Release this package as Apache Nutch 1.19.
> > > > [ ] -1 Do not release this package because…
> > > >
> > > > Cheers,
> > > > Sebastian
> > > > (On behalf of the Nutch PMC)
> > > >
> > > > P.S.
> > > > Here is my +1.
> > > > - tested most of Nutch tools and run a test crawl on a single-node cluster
> > > >   running Hadoop 3.3.4, see
> > > >   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
> > > >

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
so far... it doesn't select anything when creating segments:
0 records selected for fetching, exiting

On Wed, Aug 24, 2022 at 3:02 PM BlackIce <bl...@gmail.com> wrote:
>
> I have been able to compile under OpenJDK 11
> Have not done anything further so far
> I'm gonna try to get to it this evening
>
> Greetz
> Ralf
>
> On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
> <ma...@openindex.io> wrote:
> >
> > Hi,
> >
> > Everything seems fine, the crawler seems fine when trying the binary
> > distribution. The source won't work because this computer still cannot
> > compile it. Clearing the local Ivy cache did not do much. This is the known
> > compiler error with the elastic-indexer plugin:
> > compile:
> >     [echo] Compiling plugin: indexer-elastic
> >    [javac] Compiling 3 source files to
> > /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
> >    [javac]
> > /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> > error: package org.apache.http.impl.nio.client does not exist
> >    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
> >    [javac]                                       ^
> >    [javac] 1 error
> >
> >
> > The binary distribution works fine though. I do see a lot of new messages
> > when fetching:
> > 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> > Map Task Executor #0] Found 0 extensions at
> > point:'org.apache.nutch.net.URLExemptionFilter'
> >
> > This is also new at start of each task:
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >
> > SLF4J: Found binding in
> > [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type
> > [org.apache.logging.slf4j.Log4jLoggerFactory]
> >
> > And this one at the end of fetcher:
> > log4j:WARN No appenders could be found for logger
> > (org.apache.commons.httpclient.params.DefaultHttpParams).
> > log4j:WARN Please initialize the log4j system properly.
> > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> > more info.
> >
> > I am worried about the indexer-elastic plugin, maybe others have that
> > problem too? Otherwise everything seems fine.
> >
> > Markus
> >
> > On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:
> >
> > > Hi Folks,
> > >
> > > A first candidate for the Nutch 1.19 release is available at:
> > >
> > >    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> > >
> > > The release candidate is a zip and tar.gz archive of the binary and
> > > sources in:
> > >    https://github.com/apache/nutch/tree/release-1.19
> > >
> > > In addition, a staged maven repository is available here:
> > >    https://repository.apache.org/content/repositories/orgapachenutch-1020
> > >
> > > We addressed 87 issues:
> > >    https://s.apache.org/lf6li
> > >
> > >
> > > Please vote on releasing this package as Apache Nutch 1.19.
> > > The vote is open for the next 72 hours and passes if a majority
> > > of at least three +1 Nutch PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Nutch 1.19.
> > > [ ] -1 Do not release this package because…
> > >
> > > Cheers,
> > > Sebastian
> > > (On behalf of the Nutch PMC)
> > >
> > > P.S.
> > > Here is my +1.
> > > - tested most of Nutch tools and run a test crawl on a single-node cluster
> > >   running Hadoop 3.3.4, see
> > >   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
> > >

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by BlackIce <bl...@gmail.com>.
I have been able to compile under OpenJDK 11
Have not done anything further so far
I'm gonna try to get to it this evening

Greetz
Ralf

On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
<ma...@openindex.io> wrote:
>
> Hi,
>
> Everything seems fine, the crawler seems fine when trying the binary
> distribution. The source won't work because this computer still cannot
> compile it. Clearing the local Ivy cache did not do much. This is the known
> compiler error with the elastic-indexer plugin:
> compile:
>     [echo] Compiling plugin: indexer-elastic
>    [javac] Compiling 3 source files to
> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>    [javac]
> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> error: package org.apache.http.impl.nio.client does not exist
>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>    [javac]                                       ^
>    [javac] 1 error
>
>
> The binary distribution works fine though. I do see a lot of new messages
> when fetching:
> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> Map Task Executor #0] Found 0 extensions at
> point:'org.apache.nutch.net.URLExemptionFilter'
>
> This is also new at start of each task:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>
> And this one at the end of fetcher:
> log4j:WARN No appenders could be found for logger
> (org.apache.commons.httpclient.params.DefaultHttpParams).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
>
> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.
>
> Markus
>
> On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:
>
> > Hi Folks,
> >
> > A first candidate for the Nutch 1.19 release is available at:
> >
> >    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> >
> > The release candidate is a zip and tar.gz archive of the binary and
> > sources in:
> >    https://github.com/apache/nutch/tree/release-1.19
> >
> > In addition, a staged maven repository is available here:
> >    https://repository.apache.org/content/repositories/orgapachenutch-1020
> >
> > We addressed 87 issues:
> >    https://s.apache.org/lf6li
> >
> >
> > Please vote on releasing this package as Apache Nutch 1.19.
> > The vote is open for the next 72 hours and passes if a majority
> > of at least three +1 Nutch PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Nutch 1.19.
> > [ ] -1 Do not release this package because…
> >
> > Cheers,
> > Sebastian
> > (On behalf of the Nutch PMC)
> >
> > P.S.
> > Here is my +1.
> > - tested most of Nutch tools and run a test crawl on a single-node cluster
> >   running Hadoop 3.3.4, see
> >   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
> >

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Markus,

thanks!  What's your (final) decision?


>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;

During the build, the class should be provided in
  build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar
Could you verify whether this jar is there and whether it contains the class
file? See also:
  https://repo1.maven.org/maven2/org/apache/httpcomponents/httpasyncclient/4.1.4/
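
For anyone who wants to run this check without unpacking the jar by hand, a minimal sketch follows, assuming Python 3 is available. The helper name jar_contains_class is hypothetical; the jar path and class name are the ones mentioned above. It reports False both when the jar is missing and when the jar exists but lacks the class, so the two cases need to be told apart separately.

```python
import zipfile
from pathlib import Path


def jar_contains_class(jar_path, class_name):
    """Return True if the jar exists and has an entry for the given class."""
    # A fully qualified class name maps to a path-like entry inside the jar.
    entry = class_name.replace(".", "/") + ".class"
    path = Path(jar_path)
    if not path.is_file():
        return False
    with zipfile.ZipFile(path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Run from the root of the Nutch source tree after a build attempt.
    jar = "build/plugins/indexer-elastic/httpasyncclient-4.1.4.jar"
    cls = "org.apache.http.impl.nio.client.HttpAsyncClientBuilder"
    print(Path(jar).is_file(), jar_contains_class(jar, cls))
```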

> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.

To fix it, we need to make the error reproducible or at least figure out
what causes it.


Regarding the logging: we switched to log4j 2.x (NUTCH-2915) while Hadoop now
uses reload4j (HADOOP-18088 [1]). The logging configuration should be improved
to avoid the warnings in local mode. In distributed mode, the logging
configuration of the provided Hadoop takes over.
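
To see which jars trigger the "multiple SLF4J bindings" warning in local mode, one could scan the lib/ directory for the binder class named in the warning. This is a sketch under assumptions: Python 3, an SLF4J 1.x style binding (org/slf4j/impl/StaticLoggerBinder.class, as printed in the quoted log), and a hypothetical helper name find_slf4j_bindings.

```python
import zipfile
from pathlib import Path

# Entry name taken from the SLF4J warning text quoted in this thread.
BINDER_ENTRY = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return the names of jars in lib_dir that ship an SLF4J 1.x binding."""
    lib = Path(lib_dir)
    if not lib.is_dir():
        return []
    found = []
    for jar_path in sorted(lib.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if BINDER_ENTRY in jar.namelist():
                    found.append(jar_path.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return found


if __name__ == "__main__":
    # Run from the root of the unpacked binary distribution.
    for name in find_slf4j_bindings("lib"):
        print(name)
```

More than one name in the result means SLF4J has to pick a binding, which is when it prints the warning.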


Best,
Sebastian

[1] https://issues.apache.org/jira/browse/HADOOP-18088


On 8/24/22 13:28, Markus Jelsma wrote:
> Hi,
> 
> Everything seems fine, the crawler seems fine when trying the binary
> distribution. The source won't work because this computer still cannot
> compile it. Clearing the local Ivy cache did not do much. This is the known
> compiler error with the elastic-indexer plugin:
> compile:
>     [echo] Compiling plugin: indexer-elastic
>    [javac] Compiling 3 source files to
> /home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
>    [javac]
> /home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
> error: package org.apache.http.impl.nio.client does not exist
>    [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
>    [javac]                                       ^
>    [javac] 1 error
> 
> 
> The binary distribution works fine though. I do see a lot of new messages
> when fetching:
> 2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
> Map Task Executor #0] Found 0 extensions at
> point:'org.apache.nutch.net.URLExemptionFilter'
> 
> This is also new at start of each task:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
> SLF4J: Found binding in
> [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> 
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> 
> And this one at the end of fetcher:
> log4j:WARN No appenders could be found for logger
> (org.apache.commons.httpclient.params.DefaultHttpParams).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> 
> I am worried about the indexer-elastic plugin, maybe others have that
> problem too? Otherwise everything seems fine.
> 
> Markus
> 
> On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:
> 
>> Hi Folks,
>>
>> A first candidate for the Nutch 1.19 release is available at:
>>
>>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
>>
>> The release candidate is a zip and tar.gz archive of the binary and
>> sources in:
>>    https://github.com/apache/nutch/tree/release-1.19
>>
>> In addition, a staged maven repository is available here:
>>    https://repository.apache.org/content/repositories/orgapachenutch-1020
>>
>> We addressed 87 issues:
>>    https://s.apache.org/lf6li
>>
>>
>> Please vote on releasing this package as Apache Nutch 1.19.
>> The vote is open for the next 72 hours and passes if a majority
>> of at least three +1 Nutch PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Nutch 1.19.
>> [ ] -1 Do not release this package because…
>>
>> Cheers,
>> Sebastian
>> (On behalf of the Nutch PMC)
>>
>> P.S.
>> Here is my +1.
>> - tested most of Nutch tools and run a test crawl on a single-node cluster
>>   running Hadoop 3.3.4, see
>>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
>>
> 

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

Everything seems fine, the crawler seems fine when trying the binary
distribution. The source won't work because this computer still cannot
compile it. Clearing the local Ivy cache did not do much. This is the known
compiler error with the elastic-indexer plugin:
compile:
    [echo] Compiling plugin: indexer-elastic
   [javac] Compiling 3 source files to
/home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
   [javac]
/home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
error: package org.apache.http.impl.nio.client does not exist
   [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
   [javac]                                       ^
   [javac] 1 error


The binary distribution works fine though. I do see a lot of new messages
when fetching:
2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner
Map Task Executor #0] Found 0 extensions at
point:'org.apache.nutch.net.URLExemptionFilter'

This is also new at start of each task:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in
[jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]

And this one at the end of fetcher:
log4j:WARN No appenders could be found for logger
(org.apache.commons.httpclient.params.DefaultHttpParams).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.

I am worried about the indexer-elastic plugin, maybe others have that
problem too? Otherwise everything seems fine.

Markus

On Mon, Aug 22, 2022 at 17:30, Sebastian Nagel <sn...@apache.org> wrote:

> Hi Folks,
>
> A first candidate for the Nutch 1.19 release is available at:
>
>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
>
> The release candidate is a zip and tar.gz archive of the binary and
> sources in:
>    https://github.com/apache/nutch/tree/release-1.19
>
> In addition, a staged maven repository is available here:
>    https://repository.apache.org/content/repositories/orgapachenutch-1020
>
> We addressed 87 issues:
>    https://s.apache.org/lf6li
>
>
> Please vote on releasing this package as Apache Nutch 1.19.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 Nutch PMC votes are cast.
>
> [ ] +1 Release this package as Apache Nutch 1.19.
> [ ] -1 Do not release this package because…
>
> Cheers,
> Sebastian
> (On behalf of the Nutch PMC)
>
> P.S.
> Here is my +1.
> - tested most of Nutch tools and run a test crawl on a single-node cluster
>   running Hadoop 3.3.4, see
>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)
>

[RESULT] was [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Sebastian Nagel <sn...@apache.org>.
Hi Folks,

thanks to everyone who was able to review the release candidate!

More than 72 hours have passed; please see below for the vote results.

[4] +1 Release this package as Apache Nutch 1.19
   Markus Jelsma *
   BlackIce *
   Jorge Betancourt *
   Sebastian Nagel *

[0] -1 Do not release this package because ...

* Nutch PMC

The VOTE passes with 4 binding votes from Nutch PMC members.
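
As a small worked check of the tally, the rule from the vote mail ("a majority of at least three +1 Nutch PMC votes") can be written out as below. This is one reading of that sentence, not an official definition: it assumes only binding PMC votes are counted, that at least three +1 votes are required, and that +1 votes must outnumber -1 votes. The function name vote_passes is made up for illustration.

```python
def vote_passes(plus_one, minus_one, minimum_plus_one=3):
    """Apply the assumed rule: enough binding +1 votes, and more +1 than -1."""
    return plus_one >= minimum_plus_one and plus_one > minus_one


if __name__ == "__main__":
    # Counts from the result above: four binding +1 votes, no -1 votes.
    print(vote_passes(plus_one=4, minus_one=0))
```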

I'll proceed to publish the release packages and announce the release.

Thanks to everyone who contributed to Nutch and the 1.19 release.

Sebastian


On 8/22/22 17:30, Sebastian Nagel wrote:
> Hi Folks,
> 
> A first candidate for the Nutch 1.19 release is available at:
> 
>    https://dist.apache.org/repos/dist/dev/nutch/1.19/
> 
> The release candidate is a zip and tar.gz archive of the binary and sources in:
>    https://github.com/apache/nutch/tree/release-1.19
> 
> In addition, a staged maven repository is available here:
>    https://repository.apache.org/content/repositories/orgapachenutch-1020
> 
> We addressed 87 issues:
>    https://s.apache.org/lf6li
> 
> 
> Please vote on releasing this package as Apache Nutch 1.19.
> The vote is open for the next 72 hours and passes if a majority
> of at least three +1 Nutch PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Nutch 1.19.
> [ ] -1 Do not release this package because…
> 
> Cheers,
> Sebastian
> (On behalf of the Nutch PMC)
> 
> P.S.
> Here is my +1.
> - tested most of Nutch tools and run a test crawl on a single-node cluster
>   running Hadoop 3.3.4, see
>   https://github.com/sebastian-nagel/nutch-test-single-node-cluster/)

Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

Everything seems fine, the crawler seems fine when trying the binary
distribution. The source won't work because this computer still cannot
compile it. Clearing the local Ivy cache did not do much. This is the known
compiler error with the elastic-indexer plugin:
compile:
    [echo] Compiling plugin: indexer-elastic
   [javac] Compiling 3 source files to
/home/markus/temp/apache-nutch-1.19/build/indexer-elastic/classes
   [javac]
/home/markus/temp/apache-nutch-1.19/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java:39:
error: package org.apache.http.impl.nio.client does not exist
   [javac] import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
   [javac]                                       ^
   [javac] 1 error


The binary distribution works fine though. I do see a lot of new messages
when fetching:
2022-08-24 13:21:15,867 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'

This is also new at the start of each task:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/markus/temp/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
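
For reference, the warning appears because two jars in lib/ each ship the class that SLF4J 1.7.x uses to select a binding. Below is a minimal sketch that lists every jar in a directory providing that class. The function name `find_slf4j_bindings` is hypothetical, and the `lib` directory name is an assumption about where the binary distribution was unpacked:

```python
import zipfile
from pathlib import Path

# SLF4J 1.7.x looks up this class to pick a binding; more than one jar
# providing it triggers the "multiple SLF4J bindings" warning.
BINDER_ENTRY = "org/slf4j/impl/StaticLoggerBinder.class"


def find_slf4j_bindings(lib_dir):
    """Return sorted names of jars in lib_dir that contain an SLF4J 1.7 binding."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        return []
    found = []
    for jar in sorted(directory.glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if BINDER_ENTRY in zf.namelist():
                found.append(jar.name)
    return found


if __name__ == "__main__":
    # Assumption: run from the root of the unpacked binary distribution.
    for name in find_slf4j_bindings("lib"):
        print(name)
```

With the class path from the log above, such a scan would be expected to report both log4j-slf4j-impl-2.18.0.jar and slf4j-reload4j-1.7.36.jar.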

And this one at the end of the fetcher:
log4j:WARN No appenders could be found for logger (org.apache.commons.httpclient.params.DefaultHttpParams).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I am worried about the indexer-elastic plugin; maybe others have that
problem too? Otherwise everything seems fine.

Markus


Re: [VOTE] Release Apache Nutch 1.19 RC#1

Posted by Jorge Betancourt <be...@gmail.com>.
Hi all,

I compiled from the sources (JDK 11) and ran a small crawl and indexing
(to Solr); both passed with flying colors.

That's a +1 from me. Great work Sebastian!
