Posted to user@nutch.apache.org by "Ramanathapuram, Rajesh" <Ra...@turner.com> on 2011/11/02 14:26:58 UTC

RE: Nutch not crawling URLs with spanish accented characters ( ñ)

Hi Lewis, 

I just wanted to let you know that I greatly value the open source user community. I also really appreciate the amount of time and effort the community developers spend helping to resolve other people's issues. 

My apologies for this email, which does not contribute to the essence of this user group, but I wanted to clear things up. 

I believe I did not say things, in any form or shape, that would hurt the awesome effort of the community developers. 
If your response was referring to the text below in the email chain: 

		" ... you are right. Spanish characters decode is not yet included.

		You must understand that nutch team has extensive patch review process and this patch is being reviewed for about 2 months already. If I add support for decoding Spanish characters now, patch review process will be restarted and %20 encoding and other stuff will be committed in 2012, which is too late because end of world is scheduled in 2012 and I would really want to crawl site with spaces in URL before world is over. "

I think the email formatting may have mixed up who wrote what, and I wanted to point out that the above comments were not mine. 

Please feel free to reach out to me and let me know your thoughts.

Thanks & regards
Rajesh Ramana

-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Monday, October 31, 2011 8:57 AM
To: user@nutch.apache.org
Subject: Re: Nutch not crawling URLs with spanish accented characters ( ñ)

Hi Rajesh,

It couldn't be further from the truth that a fixed timeline is allocated to any particular patch review. Patches are reviewed based on community-driven priorities, new releases, patch quality, committer opinion and overall value to the Nutch project codebase. Without taking this too far, I would also say that the ongoing conversation is not in keeping with the friendly ethos shared across both the Nutch user and developer communities when such false opinions are openly expressed in the hope that this somehow benefits the environment we enjoy within the Nutch community... I would say that to compare the patch review process with taking close to a year, and then to go on to mention the end of the world, is slightly delusional.

As a community we greatly appreciate good quality contributions from all walks of life. This enables us to leverage a wide and diverse range of developer expertise; unfortunately, it also means that from time to time there are cases that form an exception, this being one of them. I find it extremely difficult to comprehend how patches can be expected to be integrated into our main development trunk when they have been openly under development, through numerous revisions and several amendments, with little or no comprehensive supporting documentation.

As I said in the past, I think it would be great to get this particular functionality integrated into the Nutch codebase, but only when the patch documentation has been substantiated and the active committers on the project have been convinced by the patch quality. This is by no means unique to the Apache Nutch project... please take a look at numerous other projects under the ASF and you will see that this open review and quality assurance process is pretty much consistent (obviously there are exceptions).

Finally, I hope this fully addresses the backhanded comments regarding the review process and that, subject to the above conditions, the patch can make its way into the trunk codebase in the near future.

Thanks for now

Lewis

On Sat, Oct 29, 2011 at 12:47 PM, Ramanathapuram, Rajesh < Rajesh.Ramanathapuram@turner.com> wrote:

> Hi Radim,
>
> I am fairly new to Nutch. Thank you for the explanation about the 
> patch review and commit process.
>
> Good luck on your patch being committed to the core.
>
> Thanks,
> Rajesh Ramana
>
>
>
>
> On Oct 29, 2011, at 2:46 AM, "Radim Kolar" <hs...@sendmail.cz> wrote:
>
> > Dne 29.10.2011 5:20, Ramanathapuram, Rajesh napsal(a):
> >> Hi Radim,
> >>
> >> I looked at the patch details and the code itself.  On first look,
> >> this patch looks like it is for handling spaces (%20) and not for
> >> spanish accented chars (ñ).
> >>
> >> Please let me know if I am overlooking something.
> >>
> >> I will take a closer look as soon as I get time. Thanks for your help.
> > you are right. Spanish characters decode is not yet included.
> >
> > You must understand that nutch team has extensive patch review process
> > and this patch is being reviewed for about 2 months already. If I add
> > support for decoding Spanish characters now, patch review process will
> > be restarted and %20 encoding and other stuff will be committed in
> > 2012, which is too late because end of world is scheduled in 2012 and
> > I would really want to crawl site with spaces in URL before world is over.
>



--
*Lewis*

Re: Nutch not crawling URLs with spanish accented characters ( ñ)

Posted by Radim Kolar <hs...@sendmail.cz>.
Here is the corresponding jar file; replace it in your Nutch installation.

https://rapidshare.com/files/1383583017/urlnormalizer-basic.jar

RE: Nutch not crawling URLs with spanish accented characters ( ñ)

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Hi Radim, 

Thanks, I really appreciate the time and effort you put into adding Unicode handler code to your patch.

Thanks & regards,
Rajesh Ramana


-----Original Message-----
From: Radim Kolar [mailto:hsn@sendmail.cz] 
Sent: Wednesday, November 02, 2011 9:16 PM
To: user@nutch.apache.org
Subject: Re: Nutch not crawling URLs with spanish accented characters ( ñ)

Dear Rajesh,
   Today is your lucky day. Your long-awaited patch was submitted to
https://issues.apache.org/jira/browse/NUTCH-1098 . If you have a church near you then go there, light a candle and with your entire family pray for success. I will also pray on Saturday. Don't forget to also ask the priest to bless Lewis, who was a really very helpful person in explaining to the entire community of Nutch users the complicated process of reviewing contributions. Wish him good health, because without his guidance the entire patch review process would be stuck in a dead end.

Radim Kolar

Re: Nutch not crawling URLs with spanish accented characters ( ñ)

Posted by Radim Kolar <hs...@sendmail.cz>.
Dear Rajesh,
   Today is your lucky day. Your long-awaited patch was submitted to 
https://issues.apache.org/jira/browse/NUTCH-1098 . If you have a church 
near you then go there, light a candle and with your entire family pray 
for success. I will also pray on Saturday. Don't forget to also ask the 
priest to bless Lewis, who was a really very helpful person in explaining 
to the entire community of Nutch users the complicated process of 
reviewing contributions. Wish him good health, because without his 
guidance the entire patch review process would be stuck in a dead end.

Radim Kolar