Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/01/16 14:58:46 UTC

invalid uri with "three dots"

Hello all,

I'm getting an "invalid uri" error with some links that contain three dots,
i.e. "...". They work perfectly well in browsers (IE and Chrome) but,
apparently, not with Nutch.

Is this a known issue? Any idea on how to handle it?

Remi

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
The error is reported as a "java.lang.IllegalArgumentException".

On Mon, Jan 16, 2012 at 3:58 PM, remi tassing <ta...@gmail.com> wrote:

> Hello all,
>
> I'm getting "invalid uri" error with some link that have three dots, i.e.
> "...". They work perfectly well in browsers (IE and Chrome) but,
> apparently, not with Nutch.
>
> Is this a known issue? Any idea on how to handle it?
>
> Remi
>

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
Problem solved!

I replaced all whitespace with "%20" in the URL before fetching the
"content" in HttpResponse.java (protocol-httpclient plugin).

Dirty solution? Yes, but it works for me now.
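
For anyone who wants to try the same workaround, a minimal sketch of the whitespace replacement could look like the one below. The class and method names (SpaceEscaper, escapeSpaces) are hypothetical and are not taken from the Nutch source; the actual change was made inside HttpResponse.java, and its exact code was not posted to the list.

```java
// Hypothetical helper, not part of Nutch: illustrates the workaround only.
class SpaceEscaper {

    /** Replaces every whitespace character in the URL string with "%20". */
    static String escapeSpaces(String url) {
        // "\\s" matches spaces, tabs and line breaks.
        return url.replaceAll("\\s", "%20");
    }

    public static void main(String[] args) {
        // prints http://example.com/a%20b?q=c%20d
        System.out.println(escapeSpaces("http://example.com/a b?q=c d"));
    }
}
```

This only deals with whitespace. Any other character that is illegal in a URI would presumably still trigger the same exception, which is why it is described as a dirty solution.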

Remi

On Thursday, January 26, 2012, remi tassing <ta...@gmail.com> wrote:
> Hey guys,
> any ideas on how to "properly escape non-URI characters?". I'm getting
> invalid URI for urls that contain "three dots", "space"...
> //Remi
> [1] https://issues.apache.org/jira/browse/HTTPCLIENT-858
>
> Ortwin Glück added a comment - 30/Jun/09 14:46
> Properly escape non-URI characters. HttpClient is not a browser and thus
> does not, can not and will never try to fix invalid input.

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
Hey guys,

any ideas on how to "properly escape non-URI characters"? I'm getting an
"invalid URI" error for URLs that contain "three dots", spaces, ...

//Remi

[1] https://issues.apache.org/jira/browse/HTTPCLIENT-858


Ortwin Glück added a comment - 30/Jun/09 14:46
Properly escape non-URI characters. HttpClient is not a browser and thus
does not, can not and will never try to fix invalid input.
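
One way to "properly escape" a raw link before handing it to an HTTP client is sketched below, using only the JDK. This is an assumption about what such escaping could look like, not the approach HttpClient or Nutch prescribes. The class and method names (UrlEscaper, escape) are hypothetical. The sketch relies on the documented behaviour of the multi-argument java.net.URI constructors, which quote characters that are illegal in each component. Note that dots are legal unreserved characters in a URI, so a literal "..." should not need escaping at all.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of Nutch or HttpClient.
class UrlEscaper {

    /**
     * Splits the raw link into components with the lenient java.net.URL
     * parser, then rebuilds it with the multi-argument URI constructor,
     * which percent-encodes illegal characters such as spaces.
     */
    static String escape(String raw) {
        try {
            // new URL(String) is deprecated in recent JDKs but still available.
            URL u = new URL(raw);
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(),
                              u.getPort(), u.getPath(), u.getQuery(), u.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot escape: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("http://example.com/a b?q=c d"));
    }
}
```

A caveat with this design: these constructors always quote the percent character, so a link that is already escaped (for example one containing "%20") would be escaped a second time. The sketch is therefore only suitable for links known to be raw.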

On Wed, Jan 18, 2012 at 4:51 PM, remi tassing <ta...@gmail.com> wrote:

> I posted a question on this JIRA:
> https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481
>
>
> I looks like the same problem

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
I posted a question on this JIRA:
https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481


It looks like the same problem.

On Tue, Jan 17, 2012 at 6:41 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> this may also be an issue of protocolhttp-client.

Re: invalid uri with "three dots"

Posted by Markus Jelsma <ma...@openindex.io>.
This may also be an issue with protocol-httpclient.

> Hi Remi,
> 
> This also looks like we need to document and address it.
> 
> Can you log a Jira issue and we will try to get on to it. Can you also have
> a look through some of the existing issues in case there is something
> similar, possibly relate them.
> 
> Thank you in advance
> 
> Lewis

Re: invalid uri with "three dots"

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Remi,

This also looks like something we need to document and address.

Can you log a Jira issue? We will try to get on to it. Can you also have
a look through the existing issues in case there is something similar,
and link them if so?

Thank you in advance

Lewis

On Tue, Jan 17, 2012 at 9:38 AM, remi tassing <ta...@gmail.com> wrote:

> Hi,
>
> The problem is really similar to this:
>
> http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html
>
> Unfortunately, I have no clue on what to update in Nutch ...



-- 
*Lewis*

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
Hi,

The problem is really similar to this:
http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html

Unfortunately, I have no clue what to update in Nutch ...

On Mon, Jan 16, 2012 at 4:41 PM, remi tassing <ta...@gmail.com> wrote:

> Hello Markus,
>
> thanks for the help!
>
> Just to clarify a little bit. In my previous message, "uri1" represented a
> normal, ordinary URL, I just didn't want to copy the exact URL.
>
> The weird part is that it all works in the browser...

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
Hello Markus,

thanks for the help!

Just to clarify a little bit: in my previous message, "uri1" stood in for a
normal, ordinary URL; I just didn't want to paste the exact URL.

The weird part is that it all works in the browser...

On Mon, Jan 16, 2012 at 4:35 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> This? https://uri1...&From=stats
>
> That's not a correct or valid URL if you ask me.

Re: invalid uri with "three dots"

Posted by Markus Jelsma <ma...@openindex.io>.
This? https://uri1...&From=stats

That's not a correct or valid URL if you ask me.
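
A quick way to check whether a given link is syntactically valid is sketched below. It uses java.net.URI from the JDK as a stand-in; the exception in the log comes from the URI parser in commons-httpclient, whose rules may differ in detail, so this is an approximation rather than a reproduction of that check. The names UriCheck and isValidUri are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: uses the strict single-argument java.net.URI parser.
class UriCheck {

    /** Returns true if the string parses as a URI, false otherwise. */
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Dots are legal in a URI; an unescaped space is not.
        System.out.println(isValidUri("http://example.com/a...b?From=stats"));
        System.out.println(isValidUri("http://example.com/a b"));
    }
}
```

Browsers typically repair such links silently before sending the request, which would explain why the same link works in IE and Chrome.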

On Monday 16 January 2012 15:12:51 remi tassing wrote:
> Hello ,
> 
> this is a snapshot of the log:
> 
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
>     at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
>     at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
>     at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
>     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)
> fetch of https://uri1...&From=stats failed with:
> java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96

-- 
Markus Jelsma - CTO - Openindex

Re: invalid uri with "three dots"

Posted by remi tassing <ta...@gmail.com>.
Hello,

this is a snapshot of the log:

-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
    at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
    at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)
fetch of https://uri1...&From=stats failed with:
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96

On Mon, Jan 16, 2012 at 4:05 PM, Markus Jelsma
<ma...@openindex.io>wrote:

> copy the stack trace please

Re: invalid uri with "three dots"

Posted by Markus Jelsma <ma...@openindex.io>.
copy the stack trace please

On Monday 16 January 2012 14:58:46 remi tassing wrote:
> Hello all,
> 
> I'm getting "invalid uri" error with some link that have three dots, i.e.
> "...". They work perfectly well in browsers (IE and Chrome) but,
> apparently, not with Nutch.
> 
> Is this a known issue? Any idea on how to handle it?
> 
> Remi

-- 
Markus Jelsma - CTO - Openindex