Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/01/16 14:58:46 UTC
invalid uri with "three dots"
Hello all,
I'm getting an "invalid uri" error with some links that have three dots, i.e.
"...". They work perfectly well in browsers (IE and Chrome) but,
apparently, not with Nutch.
Is this a known issue? Any idea on how to handle it?
Remi
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
The error is reported as a "java.lang.IllegalArgumentException".
On Mon, Jan 16, 2012 at 3:58 PM, remi tassing <ta...@gmail.com> wrote:
> Hello all,
>
> I'm getting "invalid uri" error with some link that have three dots, i.e.
> "...". They work perfectly well in browsers (IE and Chrome) but,
> apparently, not with Nutch.
>
> Is this a known issue? Any idea on how to handle it?
>
> Remi
>
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
Problem solved!
I replaced all whitespace with "%20" in the URL before getting the
"content" in HttpResponse.java (protocol-httpclient plugin).
Dirty solution? Yes, but it works for me now.
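A minimal sketch of the workaround described above, assuming it amounts to percent-encoding literal spaces before the URL string reaches the HTTP client. The class and method names (UrlWhitespaceSketch, escapeSpaces, parses) and the example URL are hypothetical; this is not the actual change made to HttpResponse.java, and it uses java.net.URI only as a stand-in validity check.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlWhitespaceSketch {

    // Hypothetical helper (not part of Nutch): percent-encode literal
    // space characters and leave everything else untouched. Tabs and
    // other whitespace are not handled here.
    static String escapeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    // Returns true if java.net.URI can parse the string as a URI.
    static boolean parses(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up URL; the real one was never posted in this thread.
        String raw = "https://example.com/report?title=a b&From=stats";
        String escaped = escapeSpaces(raw);
        System.out.println(escaped);
        System.out.println("raw parses: " + parses(raw));
        System.out.println("escaped parses: " + parses(escaped));
    }
}
```

This is deliberately blunt: it leaves any other illegal character in place, so it only helps when spaces are the sole problem.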
Remi
On Thursday, January 26, 2012, remi tassing <ta...@gmail.com> wrote:
> Hey guys,
> any ideas on how to "properly escape non-URI characters?". I'm getting
> invalid URI for urls that contain "three dots", "space"...
> //Remi
> [1] https://issues.apache.org/jira/browse/HTTPCLIENT-858
>
> Ortwin Glück added a comment - 30/Jun/09 14:46
> Properly escape non-URI characters. HttpClient is not a browser and thus
> does not, can not and will never try to fix invalid input.
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
Hey guys,
any ideas on how to "properly escape non-URI characters?". I'm getting
invalid URI for urls that contain "three dots", "space"...
//Remi
[1] https://issues.apache.org/jira/browse/HTTPCLIENT-858
Ortwin Glück added a comment - 30/Jun/09 14:46
Properly escape non-URI characters. HttpClient is not a browser and thus
does not, can not and will never try to fix invalid input.
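One way to follow that advice with the standard library alone, sketched under the assumption that the URL is already available as separate components: the multi-argument java.net.URI constructors quote characters that are not legal in each component. The class name, method name, and example values below are made up for illustration; nothing here is Nutch or HttpClient API.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriEscapeSketch {

    // Hypothetical helper: build an escaped URI string from raw components.
    // The multi-argument URI constructor quotes illegal characters such as
    // spaces. It also always quotes '%', so input that is already
    // percent-encoded would be escaped a second time.
    static String escape(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        // Made-up values; the dots are legal URI characters and pass
        // through unchanged, while the spaces are percent-encoded.
        System.out.println(
            escape("https", "example.com", "/a path", "q=three...dots here&From=stats"));
    }
}
```

Because '%' is re-quoted, this approach suits raw, unescaped input only; URLs harvested from pages may already be partly encoded and would need different handling.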
On Wed, Jan 18, 2012 at 4:51 PM, remi tassing <ta...@gmail.com> wrote:
> I posted a question on this JIRA:
> https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481
>
>
> It looks like the same problem
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
I posted a question on this JIRA:
https://issues.apache.org/jira/browse/HTTPCLIENT-858?focusedCommentId=13188481#comment-13188481
It looks like the same problem.
On Tue, Jan 17, 2012 at 6:41 PM, Markus Jelsma
<ma...@openindex.io>wrote:
> This may also be an issue with protocol-httpclient.
Re: invalid uri with "three dots"
Posted by Markus Jelsma <ma...@openindex.io>.
This may also be an issue with protocol-httpclient.
> Hi Remi,
>
> This also looks like we need to document and address it.
>
> Can you log a Jira issue and we will try to get on to it. Can you also have
> a look through some of the existing issues in case there is something
> similar, possibly relate them.
>
> Thank you in advance
>
> Lewis
Re: invalid uri with "three dots"
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Remi,
This also looks like something we need to document and address.
Can you log a Jira issue so that we can try to get on to it? Could you also
look through the existing issues in case there is something similar, and
link them if so?
Thank you in advance
Lewis
On Tue, Jan 17, 2012 at 9:38 AM, remi tassing <ta...@gmail.com> wrote:
> Hi,
>
> The problem is really similar to this:
>
> http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html
>
> Unfortunately, I have no clue on what to update in Nutch ...
--
*Lewis*
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
Hi,
The problem is really similar to this:
http://old.nabble.com/java.lang.IllegalArgumentException:-Invalid-uri-td21856688.html
Unfortunately, I have no clue on what to update in Nutch ...
On Mon, Jan 16, 2012 at 4:41 PM, remi tassing <ta...@gmail.com> wrote:
> Hello Markus,
>
> thanks for the help!
>
> Just to clarify a little bit. In my previous message, "uri1" represented a
> normal, ordinary URL, I just didn't want to copy the exact URL.
>
> The weird part is that it all works in the browser...
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
Hello Markus,
thanks for the help!
Just to clarify a little bit: in my previous message, "uri1" represented a
normal, ordinary URL; I just didn't want to copy the exact URL.
The weird part is that it all works in the browser...
On Mon, Jan 16, 2012 at 4:35 PM, Markus Jelsma
<ma...@openindex.io>wrote:
> This? https://uri1...&From=stats
>
> That's not a correct or valid URL if you ask me.
Re: invalid uri with "three dots"
Posted by Markus Jelsma <ma...@openindex.io>.
This? https://uri1...&From=stats
That's not a correct or valid URL if you ask me.
On Monday 16 January 2012 15:12:51 remi tassing wrote:
> Hello ,
>
> this is a snapshot of the log:
>
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
> java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
>     at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
>     at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
>     at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
>     at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)
> fetch of https://uri1...&From=stats failed with:
> java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
--
Markus Jelsma - CTO - Openindex
Re: invalid uri with "three dots"
Posted by remi tassing <ta...@gmail.com>.
Hello ,
this is a snapshot of the log:
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
    at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
    at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
    at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:79)
    at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:224)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)
fetch of https://uri1...&From=stats failed with:
java.lang.IllegalArgumentException: Invalid uri 'https://uri1...&From=stats': Invalid query
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=96
On Mon, Jan 16, 2012 at 4:05 PM, Markus Jelsma
<ma...@openindex.io>wrote:
> copy the stack trace please
Re: invalid uri with "three dots"
Posted by Markus Jelsma <ma...@openindex.io>.
copy the stack trace please
On Monday 16 January 2012 14:58:46 remi tassing wrote:
> Hello all,
>
> I'm getting "invalid uri" error with some link that have three dots, i.e.
> "...". They work perfectly well in browsers (IE and Chrome) but,
> apparently, not with Nutch.
>
> Is this a known issue? Any idea on how to handle it?
>
> Remi
--
Markus Jelsma - CTO - Openindex