Posted to droids-dev@incubator.apache.org by ray lukas <ra...@verizon.net> on 2009/12/28 14:11:44 UTC

can droid crawl this site

Well, this link crashes Nutch (a redirection problem, I would guess, but I have
not proved it). I really just need to get my hands on the HTML, and I will feed
it into my parsing and indexing systems. For this I just need a crawling
mechanism that will give me the HTML for these types of links. Nutch is
wonderful, but for this it is overkill and is unable to crawl these links, so I
am looking at Droids as a solution.

I am not archiving anything; I am using the HTML directly in my Java
application. Can Droids crawl this site and return the correct HTML? Could
someone try it for me on their Droids installation and let me know?

Thanks guys..

http://electricservices.smrated.com/servlet/splocal?m=verizonem&xmid=5060691&xmcid=-12026&entry_point_id=3079198

On 07/12/2009, at 16:38, Lukas, Ray wrote:

> I was having a problem with Nutch and would like to see if Droids can
> help. Could someone just try crawling this web page and tell me if this
> works on Droids. I need to be able to crawl these web pages and can not
> seem to do so with Nutch. Would you plug this into your installation and
> see if Droids can successfully crawl this.

What is your problem with Nutch with this site? What are you trying to
archive?

salu2

> http://electricservices.smrated.com/servlet/splocal?m=verizonem&xmid=5060691&xmcid=-12026&entry_point_id=3079198
> if so I will start switching over the project to Droid, if not then I
> have to keep looking for something that will work.. any advice would be
> really helpful.
> I don't know if this is the correct list.. sorry
> thanks so much Ray


Re: can droid crawl this site

Posted by Richard Frovarp <rf...@apache.org>.
ray lukas wrote:
> Well, this link crashes Nutch (a redirection problem, I would guess, but I have
> not proved it). I really just need to get my hands on the HTML, and I will feed
> it into my parsing and indexing systems. For this I just need a crawling
> mechanism that will give me the HTML for these types of links. Nutch is
> wonderful, but for this it is overkill and is unable to crawl these links, so I
> am looking at Droids as a solution.
>
> I am not archiving anything; I am using the HTML directly in my Java
> application. Can Droids crawl this site and return the correct HTML? Could
> someone try it for me on their Droids installation and let me know?
>
> Thanks guys..
>
> http://electricservices.smrated.com/servlet/splocal?m=verizonem&xmid=5060691&xmcid=-12026&entry_point_id=3079198

I'm just in the evaluation phase of using Droids myself, but from what 
I've found it is quite flexible.

I would try running the SimpleRuntime code with the URL you listed and 
see if it gives you back your expected results. If that doesn't work, it 
may require some further work.

Richard

Re: can droid crawl this site

Posted by Ken Krugler <kk...@transpac.com>.
Hi Ray,

I don't see why Nutch would crash on this link. It just does two
redirects and then returns what looks like standard HTML.

Given that, Droids, Nutch, etc. should all work fine.

-- Ken
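
One way to check the redirect behaviour described above, independently of any crawler, is to follow the redirects by hand and print each hop. The following is only a minimal sketch using the JDK's own java.net.http client (Java 11 or later); it does not use the Droids or Nutch APIs. The class name RedirectProbe, the helper names, the hop limit of 5, and the timeouts are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Droids or Nutch.
class RedirectProbe {
    // Assumed limit; the thread only mentions two redirects for this URL.
    static final int MAX_REDIRECTS = 5;

    // True for the HTTP status codes that carry a Location to follow.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // A Location header may be relative, so resolve it against the current URI.
    static URI resolveLocation(URI current, String location) {
        return current.resolve(location);
    }

    // Follows redirects manually, logging each hop to stderr, and returns
    // the body of the first non-redirect response.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // we follow them ourselves
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        URI current = URI.create(url);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpRequest request = HttpRequest.newBuilder(current)
                    .timeout(Duration.ofSeconds(20))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            int status = response.statusCode();
            if (!isRedirect(status)) {
                System.err.println(status + " " + current);
                return response.body();
            }
            String location = response.headers().firstValue("Location")
                    .orElseThrow(() -> new IOException("Redirect without Location header"));
            URI next = resolveLocation(current, location);
            System.err.println(status + " " + current + " -> " + next);
            current = next;
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java RedirectProbe <url>");
            return;
        }
        System.out.println(fetchHtml(args[0]));
    }
}
```

Run it with the URL from the original message as the single argument. If the hops and the final HTML look normal here, the problem is more likely in the crawler's configuration than in the site itself.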

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g