Posted to user@struts.apache.org by Dinh Nguyen <cm...@yahoo.com> on 2003/10/28 05:30:27 UTC

How to find all of the pages (links) in a directory on a website.

Hi all,

I know that this is probably not the right place to ask this question,
but I am sure there are experts on this forum who know the
answer.  Here is the question:

I am about to write a program to list (and count) all of the pages
(URLs) in a directory on a website.
Say, given a URL such as:
http://www.xyz.com/myfolder
or http://www.xyz.com
there are lots of directories, and each directory has a number of pages.
So in this case, say there are five pages located in the myfolder
directory.  I'd like to find those five URLs or links, for
example:
http://www.xyz.com/myfolder/page1.html
http://www.xyz.com/myfolder/design.doc
http://www.xyz.com/myfolder/faqs.html
http://www.xyz.com/myfolder/links.htm
http://www.xyz.com/myfolder/mynews.html

How would I do this? Can you please guide me step by step (or give me
some ideas) to design this program?
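
One common design, sketched below under stated assumptions: fetch a page, pull out every href with a regular expression, resolve it against the base URL, and keep only links that stay under the directory. Note that a web server will not reveal unlinked files unless directory listing is enabled. The class and method names here are illustrative, not from any particular library:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Matches href="..." or href='...' (fragment-only links excluded).
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)[\"']", Pattern.CASE_INSENSITIVE);

    // Extracts href targets from raw HTML, resolves them against baseUrl,
    // and keeps only those under the baseUrl directory prefix.
    public static List<String> linksUnder(String html, String baseUrl) {
        List<String> result = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String resolved = base.resolve(m.group(1)).toString();
            if (resolved.startsWith(baseUrl) && !result.contains(resolved)) {
                result.add(resolved);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"page1.html\">one</a> "
                    + "<a href=\"faqs.html\">faqs</a> "
                    + "<a href=\"http://other.example.com/x.html\">offsite</a>";
        // The offsite link resolves outside the prefix and is filtered out.
        System.out.println(linksUnder(html, "http://www.xyz.com/myfolder/"));
    }
}
```

In a real program the HTML string would come from reading a java.net.URL connection; the extraction itself is kept as a pure function so it can be tested without the network.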

Thanks for your help.
Dinh Nguyen


---------------------------------------------------------------------
To unsubscribe, e-mail: struts-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: struts-user-help@jakarta.apache.org


Re: How to find all of the pages (links) in a directory on a website.

Posted by Dinh Nguyen <cm...@yahoo.com>.
Hi Nathan,

In this case, I am interested in Java (the Java language); if that's not possible, then I'll look into other languages.
 
Thanks,
Dinh Nguyen

Dinh Nguyen <cm...@yahoo.com> wrote:
Hi Nathan,

Can you please explain this in more detail for me? I don't get it at all.

Thanks,
Dinh Nguyen

Nathan Maves wrote:
linklint can do what you are asking for. Run remotely, it can only see the
pages that are linked together; to find orphaned files, you can run
linklint locally.

Nathan
On Oct 27, 2003, at 9:30 PM, Dinh Nguyen wrote:

> Hi all,
>
> I know that this is probably not the right place to ask this question,
> but I am sure there are experts on this forum who know the
> answer. Here is the question:
>
> I am about to write a program to list (and count) all of the pages
> (URLs) in a directory on a website.
> Say, given a URL such as:
> http://www.xyz.com/myfolder
> or http://www.xyz.com
> there are lots of directories, and each directory has a number of pages.
> So in this case, say there are five pages located in the myfolder
> directory. I'd like to find those five URLs or links, for
> example:
> http://www.xyz.com/myfolder/page1.html
> http://www.xyz.com/myfolder/design.doc
> http://www.xyz.com/myfolder/faqs.html
> http://www.xyz.com/myfolder/links.htm
> http://www.xyz.com/myfolder/mynews.html
>
> How would I do this? Can you please guide me step by step (or give me
> some ideas) to design this program?
>
> Thanks for your help.
> Dinh Nguyen
>
>






Re: How to find all of the pages (links) in a directory on a website.

Posted by Dinh Nguyen <cm...@yahoo.com>.
Hi Nathan,

Can you please explain this in more detail for me? I don't get it at all.

Thanks,
Dinh Nguyen

Nathan Maves <Na...@Sun.COM> wrote:
linklint can do what you are asking for. Run remotely, it can only see the
pages that are linked together; to find orphaned files, you can run
linklint locally.

Nathan
On Oct 27, 2003, at 9:30 PM, Dinh Nguyen wrote:

> Hi all,
>
> I know that this is probably not the right place to ask this question,
> but I am sure there are experts on this forum who know the
> answer. Here is the question:
>
> I am about to write a program to list (and count) all of the pages
> (URLs) in a directory on a website.
> Say, given a URL such as:
> http://www.xyz.com/myfolder
> or http://www.xyz.com
> there are lots of directories, and each directory has a number of pages.
> So in this case, say there are five pages located in the myfolder
> directory. I'd like to find those five URLs or links, for
> example:
> http://www.xyz.com/myfolder/page1.html
> http://www.xyz.com/myfolder/design.doc
> http://www.xyz.com/myfolder/faqs.html
> http://www.xyz.com/myfolder/links.htm
> http://www.xyz.com/myfolder/mynews.html
>
> How would I do this? Can you please guide me step by step (or give me
> some ideas) to design this program?
>
> Thanks for your help.
> Dinh Nguyen
>
>






Re: How to find all of the pages (links) in a directory on a website.

Posted by Nathan Maves <Na...@Sun.COM>.
linklint can do what you are asking for. Run remotely, it can only see the
pages that are linked together; to find orphaned files, you can run
linklint locally.

Nathan
On Oct 27, 2003, at 9:30 PM, Dinh Nguyen wrote:

> Hi all,
>
> I know that this is probably not the right place to ask this question,
> but I am sure there are experts on this forum who know the
> answer. Here is the question:
>
> I am about to write a program to list (and count) all of the pages
> (URLs) in a directory on a website.
> Say, given a URL such as:
> http://www.xyz.com/myfolder
> or http://www.xyz.com
> there are lots of directories, and each directory has a number of pages.
> So in this case, say there are five pages located in the myfolder
> directory. I'd like to find those five URLs or links, for
> example:
> http://www.xyz.com/myfolder/page1.html
> http://www.xyz.com/myfolder/design.doc
> http://www.xyz.com/myfolder/faqs.html
> http://www.xyz.com/myfolder/links.htm
> http://www.xyz.com/myfolder/mynews.html
>
> How would I do this? Can you please guide me step by step (or give me
> some ideas) to design this program?
>
> Thanks for your help.
> Dinh Nguyen
>
>




Re: How to find all of the pages (links) in a directory on a website.

Posted by Ted Husted <hu...@apache.org>.
At this point, we're definitely moving into Lucene territory.

http://jakarta.apache.org/lucene/docs/index.html

-T.

Dinh Nguyen wrote:

> Hi Rajat/all,
> 
> The thing is, I need to detect which files, and how many, contain the word "Privacy" in a directory on a website.
> Assume that http://www.xyz.com is a live site, and myfolder is one of the directories on the site (http://www.xyz.com/myfolder). This folder contains some privacy-related files. I don't know how many files are there, or which files contain the word "Privacy".
> I think these are the steps:
> 1) Implement a small Java program/class that detects whether a certain word appears in a file.
> 2) Implement a small Java program/class that lists (and counts) all of the URLs/links (and/or sub-folders) that a given URL has.
> I have step 1 done already, and I am stuck at step 2.
>
> I know that Sun has an example like this; I downloaded it and tried to compile it, but it gave me errors at the line ==> List    listMatches;
> http://developer.java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/#demo
>
> I wasn't able to compile it. I am not sure if anyone has run into this problem before.
> 
> Thanks,
> Dinh Nguyen
> 
> Rajat Pandit <ra...@centergroupinc.com> wrote:
> The solution also depends on whether you want to "explore" a
> remote file or a local file. For local files, I believe using the DOM is
> the best option to solve the problem. (See JavaScript for more info.)
> -----Original Message-----
> From: Dinh Nguyen [mailto:cmpe275@yahoo.com] 
> Sent: Monday, October 27, 2003 8:30 PM
> To: struts-user@jakarta.apache.org
> Subject: How to find all of the pages (links) in a directory on a website.
> 
> 
> Hi all,
> 
> I know that this is probably not a right place to ask the question, 
> but I am sure that there are some experts on this forum knows the 
> answer. Here is the question:
> 
> I am about the write a program to list (and count) all of the pages 
> (URL) in a directory from a website.
> Let say, given a URL below:
> http://www.xyz.com/myfolder
> or http://www.xyz.com 
> there are lots of directories, each directory has number of pages. 
> So in this case, let say there are five pages located in myfolder 
> directory. I'd like to find out those five URLs or links, for 
> example:
> http://www.xyz.com/myfolder/page1.html
> http://www.xyz.com/myfolder/design.doc
> http://www.xyz.com/myfolder/faqs.html
> http://www.xyz.com/myfolder/links.htm
> http://www.xyz.com/myfolder/mynews.html
> 
> How would I do this? Can you please guide me step by step (or some 
> ideas) to design this program?
> 
> Thanks for your help.
> Dinh Nguyen

-- 
Ted Husted,
   JUnit in Action  - <http://www.manning.com/massol/>,
   Struts in Action - <http://husted.com/struts/book.html>,
   JSP Site Design  - <http://www.amazon.com/exec/obidos/ISBN=1861005512>.





RE: How to find all of the pages (links) in a directory on a website.

Posted by Dinh Nguyen <cm...@yahoo.com>.
Hi Rajat/all,

The thing is, I need to detect which files, and how many, contain the word "Privacy" in a directory on a website.
Assume that http://www.xyz.com is a live site, and myfolder is one of the directories on the site (http://www.xyz.com/myfolder). This folder contains some privacy-related files. I don't know how many files are there, or which files contain the word "Privacy".
I think these are the steps:
1) Implement a small Java program/class that detects whether a certain word appears in a file.
2) Implement a small Java program/class that lists (and counts) all of the URLs/links (and/or sub-folders) that a given URL has.
I have step 1 done already, and I am stuck at step 2.
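
Step 2 is essentially a small breadth-first crawler: start from the directory URL, fetch each page, extract its links, and only enqueue links that stay under the directory prefix. A minimal sketch, assuming the pages are reachable by links (a server will not list unlinked files); an in-memory map of URL-to-HTML stands in for the HTTP fetch here, and all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DirectoryCrawler {
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)[\"']", Pattern.CASE_INSENSITIVE);

    // Breadth-first crawl restricted to URLs under 'prefix'.
    // 'site' stands in for HTTP fetches: url -> page HTML.
    public static Set<String> crawl(String start, String prefix, Map<String, String> site) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            String html = site.get(url);     // real code: fetch over HTTP
            if (html == null) continue;      // non-HTML or unreachable page
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                // Resolve relative links against the page they appear on.
                String link = java.net.URI.create(url).resolve(m.group(1)).toString();
                if (link.startsWith(prefix) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> site = Map.of(
            "http://www.xyz.com/myfolder/page1.html",
            "<a href=\"faqs.html\">faqs</a> <a href=\"links.htm\">links</a>",
            "http://www.xyz.com/myfolder/faqs.html",
            "<a href=\"mynews.html\">news</a>");
        System.out.println(crawl("http://www.xyz.com/myfolder/page1.html",
                                 "http://www.xyz.com/myfolder/", site));
    }
}
```

Once the crawl returns the set of URLs, the word-detection class from step 1 can be run over each page's content to count the files containing "Privacy".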

I know that Sun has an example like this; I downloaded it and tried to compile it, but it gave me errors at the line ==> List    listMatches;
http://developer.java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/#demo

I wasn't able to compile it.  I am not sure if anyone has run into this problem before.
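
One likely cause of an error at `List listMatches;`: the Sun WebCrawler demo is an applet that imports both java.awt.* and java.util.*, and each of those packages contains a class named List, so the bare name is ambiguous to the compiler. This is an educated guess from the error line, not confirmed against the demo source; the usual fix is to qualify the name you mean:

```java
import java.awt.*;   // defines java.awt.List (a GUI widget)
import java.util.*;  // defines java.util.List (the collection interface)

public class AmbiguousListFix {
    // With both wildcard imports in scope, the bare name "List" would be
    // ambiguous and would not compile; fully qualifying it resolves this.
    java.util.List<String> listMatches = new ArrayList<>();

    public static void main(String[] args) {
        AmbiguousListFix fix = new AmbiguousListFix();
        fix.listMatches.add("match");
        System.out.println(fix.listMatches.size());
    }
}
```

Alternatively, replace the wildcard with an explicit `import java.util.List;` if the code never uses the AWT List widget.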

Thanks,
Dinh Nguyen

Rajat Pandit <ra...@centergroupinc.com> wrote:
The solution also depends on whether you want to "explore" a
remote file or a local file. For local files, I believe using the DOM is
the best option to solve the problem. (See JavaScript for more info.)
-----Original Message-----
From: Dinh Nguyen [mailto:cmpe275@yahoo.com] 
Sent: Monday, October 27, 2003 8:30 PM
To: struts-user@jakarta.apache.org
Subject: How to find all of the pages (links) in a directory on a website.


Hi all,

I know that this is probably not the right place to ask this question,
but I am sure there are experts on this forum who know the
answer. Here is the question:

I am about to write a program to list (and count) all of the pages
(URLs) in a directory on a website.
Say, given a URL such as:
http://www.xyz.com/myfolder
or http://www.xyz.com
there are lots of directories, and each directory has a number of pages.
So in this case, say there are five pages located in the myfolder
directory. I'd like to find those five URLs or links, for
example:
http://www.xyz.com/myfolder/page1.html
http://www.xyz.com/myfolder/design.doc
http://www.xyz.com/myfolder/faqs.html
http://www.xyz.com/myfolder/links.htm
http://www.xyz.com/myfolder/mynews.html

How would I do this? Can you please guide me step by step (or give me
some ideas) to design this program?

Thanks for your help.
Dinh Nguyen







RE: How to find all of the pages (links) in a directory on a website.

Posted by Rajat Pandit <ra...@centergroupinc.com>.
The solution also depends on whether you want to "explore" a
remote file or a local file. For local files, I believe using the DOM is
the best option to solve the problem. (See JavaScript for more info.)
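
For local files that are well-formed XHTML, the DOM route in Java might look like the sketch below. An assumption to flag: real-world HTML is often not well-formed XML, in which case a tolerant HTML parser would be needed instead; the APIs used here are the standard javax.xml.parsers and org.w3c.dom ones:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomLinkLister {
    // Parses well-formed (X)HTML and returns the href of every <a> element.
    public static List<String> hrefs(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                result.add(((Element) anchors.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {  // parser configuration or parse errors
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefs(
            "<html><body><a href=\"faqs.html\">FAQs</a>"
            + "<a href=\"mynews.html\">News</a></body></html>"));
    }
}
```

For a whole directory of local files, the same method would be called on the contents of each file found via java.io.File.listFiles().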
-----Original Message-----
From: Dinh Nguyen [mailto:cmpe275@yahoo.com] 
Sent: Monday, October 27, 2003 8:30 PM
To: struts-user@jakarta.apache.org
Subject: How to find all of the pages (links) in a directory on a website.


Hi all,

I know that this is probably not the right place to ask this question,
but I am sure there are experts on this forum who know the
answer.  Here is the question:

I am about to write a program to list (and count) all of the pages
(URLs) in a directory on a website.
Say, given a URL such as:
http://www.xyz.com/myfolder
or http://www.xyz.com
there are lots of directories, and each directory has a number of pages.
So in this case, say there are five pages located in the myfolder
directory.  I'd like to find those five URLs or links, for
example:
http://www.xyz.com/myfolder/page1.html
http://www.xyz.com/myfolder/design.doc
http://www.xyz.com/myfolder/faqs.html
http://www.xyz.com/myfolder/links.htm
http://www.xyz.com/myfolder/mynews.html

How would I do this? Can you please guide me step by step (or give me
some ideas) to design this program?

Thanks for your help.
Dinh Nguyen



