Posted to modperl@perl.apache.org by Boysenberry Payne <bo...@humaniteque.com> on 2005/08/01 23:12:56 UTC

Trying to get File and Directory info off of external server quickly

Hello All,

	I've got a two-server platform: one is a static server that serves
files and runs the MySQL server, and the other runs mod_perl.  I'm
trying to figure out the fastest way to get info on directories and
files from the static server to the mod_perl server.  Right now I'm
using Net::FTP, which is really slow, especially when there are a lot
of files.  Unfortunately, I need to check the file info quite
frequently.  I was wondering if anyone knew the fastest way to get
this info: LDAP, SSH, etc.?

Thanks,
Boysenberry

This message contains information that is confidential
and proprietary to Humaniteque and / or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.

http://www.habitatlife.com
The World's Best Site Builder


Re: Trying to get File and Directory info off of external server quickly

Posted by Boysenberry Payne <bo...@humaniteque.com>.
I've already got it working using Net::FTP.  The problem is that it
runs slowly over FTP.  Here is an example of what I'm trying to do:

my $h = $ftp->{handle};
foreach my $directory ( @directories ) {
	$h->cwd( $directory ) or die "can't change to directory: $directory $!";
	my $dir_ls = $h->ls;
	foreach my $file_name ( @$dir_ls ) {
		unless ( substr( $file_name, 0, 1 ) eq "." ) {
			my $dir_nfo = $h->dir( $directory . $file_name );
			$_ = $dir_nfo->[ 0 ];
			s/\s+/ /g;	# collapse runs of whitespace
			my @file_nfo = split / /, $_;
			my $file_size = $file_nfo[ 4 ];	# size column of the "ls -l" line
			if ( $file_size != 0 ) {
				# add to database here
			}
		}
	}
}
$h->quit;

I tried using $ftp->size( $directory . $file_name ),
but it seems to only return a size for small files,
at least on my OSX box.

Thanks,
Boysenberry

On Aug 1, 2005, at 5:54 PM, Philip M. Gollucci wrote:

> Boysenberry Payne wrote:
>> I'm not sure if HEAD would work.
>> Basically, I'm trying to read a directory's files.
>> After I confirm a file exists and doesn't have zero
>> size I check that it has the appropriate extension
>> for the directory then I add the directory address,
>> file name and extension to a table in our database.
> We actually do something very similar to this involving pictures being 
> uploaded from a digital camera to eventually be published on a 
> website.
>
> Cronjob1:
>   Poll destination directory and move the files to a temp location
>   The destination directory is where the camera puts them.
>
> Cronjob2:
>   Poll temp directory, move the image into its permanent
>   location, and insert a row into our "images" table.
>
> It's split only so that if some part breaks, the uploading from cameras
> does not and people can continue to upload from cameras.  Digital 
> cameras [at least the ones the government uses :)] upload with the 
> same non-unique file names for each upload, so we have to process each 
> batch rather quickly.
>
> I didn't write this, but I can say in 3 years it's only crashed once 
> and makes us millions.
>
> [snipped for brevity of course]
> Cronjob1:
>
> use Net::FTP;
> my $ftp = Net::FTP->new($PEERADDR, Debug => 0, Timeout => 30)
> 	|| die "Connect to server failed\n";
> $ftp->login($USERNAME, $PASSWORD)
> 	|| die "Cannot login to FTP server\n";
> $ftp->binary();
>
> my @files   = $ftp->ls('-R');
> foreach my $file (@files) {
> 	next unless $file =~ /.../;	# placeholder for the snipped criteria
> 	$ftp->get("$dir/$file", $localFilename);
> }
> $ftp->quit();
>
>


Re: Trying to get File and Directory info off of external server quickly

Posted by Boysenberry Payne <bo...@humaniteque.com>.
$ftp->{handle} = Net::FTP->new( $ftp->{host}, Passive => 1 ) or die
	"Can't create new ftp with host: $ftp->{host}";

It's part of my FTP module.

Thanks,
Boysenberry

On Aug 1, 2005, at 6:31 PM, Philip M. Gollucci wrote:

> Boysenberry Payne wrote:
>> my $h = $ftp->{handle};
>> foreach my $directory ( @directories ) {
>>     $h->cwd( $directory ) or die "can't change to directory:  
>> $directory $!";
>>     my $dir_ls = $h->ls;
>>     foreach my $file_name ( @$dir_ls ) {
>>         unless ( substr( $file_name, 0, 1 ) eq "." ) {
>>             my $dir_nfo = $h->dir( $directory . $file_name );
>>             $_ = $dir_nfo->[ 0 ];
>>             s/(\s)+/ /g;
>>             my @file_nfo = split / /, $_;
>>             my $file_size = $file_nfo[ 4 ];
>>             if( $file_size != 0 ) {
>>                 add to database
>>             }
>>         }
>>     }
>> }
>> $h->quit;
> What's this $ftp->{handle} stuff?
> Shouldn't it just be $ftp->xxx?
> That's not in perldoc Net::FTP.
>
> Are you using a relatively new version of it ?
> We've got Net::FTP 2.75,
> Linux wickedwitch 2.6.12.3 #2 SMP Mon Jul 18 17:14:55 EDT 2005 i686  
> i686 i386 GNU/Linux
>
> What is the file size check for .. why do you have files of size 0 ?
>
> I think it might be faster to do
>
> next if $file_name =~ /^\./;
>
> instead of
> >         unless ( substr( $file_name, 0, 1 ) eq "." ) {
>
> It might be your database connection... Do you prepare the handle
> outside of the loop?  Is the database connect/disconnect outside of
> the loop?  What are you inserting into the database?  If you're
> inserting a BLOB of the image/file data, it could be the bandwidth
> transfer now that it's not a local socket anymore.
>
> Luck
>
> --  
> END
> ----------------------------------------------------------------------- 
> ------
> Philip M. Gollucci
> Senior Developer - Liquidity Services Inc.
> Phone:  202.558.6268 (Direct)
> Cell:   301.254.5198
> E-Mail: pgollucci@liquidation.com
> Web:    http://www.liquidityservicesinc.com
>         http://www.liquidation.com
>         http://www.uksurplus.com
>         http://www.govliquidation.com
>         http://www.gowholesale.com
>
>
>


Re: Trying to get File and Directory info off of external server quickly

Posted by "Philip M. Gollucci" <pg...@liquidation.com>.
Boysenberry Payne wrote:
> my $h = $ftp->{handle};
> foreach my $directory ( @directories ) {
>     $h->cwd( $directory ) or die "can't change to directory: $directory 
> $!";
>     my $dir_ls = $h->ls;
>     foreach my $file_name ( @$dir_ls ) {
>         unless ( substr( $file_name, 0, 1 ) eq "." ) {
>             my $dir_nfo = $h->dir( $directory . $file_name );
>             $_ = $dir_nfo->[ 0 ];
>             s/(\s)+/ /g;
>             my @file_nfo = split / /, $_;
>             my $file_size = $file_nfo[ 4 ];
>             if( $file_size != 0 ) {
>                 add to database
>             }
>         }
>     }
> }
> $h->quit;
What's this $ftp->{handle} stuff?
Shouldn't it just be $ftp->xxx?
That's not in perldoc Net::FTP.

Are you using a relatively new version of it ?
We've got Net::FTP 2.75,
Linux wickedwitch 2.6.12.3 #2 SMP Mon Jul 18 17:14:55 EDT 2005 i686 i686 
i386 GNU/Linux

What is the file size check for... why do you have files of size 0?

I think it might be faster to do

next if $file_name =~ /^\./;

instead of
 >         unless ( substr( $file_name, 0, 1 ) eq "." ) {
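The speed claim is easy to check with the core Benchmark module; a quick sketch (the file name and iteration count are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Compare the substr dotfile test against the regex version.
my $file_name = "index.html";
timethese( 1_000_000, {
    substr => sub { substr( $file_name, 0, 1 ) eq "." },
    regex  => sub { $file_name =~ /^\./ },
} );
```

Either way, the per-file check is unlikely to be the bottleneck next to the FTP round trips.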

It might be your database connection... Do you prepare the handle
outside of the loop?  Is the database connect/disconnect outside of the
loop?  What are you inserting into the database?  If you're inserting a
BLOB of the image/file data, it could be the bandwidth transfer now that
it's not a local socket anymore.

Luck

-- 
END
-----------------------------------------------------------------------------
Philip M. Gollucci
Senior Developer - Liquidity Services Inc.
Phone:  202.558.6268 (Direct)
Cell:   301.254.5198
E-Mail: pgollucci@liquidation.com
Web:    http://www.liquidityservicesinc.com
         http://www.liquidation.com
         http://www.uksurplus.com
         http://www.govliquidation.com
         http://www.gowholesale.com


Re: Trying to get File and Directory info off of external server quickly

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Boysenberry Payne wrote:
> I'm not sure if HEAD would work.
> Basically, I'm trying to read a directory's files.
> After I confirm a file exists and doesn't have zero
> size I check that it has the appropriate extension
> for the directory then I add the directory address,
> file name and extension to a table in our database.
We actually do something very similar to this involving pictures being 
uploaded from a digital camera to eventually be published on a website.

Cronjob1:
   Poll destination directory and move the files to a temp location
   The destination directory is where the camera puts them.

Cronjob2:
   Poll temp directory, move the image into its permanent
   location, and insert a row into our "images" table.

It's split only so that if some part breaks, the uploading from cameras
does not and people can continue to upload from cameras.  Digital 
cameras [at least the ones the government uses :)] upload with the same 
non-unique file names for each upload, so we have to process each batch 
rather quickly.

I didn't write this, but I can say in 3 years it's only crashed once and 
makes us millions.

[snipped for brevity of course]
Cronjob1:

use Net::FTP;
my $ftp = Net::FTP->new($PEERADDR, Debug => 0, Timeout => 30)
	|| die "Connect to server failed\n";
$ftp->login($USERNAME, $PASSWORD)
	|| die "Cannot login to FTP server\n";
$ftp->binary();

my @files   = $ftp->ls('-R');
foreach my $file (@files) {
	next unless $file =~ /.../;	# placeholder for the snipped criteria
	$ftp->get("$dir/$file", $localFilename);
}
$ftp->quit();

Re: Trying to get File and Directory info off of external server quickly

Posted by Boysenberry Payne <bo...@humaniteque.com>.
Thank You Everyone,

Now that I know I can use $ftp->ls( "-lR" ), which I couldn't find
anywhere in the Net::FTP docs or the other O'Reilly books I have, I can
stick with Net::FTP without it being slow.  What was causing my script
to take so long was the multiple $ftp->cwd( $directory ), $ftp->ls() and
$ftp->dir( $directory . $file ) calls for each directory in my
directory loop.

Now I use one cwd and ls("-lR") from my public html area, then process
the returned array, which is a lot faster.  It would be nice to be able
to specify the directory as well as the "-lR" without using
cwd( $directory ); does anyone know how to do it?
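In code, the one-listing approach might look like this (a sketch: the parse_lsR helper and its column positions assume standard Unix-style "ls -l" output, which not every FTP server guarantees):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the output of a single recursive listing ($ftp->ls("-lR"))
# into { directory => [ { name, size }, ... ] }.  Assumes the usual
# "ls -lR" layout: a "dir:" header line, then one "ls -l" line per
# entry, with the size in the fifth column.
sub parse_lsR {
    my @lines = @_;
    my ( %files, $cwd );
    $cwd = ".";
    for my $line (@lines) {
        next if $line =~ /^\s*$/;          # blank line between blocks
        if ( $line =~ /^(.+):$/ ) {        # "some/dir:" section header
            $cwd = $1;
            next;
        }
        next if $line =~ /^total\s+\d+/;   # "total NN" summary line
        my @f = split ' ', $line, 9;       # perms links owner group size mon day time name
        next unless @f == 9;
        next if $f[0] =~ /^d/;             # skip directory entries
        next if $f[8] =~ /^\./;            # skip dotfiles
        push @{ $files{$cwd} }, { name => $f[8], size => $f[4] };
    }
    return \%files;
}
```

With this, the per-directory loop collapses to one FTP round trip plus a local pass over the returned lines.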

Thanks for the tips on making my code more efficient too.

Boysenberry

On Aug 1, 2005, at 6:28 PM, Randy Kobes wrote:

> On Mon, 1 Aug 2005, Boysenberry Payne wrote:
>
>> I'm not sure if HEAD would work.
>> Basically, I'm trying to read a directory's files.
>> After I confirm a file exists and doesn't have zero
>> size I check that it has the appropriate extension
>> for the directory then I add the directory address,
>> file name and extension to a table in our database.
>
> Can you get someone on the remote server to do a
>    cd top_level_directory
>    ls -lR > ls-lR  # or find -fls find-ls
>    gzip ls-lR      # or gzip find-ls
> periodically, and then you can grab and parse ls-lR.gz or find-ls.gz?
>
> -- 
> best regards,
> randy kobes
>
>


Re: Trying to get File and Directory info off of external server quickly

Posted by Randy Kobes <ra...@theoryx5.uwinnipeg.ca>.
On Mon, 1 Aug 2005, Boysenberry Payne wrote:

> I'm not sure if HEAD would work.
> Basically, I'm trying to read a directory's files.
> After I confirm a file exists and doesn't have zero
> size I check that it has the appropriate extension
> for the directory then I add the directory address,
> file name and extension to a table in our database.

Can you get someone on the remote server to do a
    cd top_level_directory
    ls -lR > ls-lR  # or find . -fls find-ls
    gzip ls-lR      # or gzip find-ls
periodically, and then you can grab and parse ls-lR.gz or 
find-ls.gz?
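If the remote side does provide such a file, grabbing and decompressing it is straightforward; a sketch (host, login, and file names are placeholders; IO::Uncompress::Gunzip ships with modern Perls):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Fetch the pre-generated ls-lR.gz over FTP, then hand its lines to a
# local parser.  Assumes the remote cron job described above exists.
sub fetch_listing {
    my ( $host, $user, $pass, $remote ) = @_;
    require Net::FTP;
    my $ftp = Net::FTP->new( $host, Timeout => 30 )
        or die "Connect to $host failed";
    $ftp->login( $user, $pass ) or die "Cannot login to FTP server";
    $ftp->binary;
    $ftp->get( $remote, "ls-lR.gz" ) or die "get $remote failed";
    $ftp->quit;
    return gunzip_lines("ls-lR.gz");
}

# Decompress a gzipped listing file into an array of lines.
sub gunzip_lines {
    my ($gzfile) = @_;
    my $plain;
    gunzip( $gzfile => \$plain )
        or die "gunzip failed: $GunzipError";
    return split /\n/, $plain;
}
```

That turns "many cwd/ls/dir round trips" into a single file transfer.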

-- 
best regards,
randy kobes

Re: Trying to get File and Directory info off of external server quickly

Posted by Boysenberry Payne <bo...@humaniteque.com>.
I'm not sure if HEAD would work.
Basically, I'm trying to read a directory's files.
After I confirm a file exists and doesn't have zero
size I check that it has the appropriate extension
for the directory then I add the directory address,
file name and extension to a table in our database.

It used to be easy when I was using PHP on a single-server
system (which is what I'm migrating from).  Now
that I'm using a two-server system it's a little trickier.

Thanks,
Boysenberry

On Aug 1, 2005, at 5:12 PM, Philippe M. Chiasson wrote:

> Boysenberry Payne wrote:
>> Hello All,
>>
>>     I've got a two server platform one a static server for files and
>> runs the mysql server
>> and the other runs mod_perl.  I'm trying to figure out the fastest way
>> to get info on directories
>> and files from the static server to the mod_perl server.  Right now 
>> I'm
>> using Net::FTP which
>> is really slow, especially when there are a lot of files.  
>> Unfortunately,
>> I need to check the file info
>> quite frequently.  I was wondering if anyone knew what was the fast 
>> way
>> to get this info, LDAP,
>> SSH, etc?
>
> Wouldn't an HTTP HEAD request achieve this fairly nicely?
>
> I am not sure you have described the actual problem you are trying to 
> solve.
> Why is that information needed and how is it being used?
>
> -- 
> Philippe M. Chiasson m/gozer\@(apache|cpan|ectoplasm)\.org/ GPG KeyID 
> : 88C3A5A5
> http://gozer.ectoplasm.org/     F9BF E0C2 480E 7680 1AE5 3631 CB32 
> A107 88C3A5A5


Re: Trying to get File and Directory info off of external server quickly

Posted by "Philippe M. Chiasson" <go...@ectoplasm.org>.
Boysenberry Payne wrote:
> Hello All,
> 
>     I've got a two server platform one a static server for files and
> runs the mysql server
> and the other runs mod_perl.  I'm trying to figure out the fastest way
> to get info on directories
> and files from the static server to the mod_perl server.  Right now I'm
> using Net::FTP which
> is really slow, especially when there are a lot of files.  Unfortunately,
> I need to check the file info
> quite frequently.  I was wondering if anyone knew what was the fast way
> to get this info, LDAP,
> SSH, etc?

Wouldn't an HTTP HEAD request achieve this fairly nicely?

I am not sure you have described the actual problem you are trying to solve.
Why is that information needed and how is it being used?
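A sketch of the HEAD idea using LWP::UserAgent (not in core Perl, but almost universally installed; the URL and the choice of header fields are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Ask the static server's web server for a file's metadata with a
# HEAD request instead of walking FTP listings.  Returns undef if the
# file isn't there (404 etc.).
sub head_info {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 10 );
    my $res = $ua->head($url);
    return unless $res->is_success;
    return {
        size     => $res->content_length,
        modified => scalar $res->header('Last-Modified'),
        type     => scalar $res->content_type,
    };
}

# Hypothetical usage:
# my $nfo = head_info("http://static.example.com/images/foo.jpg");
# print "$nfo->{size} bytes\n" if $nfo && defined $nfo->{size};
```

This costs one HTTP round trip per file, so it suits spot checks better than scanning whole directories.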

-- 
Philippe M. Chiasson m/gozer\@(apache|cpan|ectoplasm)\.org/ GPG KeyID : 88C3A5A5
http://gozer.ectoplasm.org/     F9BF E0C2 480E 7680 1AE5 3631 CB32 A107 88C3A5A5

Re: Trying to get File and Directory info off of external server quickly

Posted by Torsten Foertsch <to...@gmx.net>.
On Monday 01 August 2005 23:12, Boysenberry Payne wrote:
> Hello All,
>
> 	I've got a two server platform one a static server for files and runs
> the mysql server
> and the other runs mod_perl.  I'm trying to figure out the fastest way
> to get info on directories
> and files from the static server to the mod_perl server.  Right now I'm
> using Net::FTP which
> is really slow, especially when there are a lot of files.  Unfortunately,
> I need to check the file info
> quite frequently.  I was wondering if anyone knew what was the fast way
> to get this info, LDAP,
> SSH, etc?

mod_dav may be an option.

Torsten

Re: Trying to get File and Directory info off of external server quickly

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Boysenberry Payne wrote:
> The server system is hosted with a third party, so I was hoping I could
> use mod_perl as a solution without having to involve them, if possible.
> If NFS ends up being the best solution, I will ask them
> if they could set it up.
Specifically, what information do you need from this file server?

Directories, path names, file names, mod times, access times,
sizes... etc.?

Re: Trying to get File and Directory info off of external server quickly

Posted by Boysenberry Payne <bo...@humaniteque.com>.
The server system is hosted with a third party, so I was hoping I could
use mod_perl as a solution without having to involve them, if possible.
If NFS ends up being the best solution, I will ask them
if they could set it up.

Thanks,
Boysenberry

On Aug 1, 2005, at 4:28 PM, Gedanken wrote:

> On Mon, 1 Aug 2005, Philip M. Gollucci wrote:
>
> There is a running joke in my office that, no matter what the problem 
> is,
> I simply blame NFS before hearing any details.  I am correct a 
> surprising
> amount of the time =)
>
> One quick caveat: properly unmount volumes when rebooting and such.
> Due to one of those things that somehow grew beyond its original
> intent, we had a network of about 15 machines all mounting each other.
> NFS chokes when a mount it expects to be there isn't; it takes several
> minutes to give up.  One machine rebooting in such a big spiderweb
> cluster can cause massive problems without proper attention to cleanly
> unmounting and remounting shares.  And 'cascading' is usually such a
> lovely word...
>
> gedanken
>
>  You might try an NFS mount between the two.
>
>  mount_nfs -L server:/path /local/path
>  (FreeBSD)
>
>
> -- 
> gedanken
>
>


Re: Trying to get File and Directory info off of external server quickly

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Gedanken wrote:
> One quick caveat: properly unmount volumes when rebooting and such.  Due 
> to one of those things that somehow grew beyond its original intent, we 
> had a network of about 15 machines all mounting each other.  NFS 
> chokes when a mount it expects to be there isn't; it takes several 
> minutes to give up.  One machine rebooting in such a big spiderweb cluster 
> can cause massive problems without proper attention to cleanly unmounting 
> and remounting shares.  And 'cascading' is usually such a lovely word...
I agree with that wholeheartedly (the p6m7g8.net cluster is doing that).
As long as you don't reboot it's great though :)


Re: Trying to get File and Directory info off of external server quickly

Posted by Gedanken <ge...@io.com>.
On Mon, 1 Aug 2005, Philip M. Gollucci wrote:

There is a running joke in my office that, no matter what the problem is, 
I simply blame NFS before hearing any details.  I am correct a surprising 
amount of the time =)

One quick caveat: properly unmount volumes when rebooting and such.  Due 
to one of those things that somehow grew beyond its original intent, we 
had a network of about 15 machines all mounting each other.  NFS 
chokes when a mount it expects to be there isn't; it takes several 
minutes to give up.  One machine rebooting in such a big spiderweb cluster 
can cause massive problems without proper attention to cleanly unmounting 
and remounting shares.  And 'cascading' is usually such a lovely word...

gedanken

 You might try an NFS mount between the two.
 
 mount_nfs -L server:/path /local/path
 (FreeBSD)
 

-- 
gedanken

Re: Trying to get File and Directory info off of external server quickly

Posted by "Philip M. Gollucci" <pg...@p6m7g8.com>.
Boysenberry Payne wrote:
> Hello All,
> 
>     I've got a two server platform one a static server for files and 
> runs the mysql server
> and the other runs mod_perl.  I'm trying to figure out the fastest way 
> to get info on directories
> and files from the static server to the mod_perl server.  Right now I'm 
> using Net::FTP which
> is really slow, especially when there are a lot of files.  Unfortunately, 
> I need to check the file info
> quite frequently.  I was wondering if anyone knew what was the fast way 
> to get this info, LDAP,
> SSH, etc?
You might try an NFS mount between the two.

mount_nfs -L server:/path /local/path
(FreeBSD)
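Once such a mount is in place, the whole check reduces to local filesystem calls; a sketch with the core File::Find module (the mount point and extension-per-directory rule mirror what the original poster described, but are placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Walk an NFS-mounted tree and collect non-empty, non-dot files with
# the expected extension -- no FTP round trips needed.
sub scan_tree {
    my ( $root, $ext ) = @_;
    my @found;
    find(
        sub {
            return if /^\./;             # skip dotfiles
            return unless -f $_;         # files only
            return unless -s _;          # skip zero-size files
            return unless /\Q$ext\E$/;   # must match the expected extension
            push @found,
                { dir => $File::Find::dir, name => $_, size => -s _ };
        },
        $root
    );
    return \@found;
}

# Hypothetical usage:
# my $files = scan_tree( "/mnt/static/htdocs/images", ".jpg" );
```

The results map directly onto the directory/name/extension rows the original poster wants to insert into the database.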