Posted to users@tomcat.apache.org by David Kerber <dc...@verizon.net> on 2007/10/13 04:56:38 UTC

Copying large files around

What is the most efficient (=fastest) way of copying large (> 1GB [yes, 
that's a Giga]) files around the network in java when running under 
tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a 
large byte[] array?  Or what?
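For reference, the buffered-stream approach the question asks about looks roughly like this - a minimal sketch (class name, buffer size, and modern try-with-resources are my choices for illustration, not something from the thread):

```java
import java.io.*;

public class StreamCopy {
    // Plain stream copy: read into a byte[] buffer and write it back out.
    // 64 KB is an arbitrary buffer size; much larger buffers help less than
    // you might hope, because the OS is already caching underneath.
    static void copy(File src, File dst) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

As the replies in this thread point out, this is rarely the fastest option over a network.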

D



---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Copying large files around

Posted by Bill Barker <wb...@wilshire.com>.
Seconding Peter's comments: it is extremely unlikely that you can beat the 
Windows tools for this sort of thing (been there, done that).  Also, if your 
webapp isn't physically on one or the other of the boxes (as Peter said, 
preferably the receiver), then you are pretty much guaranteed to lose.

That being said, if you insist on using Java, then use a MappedByteBuffer on 
the source, and either transferTo or transferFrom to send it to the target. 
On Windows, this makes an order-of-magnitude difference for 1GB+ files.
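Bill's NIO suggestion can be sketched with FileChannel.transferTo, which hands the copy to the OS instead of looping a byte[] through the JVM. A minimal sketch (class and method names are mine; note that a single transferTo call may move fewer bytes than requested, hence the loop):

```java
import java.io.*;
import java.nio.channels.FileChannel;

public class ChannelCopy {
    static void copy(File src, File dst) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long pos = 0;
            long size = in.size();
            // transferTo is allowed to transfer fewer bytes than asked for,
            // so keep calling until the whole file has been sent.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }
}
```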


"Peter Crowther" <Pe...@melandra.com> wrote in message 
news:DDBBD1E00935D144AB9563D57EF98D624EF5C9@raccoon.melandra.net...
> From: David Kerber [mailto:dckerber@verizon.net]
> What is the most efficient (=fastest) way of copying large (>
> 1GB [yes,
> that's a Giga]) files around the network in java when running under
> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a
> large byte[] array?  Or what?

If you *definitely* want efficient, and you're on Windows, then:

- Do not use Java, use OS-level tools;
- Run the tools on the receiving machine, not the sending machine.

You don't have to worry about differences in buffer sizes between the
network and whatever chunking you're using on the Java file copy, and
I'm pretty sure I've seen "copy" pre-allocate the entire file space when
the target file is local (because it knows it's copying a source file
and it knows it has exclusive access, so the size doesn't change).  I've
never seen that behaviour when "copy" is running to a network share.

The easiest way of doing this on Windows is via the AT command: write
yourself an AT job that runs a couple of minutes in the future (use Java to
look up the time :-) ), similar to:

at \\targetserver 21:02 "c:\jobs\copyFromLive.bat"

Ugly as sin, but it works and it's fast.

- Peter
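Peter's "use Java to look up the time" step might look like the sketch below. The server name and batch file path are the placeholders from his example, and the ProcessBuilder call is left commented out since AT only exists on Windows:

```java
import java.util.Calendar;

public class AtScheduler {
    // HH:mm a few minutes from 'now', formatted for the Windows AT command.
    static String atTime(Calendar now, int minutesAhead) {
        Calendar c = (Calendar) now.clone();
        c.add(Calendar.MINUTE, minutesAhead);
        return String.format("%tH:%<tM", c);
    }

    public static void main(String[] args) throws Exception {
        String when = atTime(Calendar.getInstance(), 2);
        // Server name and batch path are placeholders from Peter's example;
        // uncomment on a Windows box with admin rights on the target:
        // new ProcessBuilder("at", "\\\\targetserver", when,
        //         "c:\\jobs\\copyFromLive.bat").start();
        System.out.println("at \\\\targetserver " + when + " c:\\jobs\\copyFromLive.bat");
    }
}
```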



Re: [OT] Replication (was RE: Copying large files around)

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
---------------------------------------------------------------------------
HARBOR: http://coolharbor.100free.com/index.htm
Now Tomcat is also a cool application server
---------------------------------------------------------------------------
----- Original Message ----- 
From: "Peter Crowther" <Pe...@melandra.com>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Monday, October 15, 2007 11:38 AM
Subject: [OT] Replication (was RE: Copying large files around)


[Marked off-topic as this has, indeed, come a long long way from the
original question]

> From: Johnny Kewl [mailto:john@kewlstuff.co.za]
> but yes if the user could consider replication
> and the required
> dB design, much better than moving GB files around.

Not always.  For example, replication has the disadvantage that an
application logic error can get replicated and break both your live and
your backup database (although I accept that point-in-time log-based
recovery can mitigate this).  Point-in-time backups that are copied
between machines have a long and honourable history; they're also often
much easier for the ops team to get their heads round, where a
replicated system can often set heads spinning in the ops team.  You
also have to be very careful with your replication setup - and the
guarantees you give your client - in case of partial failures.  For
example, if communications to the second system fail, do you stop
accepting transactions on the primary (potential loss of new business on
an Internet site) or do you continue, but in the knowledge that if the
primary fails you're now losing transactions and hence violating ACID
principles from the client's point of view (potential loss of old
business on an Internet site)?

=================================================
Yes, exactly - all true, replication requires very careful design 
consideration; someone deletes one dB and it ripples through the other 10. 
Yes, everything you say is true, and it's the reason I made a Master - Master 
postgresql replication system. It tries to address all the weaknesses while 
enjoying the benefits.
It's a long, long time ago so I can't remember all the design details, but 
that's what I tried to avoid... things like multiple masters to begin with, 
so if, say, the 10th floor goes down, only those direct users, ie the ones 
talking directly to that dB, are affected; however, when it comes back up, 
all the other masters will bring it back into alignment.
Also things like allowing one to make one dB non-deletable... it's a pure 
backup, and even if one kills the main dB, this dB replicates updates and 
inserts, but never deletes... it becomes a complete history.

However, even with all this... to get it right, one needs to consider 
replication when designing the dB... it still has to be a total system 
design, and sadly it's not a plug-and-play technology. I use TC as a 
web-enabled controller, and originally I wanted to extend it to other dB's 
as well, but I was exhausted after just designing it for postgres. I also 
tried to design it for multiple dispersed web clusters... ie updates in 
New York and London will eventually sync up and vice versa, but I never got 
around to testing it over the web.

But yes, all you say is true, and although it's fascinating technology, the 
brunt of the thing is that if it wasn't considered up front, you're probably 
resigned to good old-fashioned backups.

Everything I build runs on good old Tomcat... so I think I'm still legal if 
I talk about this ;)

... the general knowledge in this group is fantastic... thanks




[OT] Replication (was RE: Copying large files around)

Posted by Peter Crowther <Pe...@melandra.com>.
[Marked off-topic as this has, indeed, come a long long way from the
original question]

> From: Johnny Kewl [mailto:john@kewlstuff.co.za] 
> but yes if the user could consider replication 
> and the required
> dB design, much better than moving GB files around.

Not always.  For example, replication has the disadvantage that an
application logic error can get replicated and break both your live and
your backup database (although I accept that point-in-time log-based
recovery can mitigate this).  Point-in-time backups that are copied
between machines have a long and honourable history; they're also often
much easier for the ops team to get their heads round, where a
replicated system can often set heads spinning in the ops team.  You
also have to be very careful with your replication setup - and the
guarantees you give your client - in case of partial failures.  For
example, if communications to the second system fail, do you stop
accepting transactions on the primary (potential loss of new business on
an Internet site) or do you continue, but in the knowledge that if the
primary fails you're now losing transactions and hence violating ACID
principles from the client's point of view (potential loss of old
business on an Internet site)?

(As you can probably guess, I've designed and built my share of
replicated databases - and supported them!)

		- Peter



Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
----- Original Message ----- 
From: "David Kerber" <dc...@verizon.net>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Sunday, October 14, 2007 9:49 PM
Subject: Re: Copying large files around


> Pid wrote:
>> Johnny Kewl wrote:
>>
>>> ----- Original Message ----- From: "David Kerber" <dc...@verizon.net>
>>> To: "Tomcat Users List" <us...@tomcat.apache.org>
>>> Sent: Sunday, October 14, 2007 2:56 AM
>>> Subject: Re: Copying large files around
>>>
>>> If it Postgresql you using, a while a go I wrote a replication system
>>> for
>>> Postgres, and its a TC servlet. So what you could to is make a real time
>>> back up, ie as one transaction happens on the main dB its replicated on
>>> the
>>> other and visa versa.... dB has to be designed for it, but they always
>>> aligned.
>>> If you interested, just yell.
>>>
>>
>> If it was Postgres*, slaving it is possible and that is definitely more
>> efficient/safe.
>>
>> p
>>
>> * Apparently not, as 'cost' has been referred to and the OS is Windows,
>> which makes a commercial DB more likely.
>>
> Correct observation.

Yes... all true, the question has come a long way from what initially looked
like a TC query, but yes, if the user could consider replication and the
required dB design, much better than moving GB files around. For a newbie it
may be a little overwhelming, and I just want to point out that there are
many levels of replication: Microsoft's replication is not like Postgres's
master - slave replication, and that's a little different from my master -
master replication system as well.

If it is web-based work that is generating these huge dB's... ie it's TC that
is talking to the dB, and replication is not an option on this dB, I would
even look at something like a customized dB pool that duplicates
transactions to a backup dB... anything to avoid huge dB backups.
Then also, if it were postgres and that was not possible.... even
considering incremental transaction logs would be better, ie backing up a
day's work and not the whole dB... something like that... good luck.

>
> D
>
>
>




Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Pid wrote:
> Johnny Kewl wrote:
>   
>> ----- Original Message ----- From: "David Kerber" <dc...@verizon.net>
>> To: "Tomcat Users List" <us...@tomcat.apache.org>
>> Sent: Sunday, October 14, 2007 2:56 AM
>> Subject: Re: Copying large files around
>>
>> If it Postgresql you using, a while a go I wrote a replication system for
>> Postgres, and its a TC servlet. So what you could to is make a real time
>> back up, ie as one transaction happens on the main dB its replicated on the
>> other and visa versa.... dB has to be designed for it, but they always
>> aligned.
>> If you interested, just yell.
>>     
>
> If it was Postgres*, slaving it is possible and that is definitely more
> efficient/safe.
>
> p
>
> * Apparently not, as 'cost' has been referred to and the OS is Windows,
> which makes a commercial DB more likely.
>   
Correct observation.

D





Re: Copying large files around

Posted by Pid <p...@pidster.com>.
Johnny Kewl wrote:
> 
> ----- Original Message ----- From: "David Kerber" <dc...@verizon.net>
> To: "Tomcat Users List" <us...@tomcat.apache.org>
> Sent: Sunday, October 14, 2007 2:56 AM
> Subject: Re: Copying large files around
> 
> If it Postgresql you using, a while a go I wrote a replication system for
> Postgres, and its a TC servlet. So what you could to is make a real time
> back up, ie as one transaction happens on the main dB its replicated on the
> other and visa versa.... dB has to be designed for it, but they always
> aligned.
> If you interested, just yell.

If it was Postgres*, slaving it is possible and that is definitely more
efficient/safe.

p

* Apparently not, as 'cost' has been referred to and the OS is Windows,
which makes a commercial DB more likely.






Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
----- Original Message ----- 
From: "David Kerber" <dc...@verizon.net>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Sunday, October 14, 2007 2:56 AM
Subject: Re: Copying large files around

If it's Postgresql you're using, a while ago I wrote a replication system for
Postgres, and it's a TC servlet. So what you could do is make a real-time
backup, ie as one transaction happens on the main dB it's replicated on the
other and vice versa.... the dB has to be designed for it, but they stay
aligned.
If you're interested, just yell.




Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Peter Crowther wrote:
>> From: David Kerber [mailto:dckerber@verizon.net] 
>> What is the most efficient (=fastest) way of copying large (> 
>> 1GB [yes, 
>> that's a Giga]) files around the network in java when running under 
>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a 
>> large byte[] array?  Or what?
>>     
>
> If you *definitely* want efficient, and you're on Windows, then:
>
> - Do not use Java, use OS-level tools;
> - Run the tools on the receiving machine, not the sending machine.
>
> You don't have to worry about differences in buffer sizes between the
> network and whatever chunking you're using on the Java file copy, and
> I'm pretty sure I've seen "copy" pre-allocate the entire file space when
> the target file is local (because it knows it's copying a source file
> and it knows it has exclusive access, so the size doesn't change).  I've
> never seen that behaviour when "copy" is running to a network share.
>
> The easiest way of doing this on Windows is via the AT command: write
> yourself an at that runs a couple of minutes in the future (use Java to
> look up the time :-) ), similar to:
>
> at \\targetserver 21:02 "c:\jobs\copyFromLive.bat"
>
> Ugly as sin, but it works and it's fast.
>
> 		- Peter
>   
Thanks for the suggestion!  I would never have thought of that.

D





RE: Copying large files around

Posted by Peter Crowther <Pe...@melandra.com>.
> From: David Kerber [mailto:dckerber@verizon.net] 
> What is the most efficient (=fastest) way of copying large (> 
> 1GB [yes, 
> that's a Giga]) files around the network in java when running under 
> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a 
> large byte[] array?  Or what?

If you *definitely* want efficient, and you're on Windows, then:

- Do not use Java, use OS-level tools;
- Run the tools on the receiving machine, not the sending machine.

You don't have to worry about differences in buffer sizes between the
network and whatever chunking you're using on the Java file copy, and
I'm pretty sure I've seen "copy" pre-allocate the entire file space when
the target file is local (because it knows it's copying a source file
and it knows it has exclusive access, so the size doesn't change).  I've
never seen that behaviour when "copy" is running to a network share.

The easiest way of doing this on Windows is via the AT command: write
yourself an AT job that runs a couple of minutes in the future (use Java to
look up the time :-) ), similar to:

at \\targetserver 21:02 "c:\jobs\copyFromLive.bat"

Ugly as sin, but it works and it's fast.

		- Peter



Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
----- Original Message ----- 
From: "Johnny Kewl" <jo...@kewlstuff.co.za>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Saturday, October 13, 2007 1:11 PM
Subject: Re: Copying large files around


>
>
>
> ----- Original Message ----- 
> From: "Johnny Kewl" <jo...@kewlstuff.co.za>
> To: "Tomcat Users List" <us...@tomcat.apache.org>; <dk...@ieee.org>
> Sent: Saturday, October 13, 2007 12:41 PM
> Subject: Re: Copying large files around
>
>
>>
>> ----- Original Message ----- 
>> From: "David Kerber" <dc...@verizon.net>
>> To: "Tomcat Users List" <us...@tomcat.apache.org>
>> Sent: Saturday, October 13, 2007 4:56 AM
>> Subject: Copying large files around
>>
>>
>>> What is the most efficient (=fastest) way of copying large (> 1GB [yes, 
>>> that's a Giga]) files around the network in java when running under 
>>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a 
>>> large byte[] array?  Or what?
>>
>> What a cool question.... I've never had the need to move such big files 
>> but this is what I would do...
>>
>> First look at this link.... 
>> http://forum.java.sun.com/thread.jspa?threadID=226413&messageID=818728
>>
>> It will give you the basic streaming model.... ie a POST on the browser 
>> side starts sucking down the file from the servlet...
>> So in effect you do not have to worry about the whole file been 
>> buffered... it will move across the wire in say 4k blocks.
>>
>> OK... now if you on a LAN... thats cool, but for the internet this is not 
>> good enough...
>>
>> If you have a look at something like the Opera browser's file transfer it 
>> does some cool things, like if you shut down the machine, next time you 
>> start up again, it will pick up where you stopped it, it doesnt start 
>> with the whole 1 gb file again....
>> In fact if I built a servlet to do this... I would run the Opera browser 
>> download against it and stop and start and see it my servlet is to 
>> spec...
>>
>> I think the way to do it is to to modify the code in that link for RANDOM 
>> file access... ie the client knows its got 800 MB already and only asks 
>> for 800MB onwards.... so how do they do that.
>>
>> Look at this link..... 
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
>> and look at the byte range header spec..... I would build the servlet to 
>> that.
>>
>> With such large files.... I think the biggest problem to solve is 
>> interruptions.... ie give the user the ability to close the client and go 
>> home... tomorrow it starts where it left off..... I think file change 
>> headers (almost RSS) type stuff comes into this as well, in case the file 
>> is modified before it all downloaded, in which case it MUST start 
>> again....
>>
>> Nice question.... I'm surprized I cant find code like this on the web 
>> already.... have a good look around.... it must be there.
>>
>> You could also add compression to it... and I think GZIP is an option in 
>> TC already, or you can just compress the blocks as they go...
>> If its a JPG or GIF.... dont bother with compression it already is.... 
>> nice little project
>>
>> Good Luck....
>
> AFTER-THOUGHT
> You know, these TC guys are frightening clever chaps ;)
> I would not be surprized at all if the DEFAULT servlet already does all 
> this already.
>
> Never tried it but I would test Opera against the DEFAULT servlet (ie you 
> do nothing but put the file in TC.... no code)
>
> If it works but you need to extend that to folders outside of Tomcat.... 
> steal the DEAFULT servlet and just modify it ;)

To save you some trouble.... yes, the DEFAULT servlet already supports ETags 
and file offsets... so there you go.... Thanks, Tomcat!

At most you need to make the client... or just use Opera ;) 
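Since Tomcat's DEFAULT servlet already honours byte ranges, the client really is the only piece left to write. A minimal resume-capable client sketch - the URL and destination file are whatever you point it at, and rangeHeader builds the header defined in the RFC 2616 section linked earlier in the thread:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumeDownload {
    // "bytes=N-" asks the server for everything from byte N onwards.
    static String rangeHeader(long offset) {
        return "bytes=" + offset + "-";
    }

    static void resume(URL url, File dest) throws IOException {
        long offset = dest.exists() ? dest.length() : 0;
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (offset > 0) {
            conn.setRequestProperty("Range", rangeHeader(offset));
        }
        // 206 Partial Content means the server honoured the range;
        // a plain 200 means it sent the whole file, so overwrite instead of append.
        boolean append = conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL;
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(dest, append)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

A fuller client would also compare ETags between runs and restart from zero when the file changed, as discussed above.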




Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.


----- Original Message ----- 
From: "Johnny Kewl" <jo...@kewlstuff.co.za>
To: "Tomcat Users List" <us...@tomcat.apache.org>; <dk...@ieee.org>
Sent: Saturday, October 13, 2007 12:41 PM
Subject: Re: Copying large files around


>
> ----- Original Message ----- 
> From: "David Kerber" <dc...@verizon.net>
> To: "Tomcat Users List" <us...@tomcat.apache.org>
> Sent: Saturday, October 13, 2007 4:56 AM
> Subject: Copying large files around
>
>
>> What is the most efficient (=fastest) way of copying large (> 1GB [yes, 
>> that's a Giga]) files around the network in java when running under 
>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a 
>> large byte[] array?  Or what?
>
> What a cool question.... I've never had the need to move such big files 
> but this is what I would do...
>
> First look at this link.... 
> http://forum.java.sun.com/thread.jspa?threadID=226413&messageID=818728
>
> It will give you the basic streaming model.... ie a POST on the browser 
> side starts sucking down the file from the servlet...
> So in effect you do not have to worry about the whole file been 
> buffered... it will move across the wire in say 4k blocks.
>
> OK... now if you on a LAN... thats cool, but for the internet this is not 
> good enough...
>
> If you have a look at something like the Opera browser's file transfer it 
> does some cool things, like if you shut down the machine, next time you 
> start up again, it will pick up where you stopped it, it doesnt start with 
> the whole 1 gb file again....
> In fact if I built a servlet to do this... I would run the Opera browser 
> download against it and stop and start and see it my servlet is to spec...
>
> I think the way to do it is to to modify the code in that link for RANDOM 
> file access... ie the client knows its got 800 MB already and only asks 
> for 800MB onwards.... so how do they do that.
>
> Look at this link..... 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
> and look at the byte range header spec..... I would build the servlet to 
> that.
>
> With such large files.... I think the biggest problem to solve is 
> interruptions.... ie give the user the ability to close the client and go 
> home... tomorrow it starts where it left off..... I think file change 
> headers (almost RSS) type stuff comes into this as well, in case the file 
> is modified before it all downloaded, in which case it MUST start 
> again....
>
> Nice question.... I'm surprized I cant find code like this on the web 
> already.... have a good look around.... it must be there.
>
> You could also add compression to it... and I think GZIP is an option in 
> TC already, or you can just compress the blocks as they go...
> If its a JPG or GIF.... dont bother with compression it already is.... 
> nice little project
>
> Good Luck....

AFTER-THOUGHT
You know, these TC guys are frighteningly clever chaps ;)
I would not be surprised at all if the DEFAULT servlet already does all 
this.

Never tried it, but I would test Opera against the DEFAULT servlet (ie you do 
nothing but put the file in TC.... no code)

If it works but you need to extend that to folders outside of Tomcat.... 
steal the DEFAULT servlet and just modify it ;)






Re: Copying large files around

Posted by "Mark H. Wood" <mw...@IUPUI.Edu>.
On Sat, Oct 13, 2007 at 12:41:45PM +0200, Johnny Kewl wrote:
> OK... now if you on a LAN... thats cool, but for the internet this is not 
> good enough...
>
> If you have a look at something like the Opera browser's file transfer it 
> does some cool things, like if you shut down the machine, next time you 
> start up again, it will pick up where you stopped it, it doesnt start with 
> the whole 1 gb file again....
> In fact if I built a servlet to do this... I would run the Opera browser 
> download against it and stop and start and see it my servlet is to spec...
>
> I think the way to do it is to to modify the code in that link for RANDOM 
> file access... ie the client knows its got 800 MB already and only asks for 
> 800MB onwards.... so how do they do that.
>
> Look at this link..... 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
> and look at the byte range header spec..... I would build the servlet to 
> that.
>
> With such large files.... I think the biggest problem to solve is 
> interruptions.... ie give the user the ability to close the client and go 
> home... tomorrow it starts where it left off..... I think file change 
> headers (almost RSS) type stuff comes into this as well, in case the file 
> is modified before it all downloaded, in which case it MUST start again....
>
> Nice question.... I'm surprized I cant find code like this on the web 
> already.... have a good look around.... it must be there.

Perhaps not the *fastest*, but there *is* code to deal with
transferring files across flaky paths.  You just didn't look far
enough back in history.  UUCP over TCP works well, remembers how far
it got, and will keep trying on a schedule you specify until the job
is done.  I still use it to throw huge files across WAN paths, even
dialup links.

-- 
Mark H. Wood, Lead System Programmer   mwood@IUPUI.Edu
"Don't throw the past away. / You might need it some rainy day."

Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
----- Original Message ----- 
From: "David Kerber" <dc...@verizon.net>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Saturday, October 13, 2007 4:56 AM
Subject: Copying large files around


> What is the most efficient (=fastest) way of copying large (> 1GB [yes, 
> that's a Giga]) files around the network in java when running under tomcat 
> 5.5.x?  Do I use a FileInputStream and FileOutputStream with a large 
> byte[] array?  Or what?

What a cool question.... I've never had the need to move such big files but 
this is what I would do...

First look at this link.... 
http://forum.java.sun.com/thread.jspa?threadID=226413&messageID=818728

It will give you the basic streaming model.... ie a POST on the browser side 
starts sucking down the file from the servlet...
So in effect you do not have to worry about the whole file been buffered... 
it will move across the wire in say 4k blocks.

OK... now if you on a LAN... thats cool, but for the internet this is not 
good enough...

If you have a look at something like the Opera browser's file transfer, it 
does some cool things: if you shut down the machine, the next time you 
start up again it will pick up where you stopped it; it doesn't start with 
the whole 1 GB file again....
In fact, if I built a servlet to do this... I would run the Opera browser 
download against it, stopping and starting, to see if my servlet is to spec...

I think the way to do it is to modify the code in that link for RANDOM 
file access... i.e. the client knows it's got 800 MB already and only asks for 
800 MB onwards.... so how do they do that?

Look at this link..... 
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
and look at the byte-range header spec..... I would build the servlet to 
that.
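
The resume logic might be sketched like this (the class and method names, the 
demo file, and the 4 KB buffer are my own illustrative assumptions, not from 
any existing code):

```java
// Sketch: parse a "Range: bytes=N-" header and stream a file from
// that offset, so a client that already has N bytes resumes there.
import java.io.*;

public class ResumeCopy {
    // Returns the start offset requested by a header like "bytes=800-".
    static long parseRangeStart(String rangeHeader) {
        if (rangeHeader == null || !rangeHeader.startsWith("bytes=")) return 0L;
        String spec = rangeHeader.substring("bytes=".length());
        int dash = spec.indexOf('-');
        return Long.parseLong(spec.substring(0, dash));
    }

    // Copies src from 'start' onward into out, 4 KB at a time.
    static void copyFrom(File src, long start, OutputStream out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(src, "r")) {
            raf.seek(start);                 // skip what the client already has
            byte[] buf = new byte[4096];
            int n;
            while ((n = raf.read(buf)) != -1) out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // tiny stand-in for the 1 GB file
        File f = File.createTempFile("demo", ".bin");
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write("0123456789".getBytes());
        }
        long start = parseRangeStart("bytes=6-");   // client already has 6 bytes
        ByteArrayOutputStream resumed = new ByteArrayOutputStream();
        copyFrom(f, start, resumed);
        System.out.println(resumed.toString());     // prints "6789"
        f.delete();
    }
}
```

In a real servlet you would also send a 206 Partial Content status and a
Content-Range response header so the client knows the resume was honoured.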

With such large files.... I think the biggest problem to solve is 
interruptions.... i.e. give the user the ability to close the client and go 
home... tomorrow it starts where it left off..... I think file-change 
headers (almost RSS-type stuff) come into this as well, in case the file is 
modified before it has all downloaded, in which case it MUST start again....

Nice question.... I'm surprised I can't find code like this on the web 
already.... have a good look around.... it must be there.

You could also add compression to it... and I think GZIP is an option in TC 
already, or you can just compress the blocks as they go...
If it's a JPG or GIF.... don't bother with compression; it already is.... nice 
little project

Good Luck....


Re: Copying large files around

Posted by "Mark H. Wood" <mw...@IUPUI.Edu>.
On Sat, Oct 13, 2007 at 11:02:59AM -0400, Jim Cox wrote:
> On 10/13/07, Christopher Schultz <ch...@christopherschultz.net> wrote:
> > David,
> >
> > David Kerber wrote:
> > > Let me give a bit more detail:  I am working on a utility function in my
> > > webapp that will periodically copy the database file from the db server
> > > to a backup server on the LAN.
> >
> > Uh.... cron and cp, anyone? You can even use cron to initiate the
> > backup, too, instead of scripting it from your webapp.
> 
> Or in this case scp (or rcp, or sftp, or ftp) ?

Definitely.  While you're at it, just pipe the backup stream through
instead of collecting a huge file on host A and then moving it to host
B.  I have some DBMS backups that run that way.

-- 
Mark H. Wood, Lead System Programmer   mwood@IUPUI.Edu
Typically when a software vendor says that a product is "intuitive" he
means the exact opposite.


Re: Copying large files around

Posted by Jim Cox <sh...@gmail.com>.
On 10/13/07, Christopher Schultz <ch...@christopherschultz.net> wrote:
> David,
>
> David Kerber wrote:
> > Let me give a bit more detail:  I am working on a utility function in my
> > webapp that will periodically copy the database file from the db server
> > to a backup server on the LAN.
>
> Uh.... cron and cp, anyone? You can even use cron to initiate the
> backup, too, instead of scripting it from your webapp.

Or in this case scp (or rcp, or sftp, or ftp) ?



Re: Copying large files around

Posted by Christopher Schultz <ch...@christopherschultz.net>.
David,

David Kerber wrote:
> Does Windows 2003 server have a version of cron?

Yes, it's called the Microsoft Task Scheduler.

> I've never used it. 

:(

> And building it into my webapp lets all the configuration be done from
> the same interface..

Fair enough. :(

- -chris




Re: Copying large files around

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.
----- Original Message ----- 
From: "David Kerber" <dc...@verizon.net>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Saturday, October 13, 2007 5:29 PM
Subject: Re: Copying large files around


> Christopher Schultz wrote:
>> David,
>>
>> David Kerber wrote:
>>
>>> Let me give a bit more detail:  I am working on a utility function in my
>>> webapp that will periodically copy the database file from the db server
>>> to a backup server on the LAN.
>>>
>>
>> Uh.... cron and cp, anyone? You can even use cron to initiate the
>> backup, too, instead of scripting it from your webapp.
>>
>> Why doesn't anyone use cron anymore? Sheesh...
>>
> Does Windows 2003 server have a version of cron?  I've never used it.  And 
> building it into my webapp lets all the configuration be done from the 
> same interface..

No.... not that I know of... but there are freeware ports to Windows.

If it's Windows-based, or perhaps Windows to Samba.... and it's just an admin 
function, like a scheduled copy... then have a look at Microsoft's attempt at 
Linux-type scripting... it's called
WSH... it's a VBScript- or JScript-type language that lets you do things like 
copy files, and I imagine it will interface with scheduled tasks, services, 
and all the rest of it.

Looks like this:

Set FSO = CreateObject("Scripting.FileSystemObject")
If Not FSO.FolderExists(FolderName) Then
   FSO.CreateFolder(FolderName)
End If

Have Fun...




Re: Copying large files around

Posted by PTS <pa...@gmail.com>.
Yes. It has AT (command line) and a scheduler (GUI). You can write a batch file 
and set it to run at a certain time, day, and frequency. For one client I 
wrote a batch file that stopped the DHCP service, copied the database over to 
a backup directory, and then restarted the service. Then I ran it 
every night before the tape backup.
AT is a lot lighter than the Scheduler, so using it is preferable.

Doug

----- Original Message ----- 
From: "David Kerber" <dc...@verizon.net>
To: "Tomcat Users List" <us...@tomcat.apache.org>
Sent: Saturday, October 13, 2007 11:29 AM
Subject: Re: Copying large files around


> Christopher Schultz wrote:
>> David,
>>
>> David Kerber wrote:
>>
>>> Let me give a bit more detail:  I am working on a utility function in my
>>> webapp that will periodically copy the database file from the db server
>>> to a backup server on the LAN.
>>>
>>
>> Uh.... cron and cp, anyone? You can even use cron to initiate the
>> backup, too, instead of scripting it from your webapp.
>>
>> Why doesn't anyone use cron anymore? Sheesh...
>>
> Does Windows 2003 server have a version of cron?  I've never used it.  And 
> building it into my webapp lets all the configuration be done from the 
> same interface..
>
> D
>
>
>




Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Martin Gainty wrote:
> Sadly no cron (and sadly no .htaccess file available either)
> wrap the java %java_opts% -cp %classpath% classname
> in either a cmd or .bat
> On a 7 year old wintel box the application is called Task Scheduler where
> you want to select that cmd/bat
> Be sure to set sufficient security permissions to execute as well as
> read/write whatever resources you need
> (no catalina.policy to reference permissions when running standalone)
>
> One more thing-
> Click on Icon/Settings/Power Management/Dont start if computer is running on
> batteries!
>
> HTH/
> Martin--
>   

Seems like my idea of just writing a TimerTask to do this would be more 
straightforward, since I'm doing other stuff with java.util.Timer 
anyway...

D
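
A rough sketch of that TimerTask idea (the file locations, buffer size, and 
schedule below are placeholder assumptions, not from the thread):

```java
// Sketch: a TimerTask that copies a backup file to another location.
import java.io.*;
import java.util.Timer;
import java.util.TimerTask;

public class BackupTask extends TimerTask {
    private final File src, dst;

    BackupTask(File src, File dst) { this.src = src; this.dst = dst; }

    public void run() {
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[64 * 1024];   // 64 KB buffer
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        } catch (IOException e) {
            e.printStackTrace();                // in a webapp, log instead
        }
    }

    public static void main(String[] args) throws IOException {
        // demo with temp files in place of the real DB backup and target
        File src = File.createTempFile("db", ".bak");
        File dst = File.createTempFile("copy", ".bak");
        try (FileOutputStream fos = new FileOutputStream(src)) {
            fos.write("db contents".getBytes());
        }
        BackupTask task = new BackupTask(src, dst);
        task.run();                             // run once directly for the demo
        System.out.println(dst.length());       // prints 11 (bytes copied)
        // in the webapp you'd schedule it instead, e.g. nightly:
        // new Timer(true).schedule(task, firstRunDelayMs, 24L * 60 * 60 * 1000);
        src.delete(); dst.delete();
    }
}
```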





Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Christopher Schultz wrote:
> David,
>
> David Kerber wrote:
>   
>> Let me give a bit more detail:  I am working on a utility function in my
>> webapp that will periodically copy the database file from the db server
>> to a backup server on the LAN.
>>     
>
> Uh.... cron and cp, anyone? You can even use cron to initiate the
> backup, too, instead of scripting it from your webapp.
>
> Why doesn't anyone use cron anymore? Sheesh...
>   
Does Windows 2003 server have a version of cron?  I've never used it.  
And building it into my webapp lets all the configuration be done from 
the same interface.

D





Re: Copying large files around

Posted by Christopher Schultz <ch...@christopherschultz.net>.
David,

David Kerber wrote:
> Let me give a bit more detail:  I am working on a utility function in my
> webapp that will periodically copy the database file from the db server
> to a backup server on the LAN.

Uh.... cron and cp, anyone? You can even use cron to initiate the
backup, too, instead of scripting it from your webapp.

Why doesn't anyone use cron anymore? Sheesh...

- -chris




Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Pid wrote:
> David Kerber wrote:
>   
>> Pid wrote:
>>     
>>> David Kerber wrote:
>>>  
>>>       
>>>> What is the most efficient (=fastest) way of copying large (> 1GB [yes,
>>>> that's a Giga]) files around the network in java when running under
>>>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a
>>>> large byte[] array?  Or what?
>>>>     
>>>>         
>>> I think that the NIO APIs were intended to handle speedier transfers of
>>> data across network connections.
>>>
>>> It depends how you wish to move the data.  Google provides a wealth of
>>> tutorials and API explanations for "java nio".
>>>
>>> p
>>>   
>>>       
>> Let me give a bit more detail:  I am working on a utility function in my
>> webapp that will periodically copy the database file from the db server
>> to a backup server on the LAN.  This particular function does not use
>> the internet, it's strictly copying on the LAN.  I don't care what
>> method it uses as long as it's fairly simple to code, and moves fast. 
>> I've already got the creation of the backup file handled; that's a
>> database function.  It's just the copying to another machine that I'm
>> working on now.
>>     
>
> Can you not make the second DB a slave of the main one? It'll be
> permanently up to date that way, and it doesn't require any copying.
>
> Otherwise, for integrity to be maintained during the copy you'd have to
> stop the database, or you could have unpleasant conditions occurring
> where some database writing happens while you're copying it.
>   
No, database mirroring isn't an option for this particular case; it's 
rather cost-sensitive, and the customer decided that the extra 
reliability wasn't worth the extra cost.  This database has a backup 
procedure that is specifically designed to handle transactions that 
occur during the backup without causing trouble.  Plus, the full backup 
is done during the middle of the night when there is normally nothing 
going on (there is only activity about 18 hours per day in this app).  
The other backups during the day are incrementals.


D





Re: Copying large files around

Posted by Pid <p...@pidster.com>.
David Kerber wrote:
> Pid wrote:
>> David Kerber wrote:
>>  
>>> What is the most efficient (=fastest) way of copying large (> 1GB [yes,
>>> that's a Giga]) files around the network in java when running under
>>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a
>>> large byte[] array?  Or what?
>>>     
>>
>> I think that the NIO APIs were intended to handle speedier transfers of
>> data across network connections.
>>
>> It depends how you wish to move the data.  Google provides a wealth of
>> tutorials and API explanations for "java nio".
>>
>> p
>>   
> Let me give a bit more detail:  I am working on a utility function in my
> webapp that will periodically copy the database file from the db server
> to a backup server on the LAN.  This particular function does not use
> the internet, it's strictly copying on the LAN.  I don't care what
> method it uses as long as it's fairly simple to code, and moves fast. 
> I've already got the creation of the backup file handled; that's a
> database function.  It's just the copying to another machine that I'm
> working on now.

Can you not make the second DB a slave of the main one? It'll be
permanently up to date that way, and it doesn't require any copying.

Otherwise, for integrity to be maintained during the copy you'd have to
stop the database, or you could have unpleasant conditions occurring
where some database writing happens while you're copying it.

p



> D
> 
> 
> 




Re: Copying large files around

Posted by David Kerber <dc...@verizon.net>.
Pid wrote:
> David Kerber wrote:
>   
>> What is the most efficient (=fastest) way of copying large (> 1GB [yes,
>> that's a Giga]) files around the network in java when running under
>> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a
>> large byte[] array?  Or what?
>>     
>
> I think that the NIO APIs were intended to handle speedier transfers of
> data across network connections.
>
> It depends how you wish to move the data.  Google provides a wealth of
> tutorials and API explanations for "java nio".
>
> p
>   
Let me give a bit more detail:  I am working on a utility function in my 
webapp that will periodically copy the database file from the db server 
to a backup server on the LAN.  This particular function does not use 
the internet, it's strictly copying on the LAN.  I don't care what 
method it uses as long as it's fairly simple to code, and moves fast.  
I've already got the creation of the backup file handled; that's a 
database function.  It's just the copying to another machine that I'm 
working on now.

D





Re: Copying large files around

Posted by Pid <p...@pidster.com>.
David Kerber wrote:
> What is the most efficient (=fastest) way of copying large (> 1GB [yes,
> that's a Giga]) files around the network in java when running under
> tomcat 5.5.x?  Do I use a FileInputStream and FileOutputStream with a
> large byte[] array?  Or what?

I think that the NIO APIs were intended to handle speedier transfers of
data across network connections.

It depends how you wish to move the data.  Google provides a wealth of
tutorials and API explanations for "java nio".

p
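
For what it's worth, an NIO copy along those lines might be sketched as 
follows (the chunk size and names are illustrative assumptions; transferTo 
may move fewer bytes than requested, hence the loop):

```java
// Sketch: channel-to-channel copy with FileChannel.transferTo, which
// can let the OS move bytes without staging them in a Java byte[].
import java.io.*;
import java.nio.channels.FileChannel;

public class NioCopy {
    static void copy(File src, File dst) throws IOException {
        try (FileInputStream in = new FileInputStream(src);
             FileOutputStream out = new FileOutputStream(dst)) {
            FileChannel ic = in.getChannel();
            FileChannel oc = out.getChannel();
            long pos = 0, size = ic.size();
            while (pos < size) {
                // transfer in 64 MB chunks; transferTo returns bytes moved
                pos += ic.transferTo(pos, Math.min(64L << 20, size - pos), oc);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // demo with a tiny temp file in place of the 1 GB database backup
        File a = File.createTempFile("src", ".bin");
        File b = File.createTempFile("dst", ".bin");
        try (FileOutputStream fos = new FileOutputStream(a)) {
            fos.write("hello nio".getBytes());
        }
        copy(a, b);
        System.out.println(b.length()); // prints 9
        a.delete(); b.delete();
    }
}
```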



> D
> 
> 
> 

