
Posted to users@pdfbox.apache.org by Yogesh <yo...@gmail.com> on 2010/11/05 22:56:59 UTC

Save URLs to PDFs?

Hi,

I have PDFs which I can access through URLs. I want to download them and
save them to files. How can I go about it?

Thanks

-Yogesh

Re: Save URLs to PDFs?

Posted by Yogesh <yo...@gmail.com>.

Here is the code and the URL:

URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");

URLConnection con = url.openConnection();

InputStream in = con.getInputStream();

FileWriter out = new FileWriter("C:/My.pdf");

int next = 0;

while ( ( next = in.read() ) != -1  ) {
    out.write(next);
}


Thanks,


- Yogesh



On 5 November 2010 18:31, Grant Overby <gr...@floorsoft.com> wrote:

> Hrm, that's odd.
>
> Can you post the code you tried? And perhaps the URL?
>
>
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
>
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will something
> something something." They are focusing on machines. But in fact we need to
> focus on humans, on how humans care about doing programming or operating the
> application of the machines. We are the masters. They are the slaves. --
> Yukihiro Matsumoto
>
>
>
>
> On Fri, Nov 5, 2010 at 6:27 PM, Yogesh <yo...@gmail.com> wrote:
>
>> Yes. I can download the file through the browser. It works perfectly fine.
>>
>> - Yogesh
>>
>>
>>
>> On 5 November 2010 18:25, Grant Overby <gr...@floorsoft.com> wrote:
>>
>>> If you download the file through a browser, does it work then?
>>>
>>> On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <yo...@gmail.com> wrote:
>>>
>>>> I tried that; it writes a blank PDF, though the file size and the
>>>> number of pages are correct (for the newly written file).
>>>>
>>>> - Yogesh
>>>>
>>>>
>>>>
>>>>
>>>> On 5 November 2010 18:09, Grant Overby <gr...@floorsoft.com> wrote:
>>>>
>>>>> You don't need PDFBox to do this. Below is some rough code that allows
>>>>> you to download a file and save it.
>>>>>
>>>>> URLConnection urlConnection = new URL("http://...").openConnection();
>>>>> InputStream   in      = urlConnection.getInputStream();
>>>>> FileWriter out = new FileWriter("my.pdf");
>>>>> int next = 0;
>>>>> while ( ( next = in.read() ) != -1  ) out.write(next);
>>>>> //close everything
>>>>>

Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

Hrm, that's odd.

Can you post the code you tried? And perhaps the URL?


Re: Save URLs to PDFs?

Posted by Ad...@swmc.com.

Yogesh,

Compare the file size and hash (SHA1, MD5, etc.) of the file you download 
from your browser with the file that Java downloads.  The end of the file 
may be missing when you download it via Java.  I know you said the file 
size is correct, but is it the *exact* same number of bytes?  If so, then 
the content must be different, and it should just be a matter of running 
`diff` on the files to see what's going wrong.

---- 
Thanks,
Adam





- FHA 203b; 203k; HECM; VA; USDA; Conventional 
- Warehouse Lines; FHA-Authorized Originators 
- Lending and Servicing in over 45 States 
www.swmc.com   -  www.simplehecmcalculator.com   Visit  www.swmc.com/resources   for helpful links on Training, Webinars, Lender Alerts and Submitting Conditions  
This email and any content within or attached hereto from Sun West Mortgage Company, Inc. is confidential and/or legally privileged. The information is intended only for the use of the individual or entity named on this email. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this email information is strictly prohibited, and that the documents should be returned to this office immediately by email. Receipt by anyone other than the intended recipient is not a waiver of any privilege. Please do not include your social security number, account number, or any other personal or financial information in the content of the email. Should you have any questions, please call (800) 453 7884.
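Adam's check can also be scripted in Java. The sketch below is a minimal illustration, assuming Java 7 or later (try-with-resources, `java.nio.file`); the class name `CompareDownloads`, its two command-line arguments, and the helper names are illustrative, not from the thread. It prints both sizes, compares MD5 digests, and reports the first byte offset at which the files differ, which is the information Grant later gathers by hand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CompareDownloads {

    // Hex-encoded digest of a file, e.g. algorithm "MD5" or "SHA-1".
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Offset of the first differing byte, or -1 if the files are identical.
    // If one file is a prefix of the other, returns the shorter file's length.
    static long firstDifference(Path a, Path b) throws IOException {
        byte[] x = Files.readAllBytes(a);
        byte[] y = Files.readAllBytes(b);
        int common = Math.min(x.length, y.length);
        for (int i = 0; i < common; i++) {
            if (x[i] != y[i]) {
                return i;
            }
        }
        return x.length == y.length ? -1 : common;
    }

    public static void main(String[] args) throws Exception {
        Path browserCopy = Paths.get(args[0]); // file saved by the browser
        Path javaCopy = Paths.get(args[1]);    // file written by the Java code
        System.out.println("sizes: " + Files.size(browserCopy) + " vs " + Files.size(javaCopy));
        System.out.println("MD5 equal: "
                + hexDigest(browserCopy, "MD5").equals(hexDigest(javaCopy, "MD5")));
        long offset = firstDifference(browserCopy, javaCopy);
        if (offset >= 0) {
            System.out.printf("first difference at byte %d (0x%X)%n", offset, offset);
        }
    }
}
```

Reading both files fully into memory is fine for PDFs of a few megabytes like the one in this thread; for much larger files a streaming comparison would be preferable.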

Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

Also, My.pdf has the EOF token "%%EOF".
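Looking for the marker, as Grant did, is a quick way to rule out a truncated download. A minimal sketch follows; `EofCheck` and `hasEofMarker` are hypothetical names, and the 1024-byte search window is an assumed value rather than anything stated in the thread. As the rest of the thread shows, finding `%%EOF` only rules out truncation; it says nothing about bytes altered in the middle of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EofCheck {

    // True if "%%EOF" occurs within the last `window` bytes of the file.
    static boolean hasEofMarker(Path file, int window) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, window);
            byte[] tail = new byte[n];
            raf.seek(len - n);
            raf.readFully(tail);
            // ISO-8859-1 maps each byte to exactly one char, so the search is byte-exact.
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get(args.length > 0 ? args[0] : "C:/My.pdf");
        System.out.println(pdf + " has %%EOF near the end: " + hasEofMarker(pdf, 1024));
    }
}
```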


On Fri, Nov 5, 2010 at 6:53 PM, Grant Overby <gr...@floorsoft.com> wrote:

> I ran the code [2]. The code corrupts the PDF: the MD5s differ, even
> though the file sizes are identical [1].
>
> 1:
> 11/05/2010  06:47 PM         2,371,050 msb201055.pdf
> 11/05/2010  06:46 PM         2,371,050 My.pdf
>
>
>
> 2:
> package s;
>
> import java.io.FileWriter;
> import java.io.InputStream;
> import java.io.IOException;
> import java.net.URL;
> import java.net.URLConnection;
> import java.net.MalformedURLException;
>
> public class Main
> {
>   public static void main(String[] args) throws IOException
>    {
>     URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
>
>     URLConnection con = url.openConnection();
>
>     InputStream in = con.getInputStream();
>
>     FileWriter out = new FileWriter("C:/My.pdf");
>
>     int next = 0;
>     while ( ( next = in.read() ) != -1  ) {
>       out.write(next);
>     }
>     out.flush();
>     out.close();
>     in.close();
>   }
> }

Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

Michael:
You're right. Copy/paste for the lose; I read right over it. :(

Corrected code:


package s;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class Main
{
  public static void main(String[] args) throws IOException
  {
    URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");

    URLConnection con = url.openConnection();

    InputStream in = con.getInputStream();

    // FileOutputStream writes raw bytes; FileWriter would encode each value as a character.
    FileOutputStream out = new FileOutputStream("C:/My.pdf");

    int next = 0;
    while ( ( next = in.read() ) != -1  ) {
      out.write(next);
    }
    out.flush();
    out.close();
    in.close();
  }
}
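For a large batch of URLs (Yogesh mentions having thousands), a version built on `java.nio.file` may be simpler. This is a minimal sketch, assuming Java 7 or later: `Files.copy(InputStream, Path, CopyOption...)` streams the raw bytes straight to disk with no character encoding involved, and try-with-resources closes the connection stream even if the copy fails. The class `PdfDownloader`, the `pdfs` output directory, and the `fileNameFor` naming scheme are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

public class PdfDownloader {

    // Copies the raw bytes behind the URL into target, replacing any existing file.
    // Only byte streams are involved, so no charset can alter the data.
    static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical naming scheme: last segment of the URL path (query string ignored).
    static String fileNameFor(URL url) {
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? "download.pdf" : name;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("pdfs"); // hypothetical output directory
        Files.createDirectories(dir);
        List<String> urls = Arrays.asList(
                "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
        for (String s : urls) {
            try {
                URL url = URI.create(s).toURL();
                Path target = dir.resolve(fileNameFor(url));
                System.out.println(target + ": " + download(url, target) + " bytes");
            } catch (IOException | IllegalArgumentException e) {
                // Log and continue so one bad URL does not abort the whole batch.
                System.err.println(s + ": " + e);
            }
        }
    }
}
```

Two caveats apply to this sketch. Naming files by the last path segment means two URLs ending in the same name would overwrite each other. In addition, `HttpURLConnection` does not follow redirects from http to https, so old http links that now redirect may need their scheme updated.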


On Fri, Nov 5, 2010 at 7:18 PM, Michael Baehr <co...@googlemail.com> wrote:

> From the JDK docs:
>
> FileWriter is meant for writing streams of characters. For writing streams
> of raw bytes, consider using a FileOutputStream.
>
> You get characters replaced depending on your platform's character encoding.
> You must ensure you're writing bytes and not characters!
>
> Michael
>
> On 5. Nov 2010, at 18:14, Grant Overby wrote:
>
> > First difference (on second line, first line is for reference point):
> >
> > bad:
> > <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> > 1676/V 1757>>stream
> > xÚ?U LSW >?Û O)]Wä!Ô>?"CATl?4PkADy ? ?RjgÊ??< õ A
> >
> > Start of second line in hex:   78 DA 3F 55 0B 4C 53 57
> >
> > good:
> > <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> > 1676/V 1757>>stream
> > xÚ”U LSW >—Û O)]Wä!Ô>˜"CATl”4PkADy ‹ –RjgÊˆˆ< õ A
> >
> > Start of second line in hex:   78 DA 94 55 0B 4C 53 57
> >
> >
> >
> >
> > Isolated incorrect single characters are throughout the document.
> > Downloading it multiple times shows consistent errors.
> >
> >
> > I'll keep thinking on it, but nothing is apparent to me. This shouldn't
> > happen afaik.
> >
> >
> > Anyone?
> >
> > On Fri, Nov 5, 2010 at 6:58 PM, Yogesh <yo...@gmail.com> wrote:
> >
> >> Thanks Grant.
> >> But I have thousands of PDF URLs like this. I have tried around 12 so far.
> >> Can all of them be corrupt?
> >>
> >> What can I do about this?
> >>
> >>
> >> - Yogesh

Re: Save URLs to PDFs?

Posted by Michael Baehr <co...@googlemail.com>.

From the JDK docs:

FileWriter is meant for writing streams of characters. For writing streams of raw bytes, consider using a FileOutputStream.

You get characters replaced depending on your platform's character encoding. You must ensure you're writing bytes and not characters!

Michael
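Michael's explanation can be checked directly. `in.read()` returns each byte as an `int` from 0 to 255, and `Writer.write(int)` treats that value as a character and encodes it with the writer's charset. The sketch below replays every possible byte through an `OutputStreamWriter`. It assumes windows-1252 as the default charset of the Windows machines in the thread, which is plausible for a Western locale at the time but is an assumption here. `WriterCorruptionDemo` and `throughWriter` are illustrative names. Bytes 0x00-0x7F and 0xA0-0xFF come through unchanged, because windows-1252 encodes those characters as the same byte values. Most values in 0x80-0x9F are not representable (in windows-1252 those bytes stand for characters such as curly quotes and dashes), so the encoder replaces them with `?` (0x3F). That matches the 0x94 to 0x3F change in Grant's hex dump. Because the charset is single-byte, the file size stays the same.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterCorruptionDemo {

    // Replays the buggy loop: each value from in.read() (0..255) goes to
    // Writer.write(int), which treats it as a char and encodes it with the charset.
    static byte[] throughWriter(byte[] input, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            for (byte b : input) {
                out.write(b & 0xff);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        // Assumption: windows-1252 stands in for the Windows default charset of the time.
        byte[] written = throughWriter(all, Charset.forName("windows-1252"));
        System.out.println("bytes in: " + all.length + ", bytes out: " + written.length);
        for (int i = 0; i < all.length; i++) {
            if (written[i] != all[i]) {
                System.out.printf("0x%02X -> 0x%02X%n", i, written[i] & 0xff);
            }
        }
    }
}
```

On Java 18 and later the default charset is UTF-8 (JEP 400). With that default, the same loop would turn every value from 0x80 upward into two bytes, so the file would grow instead of keeping its size. In both cases, the fix is to write through a byte stream such as `FileOutputStream`.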

On 5. Nov 2010, at 18:14, Grant Overby wrote:

> First difference (on second line, first line is for reference point):
> 
> bad:
> <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> 1676/V 1757>>stream
> xÚ?U LSW >?Û O)]Wä!Ô>?"CATl?4PkADy ? ?RjgÊ??< õ A
> 
> Start of second line in hex:   78 DA 3F 55 0B 4C 53 57
> 
> good:
> <</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
> 1676/V 1757>>stream
> xÚ”U LSW >—Û O)]Wä!Ô>˜"CATl”4PkADy ‹ –RjgÊˆˆ< õ A
> 
> Start of second line in hex:   78 DA 94 55 0B 4C 53 57
> 
> 
> 
> 
> Isolated incorrect single characters are throughout the document.
> Downloading it multiple times shows consistant errors.
> 
> 
> I'll keep thinking on it, but nothing is apparent to me. This shouldn't
> happen afaik.
> 
> 
> Anyone?
> 
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
> 
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will something
> something something." They are focusing on machines. But in fact we need to
> focus on humans, on how humans care about doing programming or operating the
> application of the machines. We are the masters. They are the slaves. --
> Yukihiro Matsumoto

Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

First difference (on the second line; the first line is shown as a reference point):

bad:
<</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
1676/V 1757>>stream
xÚ?U LSW >?Û O)]Wä!Ô>?"CATl?4PkADy ? ?RjgÊ??< õ A

Start of second line in hex:   78 DA 3F 55 0B 4C 53 57

good:
<</Length 1372/E 1779/Filter/FlateDecode/I 1811/L 1795/O 1741/S 1423/T
1676/V 1757>>stream
xÚ”U LSW >—Û O)]Wä!Ô>˜"CATl”4PkADy ‹ –RjgÊˆˆ< õ A

Start of second line in hex:   78 DA 94 55 0B 4C 53 57
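A byte-level comparison like the one above can be scripted. The following Java sketch reports the offset of the first differing byte, the value of that byte in each file, and how many bytes differ overall. It assumes Java 7+ for java.nio.file, and the class name and command-line file names are illustrative, not from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FirstDifference {

    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    public static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Equal prefixes: the arrays differ only if one is longer than the other.
        return a.length == b.length ? -1 : common;
    }

    /** Counts positions within the common length where the bytes differ. */
    public static int countDifferences(byte[] a, byte[] b) {
        int count = 0;
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage (placeholder names): java FirstDifference msb201055.pdf My.pdf
        byte[] good = Files.readAllBytes(Paths.get(args[0]));
        byte[] bad = Files.readAllBytes(Paths.get(args[1]));
        int offset = firstDifference(good, bad);
        if (offset < 0) {
            System.out.println("Files are identical");
        } else if (offset < Math.min(good.length, bad.length)) {
            System.out.printf("First difference at offset %d: %02X vs %02X (%d differing bytes)%n",
                    offset, good[offset] & 0xFF, bad[offset] & 0xFF, countDifferences(good, bad));
        } else {
            System.out.printf("Files match up to offset %d, then one is longer%n", offset);
        }
    }
}
```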




Isolated single incorrect characters appear throughout the document.
Downloading it multiple times produces consistent errors.


I'll keep thinking about it, but nothing is apparent to me. As far as I know,
this shouldn't happen.


Anyone?
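A likely explanation, offered as an assumption rather than something confirmed in this thread: FileWriter is a character stream. Writer.write(int) treats each value returned by in.read() (0 to 255) as a char and encodes it with the platform's default charset. If that charset was windows-1252 (common on English-language Windows, but not stated here), bytes 0x00 to 0x7F and 0xA0 to 0xFF come out unchanged, while 0x80 to 0x9F become C1 control characters that windows-1252 cannot encode, so the writer substitutes '?' (0x3F). That gives one output byte per input byte (identical file size) with isolated wrong bytes, which matches the 94 versus 3F difference above. This sketch (class name hypothetical) reproduces the effect in memory:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterCorruptionDemo {

    /**
     * Mimics the loop in the thread: each value from InputStream.read() (0..255)
     * is passed to Writer.write(int), i.e. treated as a char, then encoded with the charset.
     */
    public static byte[] copyThroughWriter(byte[] input, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(sink, charset);
        for (byte b : input) {
            out.write(b & 0xFF);
        }
        out.close(); // flushes the encoder's buffered bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // First four bytes of the "good" line in the comparison above.
        byte[] original = {0x78, (byte) 0xDA, (byte) 0x94, 0x55};
        byte[] copy = copyThroughWriter(original, Charset.forName("windows-1252"));
        StringBuilder hex = new StringBuilder();
        for (byte b : copy) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // 78 DA 3F 55
    }
}
```

Writing through a FileOutputStream (a byte stream) instead avoids the conversion entirely. Under a UTF-8 default charset (the default since Java 18) the same FileWriter loop would instead emit two bytes for every input byte of 0x80 or above, so the file would grow rather than keep its size.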

--
Grant Overby
Senior Developer
FloorSoft, Inc.


Re: Save URLs to PDFs?

Posted by Yogesh <yo...@gmail.com>.

Thanks, Grant.
But I have thousands of PDF URLs like this, and I have tried around 12 so far.
Can all of them be corrupt?

What can I do about this?


- Yogesh
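With thousands of URLs, a loop that copies raw bytes and keeps going past individual failures may help. This is a hedged sketch assuming Java 7+ and a list of URL strings already loaded in memory; the BatchDownload class, the 0.pdf, 1.pdf, ... naming scheme, and the choice to log and skip failed URLs are illustrative assumptions. Files.copy writes the bytes it reads, with no character encoding in between.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class BatchDownload {

    /**
     * Downloads every URL in the list into targetDir as 0.pdf, 1.pdf, ...
     * (a hypothetical naming scheme). Returns the number of URLs that failed.
     */
    public static int downloadAll(List<String> urls, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int failures = 0;
        for (int i = 0; i < urls.size(); i++) {
            String url = urls.get(i);
            Path target = targetDir.resolve(i + ".pdf");
            try (InputStream in = new URL(url).openStream()) {
                // Raw byte copy: no Writer, so no charset conversion.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                failures++;
                Files.deleteIfExists(target); // don't leave a partial file behind
                System.err.println("Failed: " + url + " (" + e + ")");
            }
        }
        return failures;
    }
}
```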





Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

I ran the code [2]. The code corrupts the PDF: the MD5s are different, even
though the file sizes are identical [1].

1:
11/05/2010  06:47 PM         2,371,050 msb201055.pdf
11/05/2010  06:46 PM         2,371,050 My.pdf



2:
package s;

import java.io.FileWriter;
import java.io.InputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.MalformedURLException;

public class Main
{
  public static void main(String[] args) throws IOException
  {
    URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");

    URLConnection con = url.openConnection();

    InputStream in = con.getInputStream();

    FileWriter out = new FileWriter("C:/My.pdf");

    int next = 0;
    while ( ( next = in.read() ) != -1  ) {
      out.write(next);
    }
    out.flush();
    out.close();
    in.close();
  }
}
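For comparison, here is a version of the program above that writes bytes instead of characters. It is a sketch assuming Java 7+ (try-with-resources), and the class name is hypothetical. The key change is FileOutputStream, a byte stream, in place of FileWriter, a character stream that encodes its input with the platform charset. The 8 KB buffer is an arbitrary size that avoids one read() call per byte, and the printed count can be checked against the 2,371,050 bytes of the browser download.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class BinaryDownload {

    /** Copies all bytes from in to out through a buffer; returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
        // Both streams are closed automatically, even if the copy fails.
        try (InputStream in = url.openStream();
             OutputStream out = new FileOutputStream("C:/My.pdf")) {
            long bytes = copy(in, out);
            System.out.println("Wrote " + bytes + " bytes");
        }
    }
}
```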




--
Grant Overby
Senior Developer
FloorSoft, Inc.


On Fri, Nov 5, 2010 at 6:45 PM, <Ad...@swmc.com> wrote:

> Yogesh,
>
> Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> from your browser with the file that Java downloads.  The end of the file
> may be missing when you download it via Java.  I know you said the file
> size is correct, but is it the *exact* same number of bytes?  If so, then
> the content must be different, and it should just be a matter of running
> `diff` on the files to see what's going wrong.
>
> ----
> Thanks,
> Adam
>
> - FHA 203b; 203k; HECM; VA; USDA; Conventional
> - Warehouse Lines; FHA-Authorized Originators
> - Lending and Servicing in over 45 States
> www.swmc.com   -  www.simplehecmcalculator.com   Visit
> www.swmc.com/resources   for helpful links on Training, Webinars, Lender
> Alerts and Submitting Conditions
> This email and any content within or attached hereto from Sun West Mortgage
> Company, Inc. is confidential and/or legally privileged. The information is
> intended only for the use of the individual or entity named on this email..
> If you are not the intended recipient, you are hereby notified that any
> disclosure, copying, distribution or taking any action in reliance on the
> contents of this email information is strictly prohibited, and that the
> documents should be returned to this office immediately by email. Receipt by
> anyone other than the intended recipient is not a waiver of any privilege.
> Please do not include your social security number, account number, or any
> other personal or financial information in the content of the email. Should
> you have any questions, please call (800) 453 7884.  =
>
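Adam's suggestion to compare sizes and hashes can also be done from Java without external tools. This is a minimal sketch using java.security.MessageDigest, assuming Java 7+ for java.nio.file; the class name and the file names in the usage comment are placeholders. It prints the size and digest of two files and says whether the contents match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHash {

    /** Returns the lowercase hex digest ("MD5", "SHA-1", ...) of a file's contents. */
    public static String digestHex(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage (placeholder names): java FileHash browser.pdf java.pdf
        Path a = Paths.get(args[0]);
        Path b = Paths.get(args[1]);
        String hashA = digestHex(a, "MD5");
        String hashB = digestHex(b, "MD5");
        System.out.println(Files.size(a) + " bytes  " + hashA + "  " + a);
        System.out.println(Files.size(b) + " bytes  " + hashB + "  " + b);
        System.out.println(hashA.equals(hashB) ? "identical" : "contents differ");
    }
}
```

Equal sizes with different digests, as in Grant's test, mean bytes were altered in place rather than truncated.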

Re: Save URLs to PDFs?

Posted by Yogesh <yo...@gmail.com>.

Yes. I can download the file through the browser. It works perfectly fine.

- Yogesh




Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

If you download the file through a browser, does it work then?

--
Grant Overby
Senior Developer
FloorSoft, Inc.


Re: Save URLs to PDFs?

Posted by Yogesh <yo...@gmail.com>.

I tried that, and it writes a blank PDF, though the file size and the
number of pages are correct (for the newly written file).

- Yogesh





Re: Save URLs to PDFs?

Posted by Grant Overby <gr...@floorsoft.com>.

You don't need PDFBox to do this. Below is some rough code that downloads a
file and saves it.

URLConnection urlConnection = new URL("http://...").openConnection();
InputStream   in      = urlConnection.getInputStream();
FileWriter out = new FileWriter("my.pdf");
int next = 0;
while ( ( next = in.read() ) != -1  ) out.write(next);
//close everything
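A sketch of the same rough snippet written with byte streams throughout. It keeps the byte-at-a-time loop but buffers both ends, and it uses try/finally so it also compiles on Java 6, which was current when this thread was written. The SaveUrl class and save method names are hypothetical. Because FileOutputStream is used instead of FileWriter, each value from in.read() is written back as the same byte, with no character encoding in between.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class SaveUrl {

    public static void save(String address, String fileName) throws IOException {
        URLConnection urlConnection = new URL(address).openConnection();
        InputStream in = new BufferedInputStream(urlConnection.getInputStream());
        try {
            // A byte stream (OutputStream), not a character stream (Writer).
            OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName));
            try {
                int next;
                while ((next = in.read()) != -1) {
                    out.write(next); // writes the low 8 bits, i.e. the original byte
                }
            } finally {
                out.close(); // also flushes the buffer
            }
        } finally {
            in.close();
        }
    }
}
```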

--
Grant Overby
Senior Developer
FloorSoft, Inc.
