Posted to users@pdfbox.apache.org by Basharat Ali <ba...@groundswellgroup.ca> on 2014/07/29 17:15:56 UTC

TXT2PDF

Hi,
I am using the PDFBox utility to convert TXT files to PDF. I have developed the script below:

echo " Remove Old TXT File List " >> "$LogFileDir/ConvertTxtToPdf.log"
rm -f "$ConversionScriptDir/TxtFileList.out"
echo " Remove Old PDF File List " >> "$LogFileDir/ConvertTxtToPdf.log"
rm -f "$ConversionScriptDir/PDFFileslist.out"
echo " Make List of TXT Files we are going to convert to PDF " >> "$LogFileDir/ConvertTxtToPdf.log"
ls -a "$TxtFilesDir" | grep '\.TXT$' > "$ConversionScriptDir/TxtFileList.out"
echo " TXT File Listing is Complete " >> "$LogFileDir/ConvertTxtToPdf.log"
echo " Reading TXT File Listing " >> "$LogFileDir/ConvertTxtToPdf.log"
touch "$ConversionScriptDir/PDFFileslist.out"
while IFS= read -r line
do
     PDFOutFile=`echo "$line"|cut -d '.' -f 1`
     java -jar "$PdfConvertorDir/pdfbox-app-1.8.6.jar" TextToPDF "$PdfFilesDir/$PDFOutFile.PDF" "$TxtFilesDir/$line"
     echo " TXT File Converted to PDF = $line " >> "$ConversionScriptDir/PDFFileslist.out"
done < "$ConversionScriptDir/TxtFileList.out"
echo " All TXT to PDF Conversion is completed successfully. Please verify the PDF Files at:: $PdfFilesDir "


It takes about 1 hour to convert 2000 files. I have about 1 million such files, so the full run would take about 500 hours. Is there a quicker way to convert the TXT files to PDF?
Thanks
Bash


Re: TXT2PDF

Posted by Brzrk One <br...@gmail.com>.
You might consider putting the loop inside a Java main().
Each pass through the 'read line' loop spawns extra processes (the backtick
substitution forks a subshell and 'cut' just to parse 'line'), and, far more
costly, you pay the Java startup/teardown for every file.
Your main() could do all of this for you in a single JVM.
Then, of course, you could run this on multiple machines, or in multiple
processes, etc.
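
As a rough illustration of moving the loop into a single main(), here is a minimal sketch. It is not taken from the thread or from PDFBox: the class name BatchTextToPdf, the Converter hook, and the helper names are hypothetical, and the actual PDFBox call is left as a placeholder to be checked against the Javadoc of the version in use. What it shows is only the shape of the idea: one JVM start, then one conversion call per file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BatchTextToPdf {

    /** Hypothetical hook for whatever turns one text file into one PDF. */
    interface Converter {
        void convert(Path txt, Path pdf) throws IOException;
    }

    /** Maps "REPORT.TXT" to "REPORT.PDF", stripping only the last extension. */
    static String pdfNameFor(String txtName) {
        int dot = txtName.lastIndexOf('.');
        String base = (dot > 0) ? txtName.substring(0, dot) : txtName;
        return base + ".PDF";
    }

    /** Converts every *.TXT file in txtDir within this one JVM and returns the count. */
    static int convertAll(Path txtDir, Path pdfDir, Converter converter) throws IOException {
        int converted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(txtDir, "*.TXT")) {
            for (Path txt : files) {
                Path pdf = pdfDir.resolve(pdfNameFor(txt.getFileName().toString()));
                converter.convert(txt, pdf);
                converted++;
            }
        }
        return converted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BatchTextToPdf <txtDir> <pdfDir>");
            return;
        }
        Converter placeholder = (txt, pdf) -> {
            // Assumption: with PDFBox 1.8.x on the classpath, this body would
            // build the document through the TextToPDF class and save it to
            // 'pdf'. Check the exact method names in the PDFBox Javadoc.
            throw new UnsupportedOperationException("plug the PDFBox call in here");
        };
        int n = convertAll(Paths.get(args[0]), Paths.get(args[1]), placeholder);
        System.out.println("Converted " + n + " files");
    }
}
```

Because the conversion sits behind one small interface, the same loop could later be handed to a thread pool or split across processes without touching the file-listing logic.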


On Tue, Jul 29, 2014 at 12:13 PM, Daniel Gibby <dgibby@edirectpublishing.com> wrote:

> It sounds to me like converting 2000 files in an hour is pretty good:
> 1.8 seconds per file.
>
> My suggestion is to put the files on more than one computer and run them
> simultaneously. If you have a million files, you know it is going to take a
> long time to create PDFs out of them.
> You'll save much more time by splitting the load across multiple
> computers than you will by fiddling with the script.
>
> Thanks,
> Daniel Gibby

Re: TXT2PDF

Posted by Daniel Gibby <dg...@edirectpublishing.com>.
It sounds to me like converting 2000 files in an hour is pretty good:
1.8 seconds per file.

My suggestion is to put the files on more than one computer and run them
simultaneously. If you have a million files, you know it is going to
take a long time to create PDFs out of them.
You'll save much more time by splitting the load across multiple
computers than you will by fiddling with the script.

Thanks,
Daniel Gibby
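
To put numbers on the suggestion to split the load, here is a small sketch of the arithmetic and of one possible way to assign files to machines. The class name ShardPlan, both method names, and the hash-based assignment are hypothetical, not something proposed in the thread, and the estimate assumes the work scales evenly across machines, which real disks and networks may not allow.

```java
public class ShardPlan {

    /** Wall-clock hours for the whole batch, assuming perfectly even, independent workers. */
    static double estimatedHours(long files, double secondsPerFile, int workers) {
        return files * secondsPerFile / workers / 3600.0;
    }

    /** True if this file belongs to the given machine (0-based) out of machineCount. */
    static boolean belongsTo(String fileName, int machine, int machineCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(fileName.hashCode(), machineCount) == machine;
    }

    public static void main(String[] args) {
        double secondsPerFile = 3600.0 / 2000;  // one hour for 2000 files, from the thread
        for (int workers : new int[] {1, 4, 10}) {
            System.out.printf("%d worker(s): about %.0f hours%n",
                    workers, estimatedHours(1_000_000L, secondsPerFile, workers));
        }
    }
}
```

With one worker this reproduces the 500-hour figure from the original question. Each machine would run the same conversion loop but skip any file for which belongsTo returns false for its own index, so no file list has to be divided by hand.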
