
Posted to user@poi.apache.org by simanchal maharana <si...@gmail.com> on 2014/01/29 07:21:52 UTC

Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

I am trying to get the text content of PowerPoint files and replace it with
some other text. I have a PowerPoint file of 20 slides, where slides 13, 14,
15 and 16 have hyperlinks to slides 17, 18, 19 and 20. I am using XMLSlideShow
to traverse the slides, but it gives only 16 slides. It does not give the
last 4 hyperlinked slides.

Any idea how I can get the content of all hyperlinked slides and replace it
with some other text would be really appreciated.

Here is my code.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
import org.apache.poi.xslf.usermodel.XSLFTextShape;
import org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun;
public class Testing {
	static String inputFile = "C:\\Users\\SM78882\\Desktop\\Testing\\IE_Basics_English.pptx";
	static String outputFile = "C:\\Users\\SM78882\\Desktop\\Testing\\result.pptx";

	public static String replaceUnwantedChar(String originalString) {
		if (null != originalString)
			return "" + originalString.replaceAll("(\n+)|(\t+)|(\\s{2,})", " ").trim();
		else
			return "";
	}
	public static void main(String[] args) {
		FileInputStream fis = null;
		FileOutputStream fos = null;
		XMLSlideShow ppt = null;
		try {
			fis = new FileInputStream(inputFile);
			fos = new FileOutputStream(outputFile);
			ppt = new XMLSlideShow(fis);
			System.out.println("No of slides:" + ppt.getSlides().length); // gives 16 slides.
			for (XSLFSlide slide : ppt.getSlides()) {
				for (XSLFShape shape : slide) {
					if (shape instanceof XSLFTextShape) {
						XSLFTextShape txShape = (XSLFTextShape) shape;
						for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
							String originalText = replaceUnwantedChar(xslfParagraph.getText());
							if (!originalText.isEmpty()) {
								// placeholder: the translated text would be supplied here
								String translation = "";
								if (translation != null) {
									CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
									// remove every run except the first, from last to first
									for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
										xslfParagraph.getXmlObject().removeR(index);
									}
									if (ctRegularTextRun.length > 0)
										ctRegularTextRun[0].setT(translation);
								}
							}
						}
					}
				}
			}
			ppt.write(fos);
			fos.close();
			fis.close();
		} catch (Exception ex) {
			ex.printStackTrace();
		}
	}
}
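
The whitespace clean-up in replaceUnwantedChar can be tried in isolation. The following is only a sketch: the class name WhitespaceDemo and the single `\\s+` pattern are assumptions and not part of the posted code. It shows that the posted pattern can leave two spaces behind when a newline is directly followed by a tab, whereas collapsing every whitespace run gives a single space.

```java
class WhitespaceDemo {
    // Pattern as posted: newline runs, tab runs, or two or more whitespace characters.
    static String posted(String s) {
        return s == null ? "" : s.replaceAll("(\n+)|(\t+)|(\\s{2,})", " ").trim();
    }

    // Assumed alternative: every run of whitespace becomes exactly one space.
    static String collapsed(String s) {
        return s == null ? "" : s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String sample = "Hello\n\tworld";
        System.out.println("[" + posted(sample) + "]");
        System.out.println("[" + collapsed(sample) + "]");
    }
}
```

Whether the stricter variant is wanted depends on how the text is matched against the translation source, so treat it as an option rather than a fix.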






--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Retrieving-content-of-hyperlinked-slides-in-powerpoint-files-PPTX-through-apache-POI-tp5714766.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by Andreas Beeker <an...@gmx.de>.

Hi,

there's an error in POIXMLDocumentPart.read(): the depth-first recursion that
determines the relation ids corrupts the relation map, and therefore only 16
instead of 20 slides are reported.

I'll apply the patch in 3.11, i.e. once Yegor has finalized POI 3.10.

Apart from that, Simanchal's code tries to normalize the paragraphs. My test
implementation differs only minimally:
- get the content of all text runs in the paragraph as plain text
- remove all breaks, fields and all text runs apart from the first one
   (one would need to synchronize the XMLBeans and the POI run list though)
- and set the normalized content into the first text run
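
The three steps listed above can be sketched without POI on the classpath. This is a minimal illustration under stated assumptions: a java.util.List of strings stands in for the paragraph's text runs, the class and method names are hypothetical, and breaks and fields are not modelled. Real code would have to perform the same steps on the XMLBeans paragraph object and keep POI's own run list in sync, as noted in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParagraphNormalizeDemo {
    // Stand-in for a paragraph: each list entry is the text of one run.
    static void normalize(List<String> runs) {
        if (runs.isEmpty()) {
            return; // blank paragraph: there is no first run to write into
        }
        // Step 1: plain text of all runs, in document order.
        String text = String.join("", runs);
        // Step 2: drop every run except the first.
        runs.subList(1, runs.size()).clear();
        // Step 3: store the combined content in the first run.
        runs.set(0, text);
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("Hel", "lo ", "world"));
        normalize(runs);
        System.out.println(runs);
    }
}
```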

I'll leave the answer to Q1/Q2 as an exercise to you ;)

Andi

On 30.01.2014 23:46, David Law wrote:
> Simanchal,
>
> may I ask a couple of stupid questions?
>
> I've removed some dead code & what's left
> in the heart of all those nested if's & for's is this:
>
> CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
>
> for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
>     xslfParagraph.getXmlObject().removeR(index);
> }
> if (ctRegularTextRun.length > 0) {
>     ctRegularTextRun[0].setT("");
> }
>
> First you get an Array of all CTRegularTextRuns contained in the XmlObject.
> Then you remove them all from the XmlObject.
> (they now only exist in the Array you just got)
> Finally you set the T Element of (only!) the 1st CTRegularTextRun (if present) to "".
>
> Q1) Now I wonder why you need to iterate backwards through the Array?
> Q2) Setting the T Element will have no effect (because you have just deleted all R's from the XmlObject?!
>
> All the best,
> DaveLaw
>



Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by David Law <da...@apconsult.de>.

Hi Simanchal,

sorry: I missed that the for-loop was leaving the 1st entry intact.

In a way, your new forwards loop is a bit better
as it documents that you are always removing
entry[1], but you still have the uncertainty:
"did I remove the last one!!" :-)

(This is NOT a fault of your code, rather a shortcoming
  of the Java language & they haven't addressed it with
  the for-each construct. What is missing is an index in
  the syntax of for-each & the possibility to iterate
  backwards through collections)

Slightly more paranoid is the following:
while (true) {
     try {
         xslfParagraph.getXmlObject().removeR(1);
     }
     catch (IndexOutOfBoundsException e) {
         break;
     }
}
...after this loop you are certain all except entry[0] were removed!
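
The same certainty can be reached without using an exception for control flow, provided the container reports its current size. The sketch below uses a java.util.List as a stand-in for the run list; the class and method names are hypothetical. With the XMLBeans paragraph object the loop condition would need its size accessor (assumed to be sizeOfRArray(), following the usual XMLBeans naming; check it against your POI version).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeepFirstDemo {
    // Removes every element except the one at index 0.
    static <T> void keepFirst(List<T> items) {
        // The size is re-read on every pass, so the loop stops exactly
        // when only entry[0] is left (or when the list was empty).
        while (items.size() > 1) {
            items.remove(1);
        }
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("r0", "r1", "r2"));
        keepFirst(runs);
        System.out.println(runs);
    }
}
```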

By the way, I assume you're using Java 7,
so you might like to use the try-with-resources syntax:
try (FileInputStream fis = new FileInputStream(inputFile);
     FileOutputStream fos = new FileOutputStream(outputFile)) {
    final XMLSlideShow ppt = new XMLSlideShow(fis);
    // ... rest of the processing ...
}
catch (Exception ex) {
    ex.printStackTrace();
}

This guarantees that fis & fos will always be closed automatically.
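
The closing behaviour can be observed with a small stand-alone sketch. The Tracked class and the log list are hypothetical helpers used only to record what happens; they stand in for the two file streams. Resources are closed in reverse order of declaration, and they are closed before a catch block of the same try statement runs.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    // Hypothetical resource that records when it is closed.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Tracked fis = new Tracked("fis", log);
             Tracked fos = new Tracked("fos", log)) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            log.add("body done");
        } catch (IllegalStateException ex) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```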

All the best,
DaveLaw

On 31.01.2014 08:43, simanchal maharana wrote:
> Hi David,
>
> Thanks lot for your suggestion.
> Actually its for translation of PPTX files. So I have to replace whole
> paragraph with its translation.
> But paragraph is combination of some <a:r> ie; CTRegularTextRun, and
> again <a:r> is parent of <a:t>.
>
> 1. I saw XML of each paragraph, but never faced more than one <a:t> in
> one <a:r> (CTRegularTextRun), but in code it gives array of T. so
> write code in this way.
>
> 2. I have to replace whole paragraph content by its translation. Its
> not possible to divide translated text as per <a:r> or <a:t>. So I
> removed all siblings of <a:r> ie; CTRegularTextRun except 1st one and
> I am replacing <a:t> content of 1st CTRegularTextRun of paragraph by
> its translation.
>
> 3. For blank paragraphs number of ctRegularTextRun is zero I guess. so
> while setting its content by
>      ctRegularTextRun[0].setT("Some Translated Text"); gives
> ArrayIndexOutOfBound exception. So I check for length. I
>      have modified it.
> String originalParaText = replaceUnwantedChar(xslfParagraph.getText());
> if ( ! originalText.isEmpty()) {
>
> I am doing all operation. So now I don't need to check for length of
> CTRegularTextRun[] for that paragraph.
> Thanks lot for this suggestion.
>
> 4. Now I figured out the difference among
>
> for(int index = 1; index <= ctRegularTextRun.length-1; index++) and
> for(int index = ctRegularTextRun.length-1; index > 0 ; index--).
>
> While traversing in forward direction (1st one) it gives
> IndexOutOfBoundsException while traversing in backward (2nd one) it
> works fine. So I was deleting ctRegularTextRun in backward direction
> whereas I was leaving 1st ctRegularTextRun ie; present at 0th index.
>
> but now I can put both.
>
> CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
>   for(int index = ctRegularTextRun.length-1; index > 0 ; index--){
>              xslfParagraph.getXmlObject().removeR(index);
>   }
>
> or
>
> for(int index = 1; index <= ctRegularTextRun.length-1; index++){
>               xslfParagraph.getXmlObject().removeR(1);
> }
>
> Thanks lot for your suggestion.
> Simanchal
>

Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by simanchal maharana <si...@gmail.com>.

Hi David,

Thanks a lot for your suggestion.
Actually it's for translation of PPTX files, so I have to replace the whole
paragraph with its translation.
But a paragraph is a combination of several <a:r> elements (CTRegularTextRun),
and each <a:r> is in turn the parent of an <a:t>.

1. I looked at the XML of each paragraph and never found more than one <a:t>
in one <a:r> (CTRegularTextRun), but in code it gives an array of T, so I
wrote the code this way.

2. I have to replace the whole paragraph content with its translation. It's
not possible to divide the translated text per <a:r> or <a:t>. So I remove
all sibling <a:r> elements (CTRegularTextRun) except the first one and
replace the <a:t> content of the first CTRegularTextRun of the paragraph with
its translation.

3. For blank paragraphs the number of ctRegularTextRun entries is zero, I
guess, so setting the content with
    ctRegularTextRun[0].setT("Some Translated Text");
throws an ArrayIndexOutOfBoundsException. That is why I checked the length.
I have modified it:
String originalText = replaceUnwantedChar(xslfParagraph.getText());
if ( ! originalText.isEmpty()) {

I am doing all operations inside this check, so now I don't need to check the
length of CTRegularTextRun[] for that paragraph.
Thanks a lot for this suggestion.

4. Now I have figured out the difference between

for(int index = 1; index <= ctRegularTextRun.length-1; index++) and
for(int index = ctRegularTextRun.length-1; index > 0 ; index--).

Traversing in the forward direction (the first one) gives an
IndexOutOfBoundsException, while traversing backward (the second one)
works fine. So I was deleting the ctRegularTextRun entries in the backward
direction and leaving the first ctRegularTextRun, i.e. the one at index 0.

But now I can use either:

CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
for(int index = ctRegularTextRun.length-1; index > 0 ; index--){
    xslfParagraph.getXmlObject().removeR(index);
}

or

for(int index = 1; index <= ctRegularTextRun.length-1; index++){
    xslfParagraph.getXmlObject().removeR(1);
}
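
Why the direction matters can be reproduced with a plain list. This is a sketch only: a java.util.ArrayList stands in for the run list, and the class and method names are hypothetical. It presumes that the failing forward loop removed at the growing loop index, which the message does not state explicitly. Removing an element shifts all later elements down by one, so a growing index overtakes the shrinking size, while a backward index or a fixed index of 1 never does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemovalOrderDemo {
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("r0", "r1", "r2", "r3"));
    }

    // Backward: entries that are still to be visited never change position.
    static List<String> backward() {
        List<String> runs = sample();
        for (int index = runs.size() - 1; index > 0; index--) {
            runs.remove(index);
        }
        return runs;
    }

    // Forward, but always removing index 1, once per surplus entry.
    static List<String> forwardFixedIndex() {
        List<String> runs = sample();
        int originalLength = runs.size();
        for (int index = 1; index <= originalLength - 1; index++) {
            runs.remove(1);
        }
        return runs;
    }

    // Forward with a growing index: fails once the index passes the shrunken size.
    static boolean forwardGrowingIndexFails() {
        List<String> runs = sample();
        int originalLength = runs.size();
        try {
            for (int index = 1; index <= originalLength - 1; index++) {
                runs.remove(index);
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(backward());
        System.out.println(forwardFixedIndex());
        System.out.println(forwardGrowingIndexFails());
    }
}
```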

Thanks a lot for your suggestion.
Simanchal



On Fri, Jan 31, 2014 at 4:17 AM, David Law-2 [via Apache POI]
<ml...@n5.nabble.com> wrote:
> Simanchal,
>
> may I ask a couple of stupid questions?
>
> I've removed some dead code & what's left
> in the heart of all those nested if's & for's is this:
>
> CTRegularTextRun[] ctRegularTextRun =
> xslfParagraph.getXmlObject().getRArray();
>
> for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
>      xslfParagraph.getXmlObject().removeR(index);
> }
> if (ctRegularTextRun.length > 0) {
>      ctRegularTextRun[0].setT("");
> }
>
> First you get an Array of all CTRegularTextRuns contained in the XmlObject.
> Then you remove them all from the XmlObject.
> (they now only exist in the Array you just got)
> Finally you set the T Element of (only!) the 1st CTRegularTextRun (if
> present) to "".
>
> Q1) Now I wonder why you need to iterate backwards through the Array?
> Q2) Setting the T Element will have no effect (because you have just
> deleted all R's from the XmlObject?!
>
> All the best,
> DaveLaw
>

Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by David Law <da...@apconsult.de>.

Simanchal,

may I ask a couple of stupid questions?

I've removed some dead code & what's left
in the heart of all those nested if's & for's is this:

CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();

for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
     xslfParagraph.getXmlObject().removeR(index);
}
if (ctRegularTextRun.length > 0) {
     ctRegularTextRun[0].setT("");
}

First you get an Array of all CTRegularTextRuns contained in the XmlObject.
Then you remove them all from the XmlObject.
(they now only exist in the Array you just got)
Finally you set the T Element of (only!) the 1st CTRegularTextRun (if 
present) to "".

Q1) Now I wonder why you need to iterate backwards through the Array?
Q2) Setting the T Element will have no effect, because you have just
deleted all R's from the XmlObject?!

All the best,
DaveLaw


On 30.01.2014 04:44, simanchal maharana wrote:
> Hi Andreas,
>
> PFA PPTX file for your review.
>
> Thanks,
> Simanchal
>
>
> Final_2.7z (16M) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/0/Final_2.7z>
> IE_Basics_English.pptx (4M) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/1/IE_Basics_English.pptx>
> PPTXParser_Code.java (4K) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/2/PPTXParser_Code.java>

Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by simanchal maharana <si...@gmail.com>.

Hi Andreas,

Please find attached the PPTX file for your review.

Thanks,
Simanchal

On Thu, Jan 30, 2014 at 2:44 AM, Andreas Beeker [via Apache POI]
<ml...@n5.nabble.com> wrote:
> Hi,
>
> is there a chance to get your .pptx-files?
>
> - link it to your stackoverflow post [1]
> - or open a bugzilla entry [2]
> - or send it to my email address (least preferred ...)
>
> Andi.
>
>
> [1]
> http://stackoverflow.com/questions/21386211/retrieving-content-of-hyperlinked-slides-in-powerpoint-files-pptx-through-apac
> [2] http://issues.apache.org/bugzilla/buglist.cgi?product=POI
>


Final_2.7z (16M) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/0/Final_2.7z>
IE_Basics_English.pptx (4M) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/1/IE_Basics_English.pptx>
PPTXParser_Code.java (4K) <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/2/PPTXParser_Code.java>




--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Retrieving-content-of-hyperlinked-slides-in-powerpoint-files-PPTX-through-apache-POI-tp5714766p5714773.html
Sent from the POI - User mailing list archive at Nabble.com.

Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI

Posted by Andreas Beeker <an...@gmx.de>.

Hi,

is there a chance to get your .pptx-files?

- link it to your stackoverflow post [1]
- or open a bugzilla entry [2]
- or send it to my email address (least preferred ...)

Andi.


[1] http://stackoverflow.com/questions/21386211/retrieving-content-of-hyperlinked-slides-in-powerpoint-files-pptx-through-apac
[2] http://issues.apache.org/bugzilla/buglist.cgi?product=POI

On 29.01.2014 07:21, simanchal maharana wrote:
> I am trying to get the text content of PowerPoint files and replace it with
> other text. I have a PowerPoint file with 20 slides, where slides 13, 14, 15
> and 16 have hyperlinks to slides 17, 18, 19 and 20. I am using XMLSlideShow to
> traverse the slides, but it returns only 16 slides. It does not return the
> last 4 hyperlinked slides.
>
> Any ideas on how I can get the content of all the hyperlinked slides and
> replace it with other text would be much appreciated.
>
> Here is my code.
>
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import org.apache.poi.xslf.usermodel.XMLSlideShow;
> import org.apache.poi.xslf.usermodel.XSLFShape;
> import org.apache.poi.xslf.usermodel.XSLFSlide;
> import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
> import org.apache.poi.xslf.usermodel.XSLFTextShape;
> import org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun;
> public class Testing {
> 	static String inputFile = "C:\\Users\\SM78882\\Desktop\\Testing\\IE_Basics_English.pptx";
> 	static String outputFile = "C:\\Users\\SM78882\\Desktop\\Testing\\result.pptx";
>
> 	public static String replaceUnwantedChar(String originalString) {
> 		if (null != originalString)
> 			return "" + originalString.replaceAll("(\n+)|(\t+)|(\\s{2,})", " ").trim();
> 		else
> 			return "";
> 	}
> 	public static void main(String[] args) {
> 		FileInputStream fis = null;
> 		FileOutputStream fos = null;
> 		XMLSlideShow ppt = null;
> 		try {
> 			fis = new FileInputStream(inputFile);
> 			fos = new FileOutputStream(outputFile);
> 			ppt = new XMLSlideShow(fis);
> 			System.out.println("No of slides:" + ppt.getSlides().length); // gives 16 slides.
> 			for (XSLFSlide slide : ppt.getSlides()) {
> 				for (XSLFShape shape : slide) {
> 					if (shape instanceof XSLFTextShape) {
> 						XSLFTextShape txShape = (XSLFTextShape) shape;
> 						for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
> 							String originalText = replaceUnwantedChar(xslfParagraph.getText());
> 							if (!originalText.isEmpty()) {
> 								String translation = "";
> 								if (translation != null) {
> 									CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
> 									for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
> 										xslfParagraph.getXmlObject().removeR(index);
> 									}
> 									if (ctRegularTextRun.length > 0)
> 										ctRegularTextRun[0].setT(translation);
> 								}
> 							}
> 						}
> 					}
> 				}
> 			}
> 			ppt.write(fos);
> 			fos.close();
> 			fis.close();
> 		} catch (Exception ex) {
> 			ex.printStackTrace();
> 		}
> 	}
> }

