You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Daniel Prill <da...@web.de> on 2011/01/28 23:00:45 UTC

get Path of cells referenced to cells of another file

Hello,
i extract data from .xls and .xlsx files. Now my excelfile has cells which reference to cells of another file. With cell.getFormulaCell(), i get for the first referenced file [1]tablename!cell, for the second referenced file [2]tablename!cell and so on. How can i get the absolute path of the referenced files or how can i get a list(?) of all referenced files?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: get Path of cells referenced to cells of another file

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Turns out - at least with regard to the xlsx format OOXML based files that
the whole process is even easier that I first through. Think back to the
formula where you would see something llie this - [1]Sheet1!A1 - well, it
seems as if the integer value identifes the files externalLink1.xml and
externalLink1.xml.rels that together contain the full path information (well
the externalLink1.xml.rels contains teh full path information). There will
be a matching pair of these files for each external workbook that is linked
to. So, if you link to two workbooks there will be four such files
externalLink1.xml/externalLink1.xml.rels, externalLink2.xml and
externalLink2.xml.rels. If you lionk to onw rokbook more than a single time
then the information that specifies the cells linked from with be in the
externalLinkN.xml file and the path information in the
externalLinkN.xml.rels file.

Am still playing about with code to get at the information these files
contain. I suspect it should be possible to iterate through the various
relationships and obtain the raw XML which can then simply be hacked to
strip out the path information. Time is a little pressing at the moment, I
am away from my base and working on a laptop that ought not to have the api
on it, let alone Office, so I cannot promise when I will make any further
progress.

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/get-Path-of-cells-referenced-to-cells-of-another-file-tp3362200p3366544.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: get Path of cells referenced to cells of another file

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
The only file format that I can really talk about here is the newer
SpreadsheetML format in the .xlsx files. These files are simply zipped xml
and so it is very easy to look into the internal workings and see just where
the information is stored. To discover how to drill down to the actual path
to an external file, I simply created two workbooks and linked from a cell
in one workbook to a cell in another workbook using the familar
=[Book1.xlsx]'Sheet1'!A1 syntax. After saving the second workbook away to
disc, I unzipped it and had a good dig around inside it's contents.

Lookinf into the xml for the workbbok, the contents of cell A1 were just as
you described them, the formula I entered had been modified by Excel to look
like this - [1]'Sheet1'!A1. The integer value 1 had replaced the name of and
path to the external workbook and the question then was, how to resolve this
back to the actual path. The answer is really quite straightforward becuase,
when the file was saved originally, Excel created a folder called
externalLinks and in this folder there were two entries, another folder
called _rels and a file called externalLink1.xml. The externalLink1.xml file
contained an entry that allows a link to be formed between an identifier -
in this case rID1 and the specific cell in question. That identifier then
makes it possible to drill down into the single file that the _rels folder
hods which goes by the name of externalLink1.xml.rels. This file contains
xml markup that links the identifier to an actual file through the Target
attribute of the Relationship element.

To summarise then, the process would be;

1. Get the formula from the cell and if it contains a link to an external
file; I guess searching for embedded '[' and ']' characters would be a good
place to start.
2. If the formula does contain a link to an external file, then look into
the externalLink1.xml file, search for the cells reference and locate the
associated relationship id for the external workbook.
3. Look into the externalLink1.xml.rels file and locate the relationship
element with the matching identifier. Read the associated Target attribute
and that should be the address of the external file.

It should be possible to do all of this from within POI as it is possible to
recover the various relationships that exist between elements but I have
never tried to do this myself. If you fancy having a go, then post to the
list and see if anyone can help you accomplish this task and, if you do
manage to find out how to do it, please post to the list so that others can
benefit from your work. It may also be worth posting onto the dev list to
ask for specific help with this problem. The people who frequent that list
have a far better knowledge of the internal workings of POI than do I.

Finally, I created a workbook with just a single cell linked to an external
file and so have not tested out this theory; I would bet that more than one
externalLink1.xml and externalLin1.xml.rels file will be created if there
are multiple liks to different external workbooks. I will through levae this
for you to test out as I guess you have a few files on hand that you could
unzip. With this information, it may be possible to refine the algorithm
slightly and make the search for the names of and paths to external
workbooks more efficient.

Now, with regard to the older binary file format, life is a little harder as
it is much more difficult to look into the file and see just where specific
items of data are stored. You may want to begin by taking a look at the
ForkedEvaluator class that does offer some support for linked wokrbooks but
I do not know how much use this will be to you. Aside from that, I do not
know how to suggest you proceed with the older file format other than using
the BiffViewer and reviewing Microsoft's file format specification to locate
whereabouts in the file the external workbooks ID is stored.

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/get-Path-of-cells-referenced-to-cells-of-another-file-tp3362200p3362651.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org