Posted to users@jackrabbit.apache.org by "Dirk V. Schesmer" <di...@mac.com> on 2006/04/21 10:58:02 UTC

Accessing Word and PDF Content

Hi All,
I'd like some help with extracting the content of the file
testwordfile.doc and saving it to the local file system. I am
already able to store it in my jackrabbit repository using the
addDocFile() method below, and I can find it again using the
saveDocFile() method, also shown below. But how do I extract the
content, determine e.g. the mime type, and set the encoding needed
to save it successfully?

Any help appreciated!

Dirk V. Schesmer
Stuttgart/Germany
-------
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Calendar;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Session;
import javax.jcr.Workspace;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;

    public void addDocFile(Node root, Session session) throws Exception {

        Node folderNode = root.addNode("foldernode", "nt:folder");
        File docFile = new File(
                "/Users/dschesmer/jackrabbitJCR/testdocuments/testwordfile.doc");
        Node docFileNode = folderNode.addNode(docFile.getName(), "nt:file");
        String docMimeType = "application/msword";
        Node docResourceNode = docFileNode.addNode("jcr:content", "nt:resource");

        docResourceNode.setProperty("jcr:mimeType", docMimeType);
        //docResourceNode.setProperty("jcr:encoding", ""); //Needed?
        InputStream docStream = new FileInputStream(docFile);
        try {
            docResourceNode.setProperty("jcr:data", docStream);
            Calendar docLastModified = Calendar.getInstance();
            docLastModified.setTimeInMillis(docFile.lastModified());
            docResourceNode.setProperty("jcr:lastModified", docLastModified);
            session.save();
        } finally {
            // release the file handle once the content has been saved
            docStream.close();
        }
    }

    public void saveDocFile(Node root, Session session) throws Exception {

        //Now do my test search
        Workspace workspace = session.getWorkspace();
        QueryManager queryManager = workspace.getQueryManager();
        Query query = queryManager.createQuery(
                "/jcr:root/foldernode//*", Query.XPATH);
        QueryResult result = query.execute();

        NodeIterator niter = result.getNodes();
        while (niter.hasNext()) {
            Node n = niter.nextNode();
            // toDo: extract word doc and save it into the file system...
            System.out.println("node: " + n.getPath());
        }
    }

Re: Accessing Word and PDF Content

Posted by Marcel Reutegger <ma...@gmx.net>.

Dirk V. Schesmer wrote:
> I can now get back my MS-Word file!
>> an optional encoding when supported by the mime type
> 
> How can I find out which mime type supports and/or requires which 
> encoding?

as a general rule of thumb, binary files do not have an encoding, whereas
files that consist of plain text do.

see: http://www.iana.org/assignments/media-types/
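
A minimal sketch of that rule of thumb in Java, assuming a hypothetical
helper hasEncoding() that treats only text/* media types as carrying a
character encoding. This is a simplification, not a lookup in the IANA
registry: some application/* types (XML, for example) are text-based too.

```java
import java.util.Locale;

public class EncodingRule {

    // Hypothetical helper: rough rule of thumb only. Treats "text/*" media
    // types as having a character encoding and everything else as binary.
    static boolean hasEncoding(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        return mimeType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    public static void main(String[] args) {
        System.out.println(hasEncoding("application/msword")); // false
        System.out.println(hasEncoding("application/pdf"));    // false
        System.out.println(hasEncoding("text/plain"));         // true
    }
}
```

If the helper returns true for a file you store, you would additionally set
the charset the file was actually written in, e.g.
docResourceNode.setProperty("jcr:encoding", "UTF-8"); for a Word document it
returns false, so jcr:encoding can simply be left out.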

> Does the JCR-API support me here?

no.

> BTW: Is there a comprehensive demo application I can have a look at to
> further deepen my JCR knowledge?

there are a couple of other projects / applications that work on top of 
jackrabbit. you might want to try them out. see:
http://wiki.apache.org/jackrabbit/JcrLinks

regards
  marcel

Re: Accessing Word and PDF Content

Posted by "Dirk V. Schesmer" <di...@mac.com>.

Marcel, thanks!
I can now get back my MS-Word file!
> an optional encoding when supported by the mime type

How can I find out which mime type supports and/or requires which  
encoding? Does the JCR-API support me here?

Thanks for help,
Dirk

BTW: Is there a comprehensive demo application I can have a look at
to further deepen my JCR knowledge?

>> docResourceNode.setProperty("jcr:mimeType", docMimeType);
>>         //resourceNode.setProperty("jcr:encoding", ""); //Needed?

On 24.04.2006 at 08:46, Marcel Reutegger wrote:

> Dirk V. Schesmer wrote:
>> Hi All,
>> I'd like some help with extracting the content of the file
>> testwordfile.doc and saving it to the local file system. I am
>> already able to store it in my jackrabbit repository using the
>> addDocFile() method below, and I can find it again using the
>> saveDocFile() method, also shown below. But how do I extract the
>> content, determine e.g. the mime type, and set the encoding needed
>> to save it successfully?
>
> an nt:resource is just a binary stream and may have an optional  
> encoding when supported by the mime type. e.g. a word document will  
> not have an encoding, but a plain text file will have one.
>
> to read the document from the repository you simply navigate to the  
> binary data property and get the value as an input stream:
>
> Node resource = ...
> InputStream in = resource.getProperty("jcr:data").getStream();
> // now spool the stream to a local file...
>
>
> regards
>  marcel
>

Re: Accessing Word and PDF Content

Posted by Marcel Reutegger <ma...@gmx.net>.

Dirk V. Schesmer wrote:
> Hi All,
> I'd like some help with extracting the content of the file
> testwordfile.doc and saving it to the local file system. I am
> already able to store it in my jackrabbit repository using the
> addDocFile() method below, and I can find it again using the
> saveDocFile() method, also shown below. But how do I extract the
> content, determine e.g. the mime type, and set the encoding needed
> to save it successfully?

an nt:resource is just a binary stream and may have an optional encoding 
when supported by the mime type. e.g. a word document will not have an 
encoding, but a plain text file will have one.
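
To illustrate what the optional encoding is for: the bytes in jcr:data can
always be copied unchanged, and the encoding only matters when you want to
turn a plain text document into characters. A minimal sketch, assuming you
have already read jcr:encoding (if present) into a String; the class and
method names are hypothetical, and readAllBytes() needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeResource {

    // Hypothetical helper: decodes a text resource using the charset stored in
    // jcr:encoding. Binary content (Word, PDF) has no encoding and should be
    // copied as raw bytes instead. The stream is closed in all cases.
    static String decode(InputStream in, String encoding) throws IOException {
        try (InputStream stream = in) {
            if (encoding == null || encoding.isEmpty()) {
                throw new IllegalArgumentException("no encoding: treat the data as binary");
            }
            return new String(stream.readAllBytes(), Charset.forName(encoding));
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "Gr\u00fc\u00dfe aus Stuttgart";
        // Stand-in for a plain text nt:resource whose jcr:encoding is ISO-8859-1
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = decode(new ByteArrayInputStream(stored), "ISO-8859-1");
        System.out.println(decoded.equals(text)); // true
    }
}
```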

to read the document from the repository you simply navigate to the 
binary data property and get the value as an input stream:

Node resource = ...
InputStream in = resource.getProperty("jcr:data").getStream();
// now spool the stream to a local file...
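
One way to do the spooling step, as a sketch: copy the stream byte for byte
into a local file, with no decoding, so it is safe for binary formats such as
.doc or .pdf. The names SpoolToFile and spool() are hypothetical, and the code
uses java.nio.file (Java 7+), which postdates this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.nio.file.StandardCopyOption;

public class SpoolToFile {

    // Copies the stream unchanged into the target file, overwriting any existing
    // file, closes the stream, and returns the number of bytes written.
    static long spool(InputStream in, Path target) throws IOException {
        try (InputStream stream = in) {
            return Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for resource.getProperty("jcr:data").getStream():
        // the first four bytes of an OLE2 (.doc) header
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        Path target = Files.createTempFile("testwordfile", ".doc");
        long copied = spool(new ByteArrayInputStream(data), target);
        System.out.println(copied);                                          // 4
        System.out.println(Arrays.equals(data, Files.readAllBytes(target))); // true
        Files.delete(target);
    }
}
```

With JCR, the call would be spool(resource.getProperty("jcr:data").getStream(),
target). Inside the saveDocFile() query loop, one option is to pick nodes where
n.isNodeType("nt:file") is true and use n.getNode("jcr:content") as the
resource. Because the bytes are copied unchanged, no encoding is needed for the
Word file.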


regards
  marcel
