Posted to users@jackrabbit.apache.org by Jyo <je...@gmail.com> on 2010/05/13 13:08:42 UTC

Re: Searching problems

Hi Alex,
        I have tried to search for a word in a text file stored in a Jackrabbit
repository, but my query does not return any results. I read on the forum
that you have successfully run word searches across several file types such as
doc, excel and pdf. Please guide me on how to search for words in such files.
Below are the changes I have made in my application.
1) Added indexing-configuration.xml in the same directory as
workspace.xml.
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM
"http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <aggregate primaryType="nt:file">
        <include>jcr:content</include>
        <include>jcr:content/*</include>
        <include-property>jcr:content/jcr:lastModified</include-property>
    </aggregate>
	
</configuration>
 
2) Added indexing-configuration.xml as a param of the SearchIndex in
repository.xml and in workspace.xml

3) Added the extractor classes as parameters of the SearchIndex in
repository.xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            
            
	    
	    
        </SearchIndex>

4) Added the code below:
			FileInputStream file = new FileInputStream("C:\\New\\New.txt");
			Binary binary =
JackrabbitPlugin.getSession().getValueFactory().createBinary(file);
			binary.dispose(); 
			Calendar cal = Calendar.getInstance(); 
			cal.set(2008, Calendar.JUNE, 10);
			//in below line getBlogEntryNode() method returns the blogEntryNode.
			Node blogEntryNode = getBlogEntryNode(blogTitle, session);
			Node newNode=blogEntryNode.addNode("newNode", "nt:folder");
			Node NewblogEntry = newNode.addNode("NewblogEntry", "nt:file");
			Node resNode = NewblogEntry.addNode("jcr:content", "nt:resource"); 
			resNode.setProperty("jcr:mimeType", "text/html");
			resNode.setProperty("jcr:data", binary);
			resNode.setProperty("jcr:lastModified",cal);
			session.save();
5) The New.txt file contains text like "This is my first searching
application". Now I want to search for the word "first" in my file, so my
query is:

Query query = queryManager.createQuery(
        "//element(*, nt:file)[jcr:contains(., 'first')]", Query.XPATH);
QueryResult queryResult = query.execute();	 
NodeIterator queryResultNodeIterator = queryResult.getNodes();
while (queryResultNodeIterator.hasNext()) {
.......
......
}


I cannot see where I went wrong; the query does not return any results. For
testing I am adding only a .txt file to the repository and searching its
text, but my requirement is to add files of any format (pdf, excel,
word, etc.) and run text searches on them. This is a very
urgent requirement. Please help me with this and, if possible, give me some
sample code. Thanks in advance.


Thanks,
Jyo

-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Searching-problems-tp513070p2197400.html

RE: Searching problems

Posted by Jenni Pothu <Je...@virtusa.com>.
Hi Alex,
	 I am now able to add files of different formats to the repository and search for words in them.
I changed the mime type and also changed the query to
Query query = queryManager.createQuery
				("//*[jcr:contains(., '"+searchText+"')]/rep:excerpt(.)",Query.XPATH);
Thank you very much for your guidance. It's really helpful.

Thanks,
Jyo
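
The query above concatenates searchText directly into the XPath statement, so a search term containing a single quote would probably break the string literal. Below is a minimal sketch of one way to guard against that. It assumes that Jackrabbit's XPath parser follows the XPath convention of writing a literal single quote as two adjacent single quotes. FullTextQueries, escapeXPathLiteral and containsQuery are hypothetical names, not part of the JCR or Jackrabbit API. The sketch does not escape the full-text operators that jcr:contains itself interprets (for example a leading hyphen or double quotes); that is a separate concern.

```java
// Hypothetical helper, not part of the JCR or Jackrabbit API.
final class FullTextQueries {

    private FullTextQueries() {
    }

    // Assumption: inside an XPath string literal delimited by single quotes,
    // a literal single quote is written as two single quotes.
    static String escapeXPathLiteral(String text) {
        return text.replace("'", "''");
    }

    // Builds the same statement as in the message above, with the
    // search text escaped before it is placed inside the literal.
    static String containsQuery(String searchText) {
        return "//*[jcr:contains(., '" + escapeXPathLiteral(searchText)
                + "')]/rep:excerpt(.)";
    }

    public static void main(String[] args) {
        System.out.println(containsQuery("first"));
    }
}
```

The returned string would then be passed to queryManager.createQuery(..., Query.XPATH) in place of the hand-built concatenation.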

-----Original Message-----
From: Alexander Klimetschek [mailto:aklimets@day.com] 
Sent: Friday, May 14, 2010 6:42 PM
To: users@jackrabbit.apache.org
Subject: Re: Searching problems

On Thu, May 13, 2010 at 13:08, Jyo <je...@gmail.com> wrote:
>                        FileInputStream file = new FileInputStream("C:\\New\\New.txt");
>                        Binary binary =
> JackrabbitPlugin.getSession().getValueFactory().createBinary(file);
>                        binary.dispose();

You shouldn't dispose the binary before you pass it to the property
and before you save the session!

>                        Calendar cal = Calendar.getInstance();
>                        cal.set(2008, Calendar.JUNE, 10);
>                        //in below line getBlogEntryNode() method returns the blogEntryNode.
>                        Node blogEntryNode = getBlogEntryNode(blogTitle, session);
>                        Node newNode=blogEntryNode.addNode("newNode", "nt:folder");
>                        Node NewblogEntry = newNode.addNode("NewblogEntry", "nt:file");
>                        Node resNode = NewblogEntry.addNode("jcr:content", "nt:resource");
>                        resNode.setProperty("jcr:mimeType", "text/html");

Make sure you set the proper mime type here, depending on the content
you put in (above you set text/html). I am not sure if the Tika-based
text extractor depends on this, but I think so.
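
To illustrate the point about the mime type, here is a minimal sketch that picks the jcr:mimeType value from the file name instead of hard-coding text/html. MimeTypes and mimeTypeFor are hypothetical names, not part of the JCR or Jackrabbit API. The extension table is an illustrative assumption that covers only the formats mentioned in this thread.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the JCR or Jackrabbit API.
final class MimeTypes {

    // Illustrative table only; extend it for the formats actually stored.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "txt", "text/plain",
            "html", "text/html",
            "pdf", "application/pdf",
            "doc", "application/msword",
            "xls", "application/vnd.ms-excel");

    private static final String FALLBACK = "application/octet-stream";

    private MimeTypes() {
    }

    // Returns the mime type for the file name's extension (case-insensitive),
    // or a generic binary type when the extension is missing or unknown.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("New.txt")); // prints "text/plain"
    }
}
```

In the code quoted above, the result would be passed to resNode.setProperty("jcr:mimeType", ...) in place of the literal "text/html".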

If it still fails, it could be because there are problems with the
specific document you have. Look for text extraction exceptions in the
log. And try with a plain text file.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

