Posted to dev@jackrabbit.apache.org by Paco Avila <pa...@git.es> on 2006/03/23 00:14:34 UTC

some troubles with search

I put a text file in the repository and the search returns the
document (a text file called "prueba.txt" with MIME type "plain/text").
But if I put a PDF file (with MIME type "application/pdf"), nothing is
returned. It seems that Lucene can't find the search word because the
correct text filter isn't being used. Sample code below.

The JAR jackrabbit-textfilters is in the CLASSPATH, so I don't know what
to do.

Thanks!

--------------- CODE ---------------
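// Imports needed by this snippet
import java.io.FileInputStream;
import java.util.Calendar;
import java.util.Hashtable;
import javax.jcr.*;
import javax.jcr.query.*;
import javax.naming.Context;
import javax.naming.InitialContext;
import org.apache.jackrabbit.core.jndi.RegistryHelper;

// JNDI environment for the dummy initial context
Hashtable env = new Hashtable();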
env.put(Context.INITIAL_CONTEXT_FACTORY, 
	"org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory");
env.put(Context.PROVIDER_URL, "localhost");
InitialContext ctx = new InitialContext(env);
		
// Repository config
String configFile = "repotest/repository.xml";
String repHomeDir = "repotest";
RegistryHelper.registerRepository(ctx, "repo", configFile, repHomeDir,
true);

// Obtain the repository through a JNDI lookup
Repository r = (Repository) ctx.lookup("repo");
		
// Create a new repository session, after authenticating
Session session = r.login(new SimpleCredentials("paco",
"".toCharArray()), null);

// Namespace registration
Workspace ws = session.getWorkspace();
ws.getNamespaceRegistry().registerNamespace("okm",
"http://www.openkm.org/1.0");
		
// Node creation
Node root = session.getRootNode();
Node okmRoot = root.addNode("okm:root", "nt:folder");
okmRoot.addMixin("mix:referenceable");
session.save();
		
// Get node
Node node = session.getRootNode().getNode("okm:root");
System.out.println("Node Name: "+node.getName());
System.out.println("Node UUID: "+node.getUUID());
		
// Add document
String fileName = "prueba.pdf";
Node fileNode = okmRoot.addNode(fileName, "nt:file");
fileNode.addMixin("mix:referenceable");
fileNode.addMixin("mix:lockable");
fileNode.addMixin("mix:versionable");
Node resNode = fileNode.addNode("jcr:content", "nt:resource");
resNode.setProperty("jcr:mimeType", "application/pdf");
resNode.setProperty("jcr:data", new FileInputStream(fileName));
resNode.setProperty("jcr:lastModified", Calendar.getInstance());
session.save();
		
// Search
String statement =
"/jcr:root/okm:root//element(*,nt:resource)[jcr:contains(.,'hola')]";
Workspace workspace = session.getWorkspace();
QueryManager queryManager = workspace.getQueryManager();
Query query = queryManager.createQuery(statement,
javax.jcr.query.Query.XPATH);
QueryResult result = query.execute();
		
System.out.println("Search results:");
for (NodeIterator it = result.getNodes(); it.hasNext();) {
	Node sNode = (Node) it.next();
	System.out.println(sNode.getParent().getUUID());
}
--------------- CODE ---------------

-- 
Paco Avila
GIT Consultors

Re: some troubles with search

Posted by Marcel Reutegger <ma...@gmx.net>.

Paco Avila wrote:
> It works now: I had to modify SearchIndex.java and add the other text
> filters to DEFAULT_TEXT_FILTERS:
> 
> /**
>  * Default text filters.
>  */
> public static final String DEFAULT_TEXT_FILTERS =
> TextPlainTextFilter.class.getName()+
> ",org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter"+
> ",org.apache.jackrabbit.core.query.OpenOfficeTextFilter";
> 
> Is this the right way? I don't think this is a good practice :(

no, not really ;)

there is a configuration parameter that allows you to set the various 
text filters.

See also a previous thread:
http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/6140
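
As an illustration of the kind of setting meant here, the sketch below shows
how the filter classes from the quoted workaround might be listed in the
SearchIndex section of the workspace configuration instead of in
SearchIndex.java. The parameter name "textFilterClasses" is not stated in this
thread; it is recalled from Jackrabbit 1.x configuration and should be checked
against the version in use. Whether the plain-text filter also has to be
listed explicitly once this parameter is set is likewise an open point to
verify.

```xml
<!-- Sketch only: the parameter name "textFilterClasses" is an assumption
     (not given in this thread); verify it against your Jackrabbit version. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated filter class names, taken from the quoted workaround -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
</SearchIndex>
```

Content that was stored before the configuration changed will probably need
to be re-indexed before a search can find it.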

regards
  marcel

Re: some troubles with search

Posted by Paco Avila <pa...@git.es>.

El jue, 23-03-2006 a las 00:14 +0100, Paco Avila escribió:
> I put a text file in the repository and the search returns the
> document (a text file called "prueba.txt" with MIME type "plain/text").
> But if I put a PDF file (with MIME type "application/pdf"), nothing is
> returned. It seems that Lucene can't find the search word because the
> correct text filter isn't being used. Sample code below.
> 
> The JAR jackrabbit-textfilters is in the CLASSPATH, so I don't know what
> to do.

It works now: I had to modify SearchIndex.java and add the other text
filters to DEFAULT_TEXT_FILTERS:

/**
 * Default text filters.
 */
public static final String DEFAULT_TEXT_FILTERS =
TextPlainTextFilter.class.getName()+
",org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter"+
",org.apache.jackrabbit.core.query.OpenOfficeTextFilter";

Is this the right way? I don't think this is a good practice :(
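
Whichever way the filter list ends up being supplied, it may help to first
confirm that every class named in it can actually be loaded. The following is
a minimal sketch, not Jackrabbit code: the class TextFilterClasspathCheck and
its method missing() are hypothetical helpers introduced here. It splits a
comma-separated list like the one above and reports the names that cannot be
found on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Jackrabbit.
public class TextFilterClasspathCheck {

    /** Returns the class names from a comma-separated list that cannot be loaded. */
    static List<String> missing(String commaSeparatedClassNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = TextFilterClasspathCheck.class.getClassLoader();
        for (String name : commaSeparatedClassNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            try {
                // Load without running static initializers
                Class.forName(trimmed, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Either the class itself or something it depends on is missing
                notFound.add(trimmed);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Filter class names copied from the DEFAULT_TEXT_FILTERS change above
        String filters = "org.apache.jackrabbit.core.query.PdfTextFilter,"
                + "org.apache.jackrabbit.core.query.MsWordTextFilter,"
                + "org.apache.jackrabbit.core.query.OpenOfficeTextFilter";
        for (String name : missing(filters)) {
            System.out.println("Not on classpath: " + name);
        }
    }
}
```

Run with the same CLASSPATH as the repository code; if nothing is printed,
all listed classes were loadable, and the problem is more likely in how the
filters are configured than in the classpath.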

-- 
Paco Avila
GIT Consultors