Posted to dev@jackrabbit.apache.org by Paco Avila <pa...@git.es> on 2006/03/23 00:14:34 UTC
some troubles with search
I put a text file in the repository and the search returns the matching
document (I put a text file called "prueba.txt" with MIME type "text/plain").
But if I put a PDF file (with MIME type "application/pdf") in the repository,
nothing is returned. It seems that Lucene can't find the desired word because
it doesn't use the correct text filter. Sample code below.
The jackrabbit-textfilters JAR is in the CLASSPATH, so I don't know what
else to do.
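The symptom is consistent with the index choosing a text filter by the binary's jcr:mimeType. If only the default plain-text filter is registered, a PDF binary produces no searchable text. The sketch below is a simplified model of that lookup, not Jackrabbit's code. The MimeFilter interface, forType and isIndexed are hypothetical names that stand in for the real text filter classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSelection {

    // Hypothetical, simplified stand-in for a text filter: it only reports
    // whether it can turn a given MIME type into indexable text.
    interface MimeFilter {
        boolean canFilter(String mimeType);
    }

    // Builds a filter that accepts exactly one MIME type.
    static MimeFilter forType(String supported) {
        return mimeType -> supported.equalsIgnoreCase(mimeType);
    }

    // True if any registered filter accepts the MIME type; in this model a
    // binary with no matching filter contributes no full-text terms.
    static boolean isIndexed(List<MimeFilter> filters, String mimeType) {
        for (MimeFilter f : filters) {
            if (f.canFilter(mimeType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Default setup: only a plain-text filter is registered.
        List<MimeFilter> defaults = Arrays.asList(forType("text/plain"));
        // Extended setup: a PDF filter is registered as well.
        List<MimeFilter> extended = new ArrayList<>(defaults);
        extended.add(forType("application/pdf"));

        System.out.println(isIndexed(defaults, "text/plain"));      // true
        System.out.println(isIndexed(defaults, "application/pdf")); // false
        System.out.println(isIndexed(extended, "application/pdf")); // true
    }
}
```

In this model, having the filter's JAR on the CLASSPATH is not enough. The PDF filter must also appear in the registered list, and the fixes discussed below both do that.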
Thanks!
--------------- CODE ---------------
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Calendar;
import java.util.Hashtable;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import javax.jcr.Workspace;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.QueryResult;
import javax.naming.Context;
import javax.naming.InitialContext;
import org.apache.jackrabbit.core.jndi.RegistryHelper;

public class SearchTest {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory");
        env.put(Context.PROVIDER_URL, "localhost");
        InitialContext ctx = new InitialContext(env);

        // Repository config
        String configFile = "repotest/repository.xml";
        String repHomeDir = "repotest";
        RegistryHelper.registerRepository(ctx, "repo", configFile, repHomeDir, true);

        // Obtain the repository through a JNDI lookup
        Repository r = (Repository) ctx.lookup("repo");

        // Create a new repository session, after authenticating
        Session session = r.login(new SimpleCredentials("paco", "".toCharArray()), null);
        try {
            // Namespace registration (fails if "okm" is already registered)
            Workspace ws = session.getWorkspace();
            ws.getNamespaceRegistry().registerNamespace("okm", "http://www.openkm.org/1.0");

            // Node creation
            Node root = session.getRootNode();
            Node okmRoot = root.addNode("okm:root", "nt:folder");
            okmRoot.addMixin("mix:referenceable");
            session.save();

            // Get node
            Node node = session.getRootNode().getNode("okm:root");
            System.out.println("Node Name: " + node.getName());
            System.out.println("Node UUID: " + node.getUUID());

            // Add document; jcr:mimeType determines which text filter indexes jcr:data
            String fileName = "prueba.pdf";
            Node fileNode = okmRoot.addNode(fileName, "nt:file");
            fileNode.addMixin("mix:referenceable");
            fileNode.addMixin("mix:lockable");
            fileNode.addMixin("mix:versionable");
            Node resNode = fileNode.addNode("jcr:content", "nt:resource");
            resNode.setProperty("jcr:mimeType", "application/pdf");
            InputStream in = new FileInputStream(fileName);
            try {
                resNode.setProperty("jcr:data", in);
                resNode.setProperty("jcr:lastModified", Calendar.getInstance());
                session.save();
            } finally {
                in.close(); // always release the file handle
            }

            // Search: full-text query on nt:resource nodes below okm:root
            String statement =
                "/jcr:root/okm:root//element(*,nt:resource)[jcr:contains(.,'hola')]";
            QueryManager queryManager = ws.getQueryManager();
            Query query = queryManager.createQuery(statement, Query.XPATH);
            QueryResult result = query.execute();
            System.out.println("Search results:");
            for (NodeIterator it = result.getNodes(); it.hasNext();) {
                Node sNode = it.nextNode();
                // Print the UUID of the nt:file that owns the matching jcr:content
                System.out.println(sNode.getParent().getUUID());
            }
        } finally {
            session.logout();
        }
    }
}
--------------- CODE ---------------
--
Paco Avila
GIT Consultors
Re: some troubles with search
Posted by Marcel Reutegger <ma...@gmx.net>.
Paco Avila wrote:
> It works now: I had to modify SearchIndex.java and add the other text
> filters to DEFAULT_TEXT_FILTERS:
>
> /**
> * Default text filters.
> */
> public static final String DEFAULT_TEXT_FILTERS =
> TextPlainTextFilter.class.getName()+
> ",org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter"+
> ",org.apache.jackrabbit.core.query.OpenOfficeTextFilter";
>
> Is this the right way? I don't think this is good practice :(
No, not really ;)
There is a configuration parameter that allows you to set the various
text filters.
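The sketch below shows the configuration route instead of patching SearchIndex.java. It makes three assumptions. First, that the SearchIndex element in repository.xml/workspace.xml accepts a `textFilterClasses` param holding a comma-separated class list, the same format as DEFAULT_TEXT_FILTERS. Second, that TextPlainTextFilter lives in org.apache.jackrabbit.core.query alongside the other filters. Third, check the parameter name against your Jackrabbit version and the thread linked below. The helper setTextFilters is hypothetical. It uses only the JDK DOM API to add or update that param in every SearchIndex element of a config fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextFilterConfig {

    // Assumed parameter name; the class list mirrors the patched
    // DEFAULT_TEXT_FILTERS but lives in configuration instead of code.
    static final String PARAM_NAME = "textFilterClasses";
    static final String FILTERS =
        "org.apache.jackrabbit.core.query.TextPlainTextFilter,"
        + "org.apache.jackrabbit.core.query.PdfTextFilter,"
        + "org.apache.jackrabbit.core.query.MsWordTextFilter,"
        + "org.apache.jackrabbit.core.query.OpenOfficeTextFilter";

    /** Hypothetical helper: adds or updates the filter param in every SearchIndex element. */
    static String setTextFilters(String xml, String filters) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList indexes = doc.getElementsByTagName("SearchIndex");
            for (int i = 0; i < indexes.getLength(); i++) {
                Element index = (Element) indexes.item(i);
                // Look for an existing <param name="textFilterClasses">.
                Element param = null;
                NodeList params = index.getElementsByTagName("param");
                for (int j = 0; j < params.getLength() && param == null; j++) {
                    Element p = (Element) params.item(j);
                    if (PARAM_NAME.equals(p.getAttribute("name"))) {
                        param = p;
                    }
                }
                if (param == null) {
                    // No existing entry: append a new <param> child.
                    param = doc.createElement("param");
                    param.setAttribute("name", PARAM_NAME);
                    index.appendChild(param);
                }
                param.setAttribute("value", filters);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot update SearchIndex config", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed-down Workspace section, as it might appear in repository.xml.
        String xml = "<Workspace name=\"default\">"
            + "<SearchIndex class=\"org.apache.jackrabbit.core.query.lucene.SearchIndex\">"
            + "<param name=\"path\" value=\"${wsp.home}/index\"/>"
            + "</SearchIndex></Workspace>";
        System.out.println(setTextFilters(xml, FILTERS));
    }
}
```

The printed SearchIndex element gains a `textFilterClasses` param listing the four filters. This changes the filter list without recompiling SearchIndex.java. For a workspace that already exists, its own workspace.xml presumably needs the same param, not just the template in repository.xml. Documents indexed before the change will likely need re-indexing before their PDF text becomes searchable.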
See also a previous thread:
http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/6140
regards
marcel
Re: some troubles with search
Posted by Paco Avila <pa...@git.es>.
On Thu, 23-03-2006 at 00:14 +0100, Paco Avila wrote:
> I put a text file in the repository and the search returns the matching
> document (I put a text file called "prueba.txt" with MIME type "text/plain").
> But if I put a PDF file (with MIME type "application/pdf") in the repository,
> nothing is returned. It seems that Lucene can't find the desired word because
> it doesn't use the correct text filter. Sample code below.
>
> The jackrabbit-textfilters JAR is in the CLASSPATH, so I don't know what
> else to do.
It works now: I had to modify SearchIndex.java and add the other text
filters to DEFAULT_TEXT_FILTERS:
/**
* Default text filters.
*/
public static final String DEFAULT_TEXT_FILTERS =
TextPlainTextFilter.class.getName()+
",org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter"+
",org.apache.jackrabbit.core.query.OpenOfficeTextFilter";
Is this the right way? I don't think this is good practice :(
--
Paco Avila
GIT Consultors