Posted to users@sling.apache.org by anjan <po...@gmail.com> on 2013/06/05 14:19:49 UTC

Full text indexing is not happening

Hi, I built Sling successfully (checking out the latest stable version from
Jenkins) and deployed it in Tomcat.  I then connected to Sling over WebDAV
and added a few documents (PDF, Word, text files, etc.).  Full text indexing
does not happen, though: inspecting the index with Luke shows that only
metadata is indexed.  As far as I can see, this Sling build embeds
Jackrabbit 2.4.2 as its repository, so I tried to reproduce the problem by
downloading the standalone Jackrabbit 2.4.2 jar, running it, connecting to
it over WebDAV and adding a few documents.  There, full text indexing works
perfectly (again confirmed by looking at the indexes with Luke).

I don't see any indexing-related exceptions in the logs, but I do see the
exception below every 5 minutes.  I am not sure whether it is causing any
issue.
05.06.2013 17:44:23.970 *ERROR* [pool-4-thread-1] org.apache.sling.commons.scheduler.impl.QuartzScheduler Exception during job execution of org.apache.sling.event.impl.jobs.JobManagerImpl@574587ff : null
java.lang.NullPointerException
	at org.apache.sling.event.impl.jobs.MaintenanceTask.simpleEmptyFolderCleanup(MaintenanceTask.java:421)
	at org.apache.sling.event.impl.jobs.MaintenanceTask.run(MaintenanceTask.java:340)
	at org.apache.sling.event.impl.jobs.JobManagerImpl.maintain(JobManagerImpl.java:267)
	at org.apache.sling.event.impl.jobs.JobManagerImpl.run(JobManagerImpl.java:363)
	at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:56)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Am I missing some configuration?  Any pointers?

Thanks.



--
View this message in context: http://apache-sling.73963.n3.nabble.com/Full-text-indexing-is-not-happening-tp4024383.html
Sent from the Sling - Users mailing list archive at Nabble.com.

Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
Hi Robert, I have posted the issue at
https://issues.apache.org/jira/browse/SLING-2924.




Re: Full text indexing is not happening

Posted by Robert Munteanu <ro...@lmn.ro>.
On Fri, Jun 21, 2013 at 11:39 AM, anjan <po...@gmail.com> wrote:
> I wanted to do some index configuration and had to delete the indexes.  But
> when I restarted the server, the indexes were not recreated for the
> existing documents, whereas new documents were indexed properly.
>
> I noticed that the Tika bundles were not ready by the time Jackrabbit started
> to rebuild the indexes during Sling server startup.  So I changed the
> start level of the Tika bundles (both Core and Parsers) from 15 to 10 and
> built Sling again.  This time, the indexes were rebuilt successfully during
> server startup.

Definitely sounds like something we can fix. Can you summarise the
information and post it in a bug report at [1]?

[1] https://issues.apache.org/jira/browse/SLING

Thanks,

Robert





--
Sent from my (old) computer

Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
I wanted to do some index configuration and had to delete the indexes.  But
when I restarted the server, the indexes were not recreated for the
existing documents, whereas new documents were indexed properly.

I noticed that the Tika bundles were not ready by the time Jackrabbit started
to rebuild the indexes during Sling server startup.  So I changed the
start level of the Tika bundles (both Core and Parsers) from 15 to 10 and
built Sling again.  This time, the indexes were rebuilt successfully during
server startup.
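
The ordering problem described above can be illustrated with a small,
self-contained sketch.  It is not Sling or OSGi API code: the bundle names
and the repository bundle's start level of 15 are assumptions made purely for
illustration (the thread only states that the Tika bundles were moved from 15
to 10).  It checks whether the text-extraction bundles sit at a strictly lower
start level than the bundle that rebuilds the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StartLevelSketch {

    /**
     * Returns true only if every extractor bundle has a strictly lower start
     * level than the repository bundle. Bundles on the same level are treated
     * as "not guaranteed ready", since no order is guaranteed within a level.
     */
    public static boolean extractorsReadyFirst(Map<String, Integer> levels,
                                               String repositoryBundle,
                                               String... extractorBundles) {
        int repositoryLevel = levels.get(repositoryBundle);
        for (String name : extractorBundles) {
            if (levels.get(name) >= repositoryLevel) {
                return false;
            }
        }
        return true;
    }

    /** Builds a hypothetical start-level table; only the Tika level varies. */
    public static Map<String, Integer> levels(int tikaLevel) {
        Map<String, Integer> levels = new LinkedHashMap<>();
        levels.put("tika-core", tikaLevel);
        levels.put("tika-parsers", tikaLevel);
        levels.put("jackrabbit-server", 15); // assumed value, not taken from the thread
        return levels;
    }

    public static void main(String[] args) {
        // Tika on the same level as the repository: readiness is not guaranteed.
        System.out.println(extractorsReadyFirst(
                levels(15), "jackrabbit-server", "tika-core", "tika-parsers")); // false
        // Tika moved to level 10: it starts before the repository bundle.
        System.out.println(extractorsReadyFirst(
                levels(10), "jackrabbit-server", "tika-core", "tika-parsers")); // true
    }
}
```

The strict comparison is deliberate: under the assumption that both sets of
bundles share a level, nothing guarantees which one becomes active first, and
that matches the symptom of an index rebuild running before the parsers are
available.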




Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
I changed the Tika dependency from version 1.0 to 1.2 and rebuilt Sling.
After deploying the new build in Tomcat, full text indexing works fine.  I
tested with pdf, doc, docx and xlsx files, and all of them get indexed.

I hope this version change will not have any impact on other areas.




Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
Hi Alex, I have also raised the issue on the Jackrabbit mailing list and am
waiting for a response.  I also want to post an update here.
********
Upon further debugging (using Eclipse debug mode), I found that the parse
methods of the classes below (from Tika Parsers) are called when I add
documents (txt and doc files respectively) to Jackrabbit (war file deployed
to Tomcat) via WebDAV:
org.apache.tika.parser.txt.TXTParser
org.apache.tika.parser.microsoft.OfficeParser

But when I add the same documents to Sling via WebDAV, these methods are
not called.  What could be the issue?




Re: Full text indexing is not happening

Posted by Alexander Klimetschek <ak...@adobe.com>.
I think this is a question for the Jackrabbit mailing list, unless the Sling launchpad deployment includes a "broken" configuration with respect to Jackrabbit's search config (repository.xml or indexing_configuration.xml).

For more info see
http://wiki.apache.org/jackrabbit/Search
http://wiki.apache.org/jackrabbit/IndexingConfiguration

Cheers,
Alex


Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
Hi Adam, thanks for confirming that you also do not see the document
content (PDF in your case) getting indexed.  Can someone from the Sling
development team confirm whether this is a bug?  Please let me know if I
need to raise a JIRA issue for it.




Re: Full text indexing is not happening

Posted by Adam Yocum <ad...@gmail.com>.
Hi Anjan,

I've recently been experimenting with storing PDF captures of web pages in
Sling and was hoping they would be automatically full-text indexed so I
could do queries using 'CONTAINS'.  I'm also using a fairly recent build of
Sling 7 (mine might be a month old).  I thought that the text would be
extracted into a 'text' property, but I am not seeing this, and of course
the query then does not work...  Here's a screen grab of the properties of
my uploaded PDF's jcr:content node (hope the list allows images)...  I would
also love to hear from someone more experienced whether or not this is
expected behavior for Sling 7....
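
For reference, a minimal sketch of the kind of 'CONTAINS' query meant here,
written as a plain string builder so it runs without a repository.  The class
and method names are hypothetical, and the choice of nt:resource as the node
type is an assumption about where the uploaded file's content lives; the
statement follows JCR-SQL2 syntax, where a single quote inside a string
literal is escaped by doubling it.

```java
public class FullTextQuerySketch {

    /** Wraps text in a JCR-SQL2 string literal, doubling embedded single quotes. */
    public static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    /**
     * Builds a JCR-SQL2 statement that selects nodes of the given type whose
     * full-text index matches the search term on any property.
     * Note: the search term itself has its own mini-syntax (quotes, OR, '-'),
     * which this sketch does not validate or escape.
     */
    public static String containsQuery(String nodeType, String searchTerm) {
        return "SELECT * FROM [" + nodeType + "] AS n WHERE CONTAINS(n.*, "
                + quote(searchTerm) + ")";
    }

    public static void main(String[] args) {
        // Prints the statement that would be handed to the repository.
        System.out.println(containsQuery("nt:resource", "sling"));
    }
}
```

In a running repository the resulting string would typically be passed to
QueryManager.createQuery(statement, Query.JCR_SQL2).  Such a query only
returns the PDF if its text was actually extracted into the index at upload
time, which is the step that appears to be failing in this thread.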



On Thu, Jun 13, 2013 at 12:57 AM, anjan <po...@gmail.com> wrote:

> I did a lot of debugging without much success.  When I use the Sling 6 Web
> Application
> <http://mirror.metrocast.net/apache//sling/org.apache.sling.launchpad-6.war>,
> full text indexing works fine.  There are two differences, though.  First,
> Sling 6 uses Apache Tika 0.6 (I believe Jackrabbit uses Tika internally for
> metadata and text extraction), whereas the latest Sling build uses Tika 1.0.
> Second, in Sling 6 the whole of Tika (Core and Parsers) is bundled as a
> single OSGi bundle, whereas in the latest build Tika Core and Tika Parsers
> are deployed as separate OSGi bundles.  Search is an important feature and
> it is not working.  Please reply if anyone else has noticed this issue.

Re: Full text indexing is not happening

Posted by anjan <po...@gmail.com>.
I did a lot of debugging without much success.  When I use the Sling 6 Web
Application
<http://mirror.metrocast.net/apache//sling/org.apache.sling.launchpad-6.war>,
full text indexing works fine.  There are two differences, though.  First,
Sling 6 uses Apache Tika 0.6 (I believe Jackrabbit uses Tika internally for
metadata and text extraction), whereas the latest Sling build uses Tika 1.0.
Second, in Sling 6 the whole of Tika (Core and Parsers) is bundled as a
single OSGi bundle, whereas in the latest build Tika Core and Tika Parsers
are deployed as separate OSGi bundles.  Search is an important feature and
it is not working.  Please reply if anyone else has noticed this issue.


