Posted to users@jackrabbit.apache.org by smg11 <sa...@gmail.com> on 2015/11/30 06:58:10 UTC

Alternative to PreExtractedTextProvider

Hi,

We are migrating from Jackrabbit 2.x to Oak. The repository upgrade was
successful, but we are now facing an issue with Lucene index creation after
migrating to Oak: it is taking 4-5 days to create just a Lucene index over
1 GB of data. We are using a non-OSGi implementation, so we cannot use the
OSGi pre-extracted text provider.
Kindly advise how to proceed further. Is there any way we can use a
pre-extracted text provider in a non-OSGi setup?

--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Alternative-to-PreExtractedTextProvider-tp4663379.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Alternative to PreExtractedTextProvider

Posted by smg11 <sa...@gmail.com>.
Thanks Chetan.
For now we have postponed the plan of using OSGi, so we are going ahead
with Lucene indexing during migration only.

Re: Alternative to PreExtractedTextProvider

Posted by Chetan Mehrotra <ch...@gmail.com>.
On Tue, Dec 1, 2015 at 7:24 PM, smg11 <sa...@gmail.com> wrote:
> As per my understanding, if we go with the OSGi approach, a new Oak file
> structure is created locally, as you mentioned above.

That's only for the SegmentNodeStore-based deployment. For a Mongo-based
setup, the repository home is used just for some cached index files and a
JSON config file.

> So to generate the CSV file using Tika and oak-run we need to provide the
> fds-path. But we are unable to give the FDS path with this kind of
> structure. Any suggestion for this?

You can use the FDS path of the FileDataStore associated with the
Jackrabbit setup. Also, it is better to keep the FileDataStore and not
store binary files in Mongo, so retain the FileDataStore while performing
the migration/upgrade.

Just for completeness - you can generate the CSV programmatically. For
example, when running in an OSGi container with the Script Console you can
use the script at [1].

Chetan Mehrotra
[1] https://gist.github.com/chetanmeh/be66363172532e09ee7d
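
For readers without the Script Console, a rough non-OSGi equivalent of what
the gist does is sketched below. This is a hedged illustration, not Oak's
sanctioned tooling: the CSV columns written here are illustrative and must
be matched to what your oak-run tika version actually expects, and plain
JCR cannot expose the blob id that oak-run also needs (the gist uses
Oak-level APIs for that).

```java
import java.io.FileWriter;
import java.io.PrintWriter;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Property;
import javax.jcr.Session;
import javax.jcr.query.Query;

public class BinaryCsvDump {

    // Walks all nt:file nodes and writes one CSV row per binary.
    public static void dump(Session session, String csvPath) throws Exception {
        try (PrintWriter out = new PrintWriter(new FileWriter(csvPath))) {
            Query q = session.getWorkspace().getQueryManager()
                    .createQuery("SELECT * FROM [nt:file]", Query.JCR_SQL2);
            for (NodeIterator it = q.execute().getNodes(); it.hasNext(); ) {
                Node file = it.nextNode();
                Node content = file.getNode("jcr:content");
                Property data = content.getProperty("jcr:data");
                String mime = content.hasProperty("jcr:mimeType")
                        ? content.getProperty("jcr:mimeType").getString() : "";
                // Illustrative columns: length, mime type, node path.
                // The blob id column oak-run expects is NOT available via
                // plain JCR -- see the Oak-level script in [1] for that.
                out.printf("%d,%s,%s%n", data.getLength(), mime, file.getPath());
            }
        }
    }
}
```

The resulting file is then fed to the oak-run tika command together with
the FDS path, as described above.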

Re: Alternative to PreExtractedTextProvider

Posted by smg11 <sa...@gmail.com>.
Thanks Chetan for the solution provided.

As per my understanding, if we go with the OSGi approach, a new Oak file
structure is created locally, as you mentioned above.

Currently (without OSGi) the file structure created is in the MongoDB data
folder, e.g. dbname_0, dbname_ns.
So to generate the CSV file using Tika and oak-run we need to provide the
fds-path. But we are unable to give the FDS path with this kind of
structure. Any suggestion for this?

One more question: if we create the Lucene index in the local file store,
will it cause any issue with Mongo clustering?

Re: Alternative to PreExtractedTextProvider

Posted by Chetan Mehrotra <ch...@gmail.com>.
On Mon, Nov 30, 2015 at 11:28 AM, smg11 <sa...@gmail.com> wrote:
> We are using a non-OSGi implementation, so we cannot use the OSGi
> pre-extracted text provider.

We can look into exposing the PreExtractedTextProvider; however, using
Oak directly, i.e. leaving out all the OSGi-specific parts, is tricky and
you need to know lots of internals to get it right.

Instead, you can make use of the OSGi support without requiring your code
to run in an OSGi container. For an example, have a look at the
oak-standalone example [1], which is currently being developed (and as of
now is in a working state).
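
For anyone who does want to attempt the direct wiring, a rough sketch
follows. This is a hedged reconstruction, not a confirmed recipe: the class
and method names (DataStoreTextWriter, ExtractedTextCache,
setExtractedTextProvider) are as I recall them from the Oak 1.x oak-lucene
module and should be verified against the version actually in use.

```java
import java.io.File;

import javax.jcr.Repository;

import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.plugins.index.datastore.DataStoreTextWriter;
import org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider;
import org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProvider;
import org.apache.jackrabbit.oak.spi.commit.Observer;
import org.apache.jackrabbit.oak.spi.query.QueryIndexProvider;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

public class NonOsgiPreExtraction {

    // textStoreDir is the directory previously populated by the oak-run
    // tika extraction step; opened read-only, it serves as the
    // PreExtractedTextProvider so indexing can skip re-parsing binaries.
    public static Repository build(NodeStore nodeStore, File textStoreDir)
            throws Exception {
        LuceneIndexEditorProvider editorProvider = new LuceneIndexEditorProvider();

        DataStoreTextWriter textProvider = new DataStoreTextWriter(textStoreDir, true);
        editorProvider.getExtractedTextCache().setExtractedTextProvider(textProvider);

        LuceneIndexProvider queryProvider = new LuceneIndexProvider();
        return new Jcr(nodeStore)
                .with(editorProvider)                       // index updates
                .with((QueryIndexProvider) queryProvider)   // query-time lookup
                .with((Observer) queryProvider)             // index refresh
                .createRepository();
    }
}
```

This is exactly the kind of internal wiring the paragraph above warns
about, which is why the oak-standalone route is the recommended one.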

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/standalone