Posted to user@manifoldcf.apache.org by Madalina Rogoz <mx...@gmail.com> on 2015/03/03 11:14:18 UTC

Tika content extractor transformation connection

Can a Tika transformation connection be used to actually move documents
from SharePoint to Solr/FileShare, or can that only be achieved with a File
System Output Connection?

What I am trying to figure out is whether ManifoldCF can handle a migration
from SharePoint 2010 to Solr. I am thinking of crawling SharePoint with
ManifoldCF, but I also need the actual Office documents to be available
outside of SharePoint after the crawl.
So I either need to use a Tika transformation connection that saves the
document to Solr as an attachment/binary string, or use an additional File
System Output connection that saves only the Office documents, and then
figure out how to update the Solr metadata so that the document links point
to the file share instead.

Thoughts? Any idea is appreciated.
Thank you!

Re: Tika content extractor transformation connection

Posted by Karl Wright <da...@gmail.com>.
Hi Madalina,

If you are using MCF 1.7 or greater, you can specify multiple output
connections for a job, and different transformations for each output
connection.  So you should be able to do anything you like, provided the
transformations you are attempting are supported as transformation
connectors.
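To make the multiple-output arrangement concrete, here is a minimal conceptual sketch in Python. It is not ManifoldCF's API: `OutputBranch`, `Job` and `fake_extract` are hypothetical names, and `fake_extract` merely stands in for a Tika extraction step by decoding bytes as UTF-8. Each output connection carries its own list of transformations, so the Solr branch can receive extracted text while the file-share branch receives the original bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a "document" is a plain dict of fields. Real ManifoldCF
# documents carry binary content plus metadata; this is a simplification.
Document = Dict[str, object]
Transform = Callable[[Document], Document]


@dataclass
class OutputBranch:
    """One output connection plus the transformations applied before it."""
    name: str
    transforms: List[Transform] = field(default_factory=list)
    received: List[Document] = field(default_factory=list)

    def send(self, doc: Document) -> None:
        # Apply this branch's transformations in order, then "output" the result.
        for transform in self.transforms:
            doc = transform(doc)
        self.received.append(doc)


@dataclass
class Job:
    """One repository source fanning out to several output branches."""
    outputs: List[OutputBranch]

    def process(self, doc: Document) -> None:
        for branch in self.outputs:
            # Each branch gets its own shallow copy so branches don't interfere.
            branch.send(dict(doc))


def fake_extract(doc: Document) -> Document:
    # Hypothetical stand-in for a Tika transformation: bytes -> text.
    out = dict(doc)
    out["content"] = doc["content"].decode("utf-8")
    return out


# Usage: Solr gets extracted text, the file share keeps the original bytes.
solr = OutputBranch("solr", [fake_extract])
files = OutputBranch("fileshare")  # no transformation on this branch
job = Job([solr, files])
job.process({"id": "http://sp/doc1.docx", "content": b"hello"})
```

The per-branch copy is the design point: a transformation attached to one output never alters what another output receives.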

For extraction to Solr, you can either extract the documents within MCF
from binary to text, and index those through the update handler, OR you can
send the documents intact to Solr via the update/extract handler.  If you
want to make a separate copy of the text somewhere, then you would probably
want to do the extraction once, and output the result both to Solr's update
handler and to the file system.
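The two Solr routes can be sketched as the HTTP requests they boil down to. This is a hedged illustration that assumes a core at the hypothetical URL below and a schema with `id` and `content` fields. It only builds the requests and does not send them. `/update` receives text that was already extracted (for example by a Tika transformation in MCF), while `/update/extract` (Solr Cell) receives the original binary and runs Tika inside Solr, with `literal.id` supplying the unique key.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust to your installation.
SOLR_CORE = "http://localhost:8983/solr/collection1"

# (url, headers, body) for a POST request.
Request = Tuple[str, Dict[str, str], bytes]


def update_request(doc_id: str, text: str) -> Request:
    """Route 1: text already extracted in MCF, posted to the plain update
    handler as a JSON array of documents. Field names are assumptions."""
    body = json.dumps([{"id": doc_id, "content": text}]).encode("utf-8")
    return (SOLR_CORE + "/update?commit=true",
            {"Content-Type": "application/json"}, body)


def extract_request(doc_id: str, raw: bytes, mime: str) -> Request:
    """Route 2: the original binary sent intact to Solr Cell, which does the
    extraction on the Solr side. literal.id sets the document's id field."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    return (SOLR_CORE + "/update/extract?" + params,
            {"Content-Type": mime}, raw)
```

To actually send one, you could pass the tuple's parts to `urllib.request.Request(url, data=body, headers=headers, method="POST")` and open it with `urllib.request.urlopen`. In the "extract once" layout, the same extracted text would feed `update_request` and also be written to the file system.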

Please note that the file system output connector does not do anything with
metadata, so that would be lost.

Karl
