Posted to user@manifoldcf.apache.org by Timo Selvaraj <ti...@gmail.com> on 2015/04/15 20:49:19 UTC

Metadata Adjuster transformer

Hi,

I need to convert the incoming metadata into a specific format.

I want to change

"Content-Type":"text/html"

to

"contenttype":"HTML"

Has anyone done something similar with the metadata adjuster?

Thanks,
Timo
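The requested change has two parts: renaming the key and mapping the MIME type to a short label. Below is a minimal sketch of the value mapping in Python, assuming a hand-maintained lookup table. The labels, the table name `MIME_LABELS` and the function `mime_to_label` are hypothetical and not part of ManifoldCF or Tika.

```python
from typing import Dict

# Hypothetical label table; extend it with the MIME types your crawl produces.
MIME_LABELS: Dict[str, str] = {
    "text/html": "HTML",
    "application/xhtml+xml": "HTML",
    "application/pdf": "PDF",
    "text/plain": "TEXT",
}


def mime_to_label(content_type: str) -> str:
    """Map a Content-Type value such as "text/html; charset=UTF-8" to a label.

    Parameters after ";" are dropped and the lookup is case-insensitive.
    Unknown types are returned unchanged (minus parameters) rather than guessed.
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    return MIME_LABELS.get(mime, mime)


print(mime_to_label("text/html"))                 # HTML
print(mime_to_label("Text/HTML; charset=UTF-8"))  # HTML
print(mime_to_label("image/png"))                 # image/png
```

Falling back to the bare MIME type keeps unmapped documents searchable instead of silently assigning them a wrong label.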

Re: Metadata Adjuster transformer

Posted by Timo Selvaraj <ti...@gmail.com>.
Hi Rafa,

Let me check this document field.

Thanks,
Timo

> On Apr 16, 2015, at 3:44 AM, Rafa Haro <rh...@apache.org> wrote:
> 
> Hi Timo, 
> 
> If you are using the Tika transformer, it is probably also extracting the document type as a general metadata field, which you can manipulate in the metadata adjuster.
> 
> Cheers,
> Rafa


Re: Metadata Adjuster transformer

Posted by Rafa Haro <rh...@apache.org>.
Hi Timo, 

If you are using the Tika transformer, it is probably also extracting the document type as a general metadata field, which you can manipulate in the metadata adjuster.

Cheers,
Rafa
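If the Tika transformer does emit the type as a general metadata field (Rafa says it probably does), the adjuster-side step amounts to copying one field under a new name and rewriting its values. Below is a minimal Python model of that step. General metadata is represented as a dict of multi-valued fields, and the field name `Content-Type` is an assumption about what the extractor emits. `rename_field` is a hypothetical helper, not the Metadata Adjuster's actual configuration or API.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, List[str]]


def rename_field(
    metadata: Metadata,
    source: str,
    target: str,
    transform: Callable[[str], str] = lambda v: v,
    keep_source: bool = False,
) -> Metadata:
    """Return a new metadata dict with `source` copied to `target`.

    Each value is passed through `transform`; the source field is removed
    unless keep_source is True. A missing source field leaves metadata as-is.
    """
    result = {name: list(values) for name, values in metadata.items()}
    if source not in result:
        return result
    values = result[source] if keep_source else result.pop(source)
    result[target] = [transform(v) for v in values]
    return result


# Example input: metadata as a Tika-style extractor might emit it (assumed shape).
tika_output = {"Content-Type": ["text/html; charset=UTF-8"], "title": ["Home"]}
labels = {"text/html": "HTML"}
adjusted = rename_field(
    tika_output,
    "Content-Type",
    "contenttype",
    transform=lambda v: labels.get(v.split(";", 1)[0].strip().lower(), v),
)
print(adjusted)  # {'title': ['Home'], 'contenttype': ['HTML']}
```

The step only ever reads general metadata, which is why it works here only if Tika has already placed the type there.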


On April 15, 2015 at 21:24:17, Karl Wright (daddywri@gmail.com) wrote:

Hi Timo,

Yes, you can do that, but not with the current metadata adjuster.  It does not allow you to access the core fields.

Karl



Re: Metadata Adjuster transformer

Posted by Timo Selvaraj <ti...@gmail.com>.
Thanks Karl.


> On Apr 15, 2015, at 3:21 PM, Karl Wright <da...@gmail.com> wrote:
> 
> Hi Timo,
> 
> Yes, you can do that, but not with the current metadata adjuster.  It does not allow you to access the core fields.
> 
> Karl


Re: Metadata Adjuster transformer

Posted by Karl Wright <da...@gmail.com>.
Hi Timo,

Yes, you can do that, but not with the current metadata adjuster.  It does
not allow you to access the core fields.

Karl


On Wed, Apr 15, 2015 at 3:16 PM, Timo Selvaraj <ti...@gmail.com>
wrote:

> Thanks Karl.
>
> Can I create a new meta field contenttype and add the value HTML based on
> the mime type value in the core field?
>
> Timo

Re: Metadata Adjuster transformer

Posted by Timo Selvaraj <ti...@gmail.com>.
Thanks Karl.

Can I create a new meta field contenttype and add the value HTML based on the mime type value in the core field?

Timo

> On Apr 15, 2015, at 3:13 PM, Karl Wright <da...@gmail.com> wrote:
> 
> Hi Timo,
> 
> The metadata adjuster currently does not give you access to the core document fields, only to the document's general metadata.  Basically, anything that ManifoldCF bases its crawling decisions on is not accessible or modifiable by the adjuster, because it's not general metadata.
> 
> That includes the document's file name, content/mime type, length, creation date, and modification date.
> 
> Technically it is possible to build a document transformer which would copy internal fields like those described into general metadata fields that could then be manipulated with the metadata adjuster.  Some connectors already supply such general metadata fields, but it is by no means a consistent practice.
> 
> Karl


Re: Metadata Adjuster transformer

Posted by Karl Wright <da...@gmail.com>.
Hi Timo,

The metadata adjuster currently does not give you access to the core
document fields, only to the document's general metadata.  Basically,
anything that ManifoldCF bases its crawling decisions on is not
accessible or modifiable by the adjuster, because it's not general metadata.

That includes the document's file name, content/mime type, length, creation
date, and modification date.

Technically it is possible to build a document transformer which would copy
internal fields like those described into general metadata fields that
could then be manipulated with the metadata adjuster.  Some connectors
already supply such general metadata fields, but it is by no means a
consistent practice.

Karl


On Wed, Apr 15, 2015 at 2:49 PM, Timo Selvaraj <ti...@gmail.com>
wrote:

> Hi,
>
> I need to convert the incoming metadata into a specific format.
>
> I want to change
>
> "Content-Type":"text/html"
>
> to
>
> "contenttype":"HTML"
>
> Has anyone done something similar with the metadata adjuster?
>
> Thanks,
> Timo
>
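Karl's distinction between core fields and general metadata can be modelled roughly as follows. This is a minimal Python sketch, not ManifoldCF code. The `Document` class and `copy_core_fields` are hypothetical stand-ins for what would, in ManifoldCF itself, be a Java transformation connector operating on a repository document. The `core_` field-name prefix is made up, and only a subset of the core fields Karl lists is shown. The sketch illustrates one point: an adjuster-style step sees only `general` metadata, so a transformer placed before it in the pipeline must first copy the MIME type (and any other core field) across.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    # Core fields: used by the crawler itself, not visible to the adjuster.
    file_name: Optional[str] = None
    mime_type: Optional[str] = None
    length: Optional[int] = None
    # General metadata: the only part an adjuster-style step can read or modify.
    general: Dict[str, List[str]] = field(default_factory=dict)


def copy_core_fields(doc: Document, prefix: str = "core_") -> Document:
    """Hypothetical transformer: expose core fields as general metadata.

    Core fields themselves are left untouched; core fields with no value
    are skipped. The document is modified in place and also returned.
    """
    core = {"filename": doc.file_name, "mimetype": doc.mime_type, "length": doc.length}
    for name, value in core.items():
        if value is not None:
            doc.general[prefix + name] = [str(value)]
    return doc


doc = Document(file_name="index.html", mime_type="text/html", length=1024)
copy_core_fields(doc)
print(doc.general["core_mimetype"])  # ['text/html']
```

After this step, a later rule can derive `contenttype` from `core_mimetype` exactly as it would from any other general metadata field.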