Posted to user@manifoldcf.apache.org by Francesco Fornasari <f....@tai.it> on 2015/07/31 15:02:37 UTC

Alfresco + ManifoldCF + ElasticSearch index mapping

Dear All,
we're new to ManifoldCF and ElasticSearch.

We have successfully configured a ManifoldCF + ElasticSearch + Alfresco
5.0.1 architecture, but we are not able to get metadata from the
ElasticSearch index.

In detail:

1) We configured a CMIS query in ManifoldCF for the job crawler:
"SELECT mcf:numeromittente, mcf:pagine FROM mcf:fax".
2) The job runs correctly and finds documents.
3) We configured the ElasticSearch index mapping at
http://manifoldcf-es.tainet:9200/index/generictype/_mapping:

{
  "generictype" : {
    "properties" : {
      "mcf:numeromittente" : {
        "type" : "string"
      },
      "mcf:pagine" : {
        "type" : "integer"
      }
    }
  }
}

4) The http://manifoldcf-es.tainet:9200/index/generictype/_mapping API
returns JSON that includes all of these properties:

{"properties":{"_content":{"type":"string"},"_content_type":{"type":"string"},"_name":{"type":"string"}}},"mcf:numeromittente":{"type":"string"},"mcf:pagine":{"type":"integer"}}}}


5) We call the http://manifoldcf-es.tainet:9200/index/generictype/_search
API with:

{
    "query": {
        "query_string": {
            "query": "v2.5.6_ReleaseNotes.txt",
            "fields": []
        }
    }
}

but the response does not contain the mcf property values:

"file" : {"_content_type" : "text\/plain","_name" :
"v2.5.6_ReleaseNotes.txt", "_content" : "...." }

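Declaring a field in the mapping only tells ElasticSearch how to index it; it does not put values into documents, and each hit's _source contains only what the indexer actually sent. One way to make that gap visible is to flatten both the mapping and a search hit into dotted field paths and compare them. This is a minimal Python sketch: the helper names are hypothetical, and the two dictionaries are hand-written approximations of steps 4 and 5 (the mapping output above is truncated, so nesting the content fields under "file" is an assumption).

```python
# Hypothetical helpers for comparing what a mapping declares with what a
# stored document actually contains; not part of ManifoldCF or any ES client.

def mapped_fields(properties, prefix=""):
    """Flatten an ElasticSearch mapping "properties" object into dotted paths."""
    paths = set()
    for name, spec in properties.items():
        if "properties" in spec:  # object field: recurse into its sub-fields
            paths |= mapped_fields(spec["properties"], prefix + name + ".")
        else:
            paths.add(prefix + name)
    return paths


def source_fields(source, prefix=""):
    """Flatten a hit's _source document into dotted paths."""
    paths = set()
    for name, value in source.items():
        if isinstance(value, dict):  # nested object: recurse
            paths |= source_fields(value, prefix + name + ".")
        else:
            paths.add(prefix + name)
    return paths


# Hand-written approximations of steps 4 and 5 (assumption: the extracted
# content sits under a "file" object in both the mapping and the document).
mapping_properties = {
    "file": {"properties": {
        "_content": {"type": "string"},
        "_content_type": {"type": "string"},
        "_name": {"type": "string"},
    }},
    "mcf:numeromittente": {"type": "string"},
    "mcf:pagine": {"type": "integer"},
}
hit_source = {"file": {"_content_type": "text/plain",
                       "_name": "v2.5.6_ReleaseNotes.txt",
                       "_content": "...."}}

# Fields declared in the mapping but carrying no value in the indexed document.
missing = mapped_fields(mapping_properties) - source_fields(hit_source)
print(sorted(missing))  # ['mcf:numeromittente', 'mcf:pagine']
```

If the mcf fields show up as missing like this, the mapping is fine and the values simply never reached the index, which points back at the crawler side.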
So, could you explain how we can also include the mcf:numeromittente and
mcf:pagine properties in the ElasticSearch query results?

Is there something wrong in our ManifoldCF configuration?

Regards,
Francesco.
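On the question of including the properties in the results: by default every hit's _source already returns the whole stored document, so the mcf fields can only appear once they have actually been indexed. To restrict which fields come back, ElasticSearch supports source filtering through a "_source" list in the search body; the "fields" list inside query_string is something different (it selects the fields to search). A small Python sketch that only builds the request, assuming an ElasticSearch 1.x/2.x-era cluster (suggested by the "string" type in the mapping); the function name is hypothetical.

```python
import json
import urllib.request


def build_search_request(base_url, index, doc_type, text, return_fields):
    """Build (but do not send) a _search request that filters the returned _source.

    Hypothetical helper. "_source" chooses which parts of each stored document
    come back; it cannot return values that were never indexed. The "fields"
    list inside query_string is different: it picks the fields to *search*.
    """
    body = {
        "_source": list(return_fields),
        "query": {"query_string": {"query": text}},
    }
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "http://manifoldcf-es.tainet:9200", "index", "generictype",
    "v2.5.6_ReleaseNotes.txt",
    ["file._name", "mcf:numeromittente", "mcf:pagine"],
)
# To actually run it against the cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```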

Re: Alfresco + ManifoldCF + ElasticSearch index mapping

Posted by Karl Wright <da...@gmail.com>.
Hi all,

The CMIS connector has a limitation: you can specify the seeding query, but
you cannot specify the query that obtains the document contents.
Unfortunately, that query is created internally, so adding the metadata to
the seeding query does not help in obtaining metadata.
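To make the data flow concrete, here is a toy model in Python of the limitation described above. It is not ManifoldCF code: the repository contents, function names and column lists are all hypothetical. It only shows that when the seeding query contributes nothing but document IDs, the columns it selects cannot change what gets indexed; only the internally built query could.

```python
# Toy model of the limitation described above. NOT ManifoldCF code:
# every name and value here is hypothetical and only illustrates the data flow.

REPOSITORY = {  # object ID -> (type, properties)
    "doc-1": ("mcf:fax", {"cmis:name": "a.txt", "mcf:pagine": 3}),
    "doc-2": ("cmis:document", {"cmis:name": "b.txt"}),
}


def seeding_query(columns, type_name):
    """User-configurable seeding step: only the matching object IDs survive.

    `columns` is accepted but deliberately ignored, mirroring the fact that
    the selected columns never reach the indexing step.
    """
    return [oid for oid, (otype, _) in REPOSITORY.items() if otype == type_name]


def internal_fetch(oid, internal_columns):
    """Stand-in for the connector-built document query (not user-configurable)."""
    _, props = REPOSITORY[oid]
    return {c: props[c] for c in internal_columns if c in props}


def crawl(seed_columns, type_name, internal_columns):
    return [internal_fetch(oid, internal_columns)
            for oid in seeding_query(seed_columns, type_name)]


a = crawl(["mcf:pagine"], "mcf:fax", internal_columns=["cmis:name"])
b = crawl(["cmis:objectId"], "mcf:fax", internal_columns=["cmis:name"])
print(a == b)  # True: the seeding columns made no difference
c = crawl(["cmis:objectId"], "mcf:fax", internal_columns=["cmis:name", "mcf:pagine"])
print(c)  # [{'cmis:name': 'a.txt', 'mcf:pagine': 3}]
```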

Thanks,
Karl



On Fri, Jul 31, 2015 at 10:13 AM, Piergiorgio Lucidi <piergiorgio@apache.org> wrote:

Re: Alfresco + ManifoldCF + ElasticSearch index mapping

Posted by Piergiorgio Lucidi <pi...@apache.org>.
Hi guys,

I have checked the code and can confirm that, on the CMIS side, the
connector correctly considers all the document properties, so the problem
is probably on the ElasticSearch side.

I'm looking at the code now; I hope to have some news for you soon :-P

Regards,
Piergiorgio

--
Piergiorgio Lucidi
Open Source ECM Specialist
http://www.open4dev.com

2015-07-31 15:32 GMT+02:00 Delapasse, Deanna <dd...@oceaneering.com>:

Re: Alfresco + ManifoldCF + ElasticSearch index mapping

Posted by "Delapasse, Deanna" <dd...@oceaneering.com>.
My memory is a bit fuzzy, but I think this is accurate!

Several months back I got a copy of the CMIS connector.  It worked, but the
query results were restricted: even if you gave "select a,b,c,d...", it
ignored all the fields except cmis:name and cmis:objectId, so none of the
attributes actually made it to ES.  Maybe that has been fixed, but you
should check!  Add some debug logging to CmisRepositoryConnector to
confirm.

Install a tool called Elasticsearch Head so you can browse and see exactly
what is being pushed into ES.  That was what helped me figure it out.
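As a scripted alternative to Elasticsearch Head, the same check can be run against the _search endpoint: a URI search with no query behaves like match_all, and each hit's _source is the document as it was sent to ElasticSearch (unless _source storage has been disabled in the mapping). A minimal Python sketch; the helper names are hypothetical, and the host, index and type names are the ones from Francesco's message.

```python
import json
import urllib.request


def inspect_url(base_url, index, doc_type, size=5):
    """URI search without a query: ElasticSearch treats it as match_all."""
    return f"{base_url}/{index}/{doc_type}/_search?size={size}"


def sources(search_response):
    """Return each hit's _source, i.e. the document as it was sent to ES."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


def fetch_sources(base_url, index, doc_type, size=5):
    """Fetch a few documents and return their _source (needs a reachable cluster)."""
    with urllib.request.urlopen(inspect_url(base_url, index, doc_type, size)) as resp:
        return sources(json.load(resp))


# Offline example with a canned, abbreviated response:
canned = {"hits": {"hits": [
    {"_id": "1", "_source": {"file": {"_name": "v2.5.6_ReleaseNotes.txt"}}},
]}}
for doc in sources(canned):
    print(sorted(doc))  # ['file']: the top-level fields actually stored
```

Against the live cluster, fetch_sources("http://manifoldcf-es.tainet:9200", "index", "generictype") would show whether any mcf:* keys were ever pushed.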

Good luck!
Deanna

p.s.  I also changed the CMIS check for "changed" files.  Editing metadata
doesn't always create a new version, so I modified it to check
LastModifiedDate instead of comparing version numbers.
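A minimal sketch of the change-detection idea in the p.s., assuming (hypothetically) that the connector re-indexes a document whenever a per-document value differs between crawls. The property IDs are the standard CMIS ones (cmis:versionLabel, and cmis:lastModificationDate for what is called LastModifiedDate above); the helper names and example dates are made up for illustration.

```python
# Sketch of the change-detection idea; helper names are hypothetical.
# Assumption: a document is re-indexed when the chosen value changes between crawls.

def version_by_label(props):
    return props["cmis:versionLabel"]


def version_by_modification_date(props):
    return props["cmis:lastModificationDate"]


def needs_reindex(previous, current, version_of):
    """Re-index when the chosen version value differs between two crawls."""
    return version_of(previous) != version_of(current)


before = {"cmis:versionLabel": "1.0",
          "cmis:lastModificationDate": "2015-07-30T09:00:00.000Z"}
# Metadata edited in place: same version label, new modification date.
after = {"cmis:versionLabel": "1.0",
         "cmis:lastModificationDate": "2015-07-31T10:00:00.000Z"}

print(needs_reindex(before, after, version_by_label))              # False
print(needs_reindex(before, after, version_by_modification_date))  # True
```

Comparing only the version label misses the in-place metadata edit; comparing the modification date catches it.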


On Fri, Jul 31, 2015 at 8:02 AM, Francesco Fornasari <f....@tai.it>
wrote:
