Posted to user@manifoldcf.apache.org by Jeff Potts <je...@gmail.com> on 2015/08/27 21:47:08 UTC

Reading and posting plain text, rather than encoded files

I've spent a very short time playing with ManifoldCF. Cool project, thank
you for contributing it.

I can read binary files from a source repo like Alfresco 5.0.d and post
them to Elasticsearch 1.7.2 successfully.

Now I'm wondering if the rest of my use cases can be achieved with
ManifoldCF...

Use case 1: Read JSON from a file system, post to Elasticsearch as-is

When I tried to use the file system repository and the Elasticsearch
output, I noticed that the file is being encoded and stored in ES in the
_content property. What I'd rather do is have the file posted to ES as-is,
for example when the file is already a JSON document in the format my ES
type mapping expects. These files are 15 KB to 30 KB of nested-object JSON.
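For reference, a minimal Python sketch of the two request bodies being contrasted here. It assumes the connector base64-encodes the file bytes into a _content field, as observed above; the exact request the Elasticsearch output connector builds is not shown in this thread, and the helper names and sample document are hypothetical.

```python
import base64
import json


def encoded_body(raw: bytes) -> dict:
    # Roughly what was observed: the file bytes end up base64-encoded in _content.
    return {"_content": base64.b64encode(raw).decode("ascii")}


def as_is_body(raw: bytes) -> dict:
    # What use case 1 asks for: the file's own JSON object becomes the ES document.
    return json.loads(raw.decode("utf-8"))


# Hypothetical sample file content.
sample = b'{"title": "Report", "author": {"name": "Jeff"}}'
print(json.dumps(encoded_body(sample)))
print(json.dumps(as_is_body(sample)))
```

In the first body ES only sees an opaque string; in the second, the nested fields map directly onto the type mapping.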

Use case 2: Read JSON from Alfresco, post it to Elasticsearch along with
object metadata

In a slight twist on the first, I'd like to store JSON documents in a
repository, like Alfresco, and then read the metadata from the Alfresco
object and merge it with the JSON stored in the content and post that to
Elasticsearch as a JSON string, not as an encoded blob.
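A minimal Python sketch of the merge described in use case 2, assuming the stored content is a single JSON object and the repository metadata arrives as a flat dictionary of strings. The function name and the field names (creator, modified) are hypothetical; letting metadata win on key conflicts is a design choice, not something stated here.

```python
import json


def merge_metadata(content: bytes, metadata: dict) -> str:
    """Merge repository metadata into a JSON document and return JSON text to post as-is.

    Assumption: metadata keys overwrite keys of the same name in the content.
    """
    doc = json.loads(content.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("content must be a JSON object to merge metadata into")
    merged = dict(doc)
    merged.update(metadata)
    return json.dumps(merged)


# Hypothetical content and metadata.
content = b'{"title": "Spec", "sections": [{"id": 1}]}'
metadata = {"creator": "admin", "modified": "2015-08-27T21:47:08Z"}
print(merge_metadata(content, metadata))
```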

I didn't see anything covering these in the docs but I may have missed it.

Jeff

Re: Reading and posting plain text, rather than encoded files

Posted by Shinichiro Abe <sh...@gmail.com>.
Sorry, I made a mistake. CONNECTORS-1234 is not for posting JSON files in the format ES expects; it is only for posting the content as a string rather than as an encoded blob.
Currently the ES connector does not detect ES-specific JSON files, just as the Solr connector does not detect Solr-specific XML/JSON files.
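To illustrate the kind of detection step described as missing, here is a hypothetical Python check that a connector could run before deciding whether to post content as a parsed object or as a plain string. Keying off the MIME type is an assumption, and no existing connector is claimed to expose such a function.

```python
import json
from typing import Optional


def detect_json_object(content: bytes, mime_type: str) -> Optional[dict]:
    """Return the parsed object if the content is a JSON object, else None (hypothetical helper)."""
    # Ignore parameters such as "; charset=utf-8" when comparing the MIME type.
    if mime_type.split(";")[0].strip().lower() != "application/json":
        return None
    try:
        parsed = json.loads(content.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Only a top-level object can become an ES document body.
    return parsed if isinstance(parsed, dict) else None


print(detect_json_object(b'{"a": 1}', "application/json"))
```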

Shinichiro Abe
> 
> On 2015/08/28 5:29, Shinichiro Abe <sh...@gmail.com> wrote:
> 
> Hi,
> 
> I'm working on this at https://issues.apache.org/jira/browse/CONNECTORS-1234
> 
> Regards,
> Shinichiro Abe


Re: Reading and posting plain text, rather than encoded files

Posted by Karl Wright <da...@gmail.com>.
"In the first case I want to post the plain text of a file to an output
as-is with no wrappers added by manifold and no encoding."  You can send
plain-text content, but you should expect the output connector to treat
that as content, not as commands.  So it will "escape" it as needed to be
sure it is not interpreted as commands of some kind that would be
target-specific.
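A short Python illustration of the escaping described above, using the standard json module rather than any connector code: when JSON text is carried as content, it is serialized as a string value with its quotes escaped, whereas posting it as the document itself requires parsing it first. The field name content is hypothetical.

```python
import json

file_text = '{"title": "Report", "tags": ["a", "b"]}'

# Treated as content: the whole text becomes one string value, so its quotes are escaped.
as_content = json.dumps({"content": file_text})

# Treated as the document itself: the text is parsed and its keys become fields.
as_document = json.dumps(json.loads(file_text))

print(as_content)
print(as_document)
```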

Karl

On Thu, Aug 27, 2015 at 6:25 PM, Jeff Potts <je...@gmail.com> wrote:

> I mentioned those as examples only. I am not asking to do anything
> repository-specific. In the first case I want to post the plain text of a
> file to an output as-is with no wrappers added by manifold and no encoding.
>
> In the second case I want to merge some properties on a document with the
> existing JSON from the file content and then post that to the output as-is.
>
> Jeff

Re: Reading and posting plain text, rather than encoded files

Posted by Jeff Potts <je...@gmail.com>.
I mentioned those as examples only. I am not asking to do anything repository-specific. In the first case I want to post the plain text of a file to an output as-is with no wrappers added by manifold and no encoding.

In the second case I want to merge some properties on a document with the existing JSON from the file content and then post that to the output as-is.

Jeff



> On Aug 27, 2015, at 3:58 PM, Karl Wright <da...@gmail.com> wrote:
> 
> Hi,
> 
> ManifoldCF's connectors are general purpose; they are intended to work with *any* repository or output.  So in general, connectors in MCF cannot interpret or generate content that is Alfresco- or Elasticsearch-specific.
> 
> You are welcome to convert these documents to ManifoldCF's means of managing documents, RepositoryDocument, in your repository connector, and then convert them back in your output connector.  Or, if you want to write specific proprietary connectors that communicate in a specific format of JSON, you can.  But do not expect ManifoldCF's suite of other connectors and transformers to work with this in any meaningful way.
> 
> Karl

Re: Reading and posting plain text, rather than encoded files

Posted by Karl Wright <da...@gmail.com>.
Hi,

ManifoldCF's connectors are general purpose; they are intended to work with
*any* repository or output.  So in general, connectors in MCF cannot
interpret or generate content that is Alfresco- or Elasticsearch-specific.

You are welcome to convert these documents to ManifoldCF's means of
managing documents, RepositoryDocument, in your repository connector, and
then convert them back in your output connector.  Or, if you want to write
specific proprietary connectors that communicate in a specific format of
JSON, you can.  But do not expect ManifoldCF's suite of other connectors
and transformers to work with this in any meaningful way.
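One way to picture the suggested round trip is the Python sketch below. GenericDocument is a hypothetical stand-in for a connector-neutral document (opaque content plus string-valued fields) and is not ManifoldCF's RepositoryDocument API. The repository side keeps the JSON as opaque content, and only an output side written with knowledge of that JSON format rebuilds the body; any transformer in between would see just opaque content, which is the caveat above.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GenericDocument:
    """Hypothetical connector-neutral document: binary content plus multi-valued string fields."""
    content: bytes = b""
    fields: Dict[str, List[str]] = field(default_factory=dict)

    def add_field(self, name: str, value: str) -> None:
        self.fields.setdefault(name, []).append(value)


def from_repository(raw_json: bytes, metadata: Dict[str, str]) -> GenericDocument:
    # Repository side: keep the JSON as opaque content and carry metadata as fields.
    doc = GenericDocument(content=raw_json)
    for name, value in metadata.items():
        doc.add_field(name, value)
    return doc


def to_output_body(doc: GenericDocument) -> str:
    # Output side: a custom connector that knows the content is JSON rebuilds the body.
    body = json.loads(doc.content.decode("utf-8"))
    for name, values in doc.fields.items():
        body[name] = values[0] if len(values) == 1 else values
    return json.dumps(body)


doc = from_repository(b'{"title": "Spec"}', {"creator": "admin"})
print(to_output_body(doc))
```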

Karl


On Thu, Aug 27, 2015 at 4:29 PM, Shinichiro Abe <sh...@gmail.com>
wrote:

> Hi,
>
> I'm working on this at
> https://issues.apache.org/jira/browse/CONNECTORS-1234
>
> Regards,
> Shinichiro Abe

Re: Reading and posting plain text, rather than encoded files

Posted by Shinichiro Abe <sh...@gmail.com>.
Hi,

I'm working on this at https://issues.apache.org/jira/browse/CONNECTORS-1234

Regards,
Shinichiro Abe

> On 2015/08/28 4:47, Jeff Potts <je...@gmail.com> wrote:
> 
> I've spent a very short time playing with ManifoldCF. Cool project, thank you for contributing it.
> 
> I can read binary files from a source repo like Alfresco 5.0.d and post them to Elasticsearch 1.7.2 successfully.
> 
> Now I'm wondering if the rest of my use cases can be achieved with ManifoldCF...
> 
> Use case 1: Read JSON from a file system, post to Elasticsearch as-is
> 
> When I tried to use the file system repository and the Elasticsearch output, I noticed that the file is being encoded and stored in ES in the _content property. What I'd rather do is have the file posted to ES as-is, such as if the file is already a JSON document in the expected format for my type mapping in ES. These files are 15k to 30k of nested object JSON.
> 
> Use case 2: Read JSON from Alfresco, post it to Elasticsearch along with object metadata
> 
> In a slight twist on the first, I'd like to store JSON documents in a repository, like Alfresco, and then read the metadata from the Alfresco object and merge it with the JSON stored in the content and post that to Elasticsearch as a JSON string, not as an encoded blob.
> 
> I didn't see anything covering these in the docs but I may have missed it.
> 
> Jeff