Posted to dev@flume.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/11/11 01:29:54 UTC

Questions about Morphline Solr Sink structure

Hello,

Warning: I'm a Flume NG and Morphlines newbie.

I was looking at Morphline Solr Sink to see how one could write an
equivalent Morphline Elasticsearch Sink, but after looking at the
code, I'm a bit confused.  Here are my Qs:

1)  interface MorphlineHandler mentions Solr in N places, but it
doesn't seem to be Solr-specific.  Couldn't one reuse this interface
for a Morphline ES Sink?

2) In general, shouldn't a few classes from the
org.apache.flume.sink.solr.morphline package really live outside
anything Solr-specific, e.g. in org.apache.flume.sink.morphline for
those that are Morphline-specific?

3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
Morphline-specific.  Shouldn't they be elsewhere?

4) I was expecting to see SolrJ (Solr Java client library) being used
in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
but there is no trace of SolrJ there.  How exactly does this load
Flume events into Solr then?
Ooooh, is that because when using this sink one is supposed to provide
a Morphline config and this config has a hard-coded loadSolr()
command?

5) Would it make sense to refactor any of the current Morphline Solr
Sink code to make it easier to add things like a Morphline
Elasticsearch Sink?  If so, any guidance you could provide would be
very helpful.
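
To make question 4 concrete: if the destination is chosen by the last
command in the morphline config rather than by the sink, the sink code
never needs SolrJ. The Java sketch below illustrates that idea only.
Every name in it (SinkSketch, Command, LoadInto, deliver) is
hypothetical and not part of Flume or Morphlines; the only details taken
from this thread are that a record maps string keys to lists of objects
and that the Flume event body travels as a byte[] in the
_attachment_body field.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the Flume or Morphlines API.
class SinkSketch {

    /** One step in a command chain; returns false to signal failure. */
    interface Command {
        boolean process(Map<String, List<Object>> rec);
    }

    /** Stand-in for a load command such as loadSolr: it alone knows the destination. */
    static final class LoadInto implements Command {
        final String destination;
        final List<String> received = new ArrayList<>();

        LoadInto(String destination) {
            this.destination = destination;
        }

        @Override
        public boolean process(Map<String, List<Object>> rec) {
            // A real command would call a client library here; we only log the field names.
            received.add(destination + " <- " + rec.keySet());
            return true;
        }
    }

    /** Stand-in for the sink: wraps the event body in a record and hands it to the chain. */
    static boolean deliver(byte[] eventBody, Command chain) {
        Map<String, List<Object>> rec = new LinkedHashMap<>();
        List<Object> body = new ArrayList<>();
        body.add(eventBody);
        rec.put("_attachment_body", body);
        return chain.process(rec);
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        LoadInto solr = new LoadInto("solr");
        LoadInto es = new LoadInto("elasticsearch");
        // Same sink code, different final command.
        deliver(body, solr);
        deliver(body, es);
        System.out.println(solr.received);
        System.out.println(es.received);
    }
}
```

Swapping the final command changes the destination without touching
deliver(), which is why a separate Elasticsearch sink might not be
needed at all.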

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

Re: Questions about Morphline Solr Sink structure

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

But is it really about unpleasant names or about incorrect/misleading names?
If I were a developer and came across a class named MorphlineSolrSink
I'd never think I could use it to pump events into anything other than
Solr.  And if I read the docs that said otherwise I'd be confused and
would likely come to the mailing list for help (like I did, I guess).

Why can't the existing code be @deprecated?  Wouldn't that solve the
backwards compatibility issue over the next 2 releases?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Nov 12, 2013 at 12:30 AM, Wolfgang Hoschek
<wh...@cloudera.com> wrote:
> Breaking backwards compat isn't an option for enterprise customers, especially if the only gain is making a bunch of names a little more pleasant.
>
> Wolfgang.

Re: Questions about Morphline Solr Sink structure

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Breaking backwards compat isn't an option for enterprise customers, especially if the only gain is making a bunch of names a little more pleasant.

Wolfgang.

On Nov 11, 2013, at 8:56 PM, Otis Gospodnetic wrote:

> Hi,
> 
> Hm, I don't get something here.  The class name is misleading/wrong,
> no?  Why not go through the usual deprecation steps to avoid breaking
> anything during the next release and then remove the
> misnamed/misplaced classes completely?
> 
> Also, I don't know enough about this code to understand fully why any
> code here would need to ship without (unit) tests...
> 
> While people could use MorphlineSolrSink even if they are not using it
> with Solr, wouldn't that be a little.... messy? :)
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/


Re: Questions about Morphline Solr Sink structure

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Hm, I don't get something here.  The class name is misleading/wrong,
no?  Why not go through the usual deprecation steps to avoid breaking
anything during the next release and then remove the
misnamed/misplaced classes completely?

Also, I don't know enough about this code to understand fully why any
code here would need to ship without (unit) tests...

While people could use MorphlineSolrSink even if they are not using it
with Solr, wouldn't that be a little.... messy? :)

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 8:25 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> IMHO it would be nice if the code changes were done, but renaming it in
> the user guide (without changing FQCNs) can be done regardless, and is
> perhaps more important from a user perspective.

Re: Questions about Morphline Solr Sink structure

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Yep, renaming it in the user guide (without changing FQCNs) seems like a really good idea.

Wolfgang.

On Nov 11, 2013, at 5:25 PM, Roshan Naik wrote:

> IMHO it would be nice if the code changes were done, but renaming it in
> the user guide (without changing FQCNs) can be done regardless, and is
> perhaps more important from a user perspective.


Re: Questions about Morphline Solr Sink structure

Posted by Roshan Naik <ro...@hortonworks.com>.
IMHO it would be nice if the code changes were done, but renaming it in
the user guide (without changing FQCNs) can be done regardless, and is
perhaps more important from a user perspective.


On Mon, Nov 11, 2013 at 4:04 PM, Wolfgang Hoschek <wh...@cloudera.com>wrote:

> Yep, the names are a bit misleading now that so much has been generalized,
> but whatever we do, breaking backwards compat isn't an option. Shipping a
> sink without tests doesn't seem compelling to me either.
>
> Taste in names aside, as far as I can see you could use this sink for ES
> today without any issues.
>
> Wolfgang.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Questions about Morphline Solr Sink structure

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Yep, the names are a bit misleading now that so much has been generalized, but whatever we do, breaking backwards compat isn't an option. Shipping a sink without tests doesn't seem compelling to me either. 

Taste in names aside, as far as I can see you could use this sink for ES today without any issues.
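
As a rough illustration of what an Elasticsearch load command would have to do with a morphline record (string keys, each mapped to a list of values), here is a small Java sketch that renders such a record as a JSON object. RecordToJson, toJson, and quote are made-up names; a real command would presumably use the ES client and a proper JSON library, and only strings, integers, longs, and booleans are handled here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Morphlines or the Elasticsearch client.
class RecordToJson {

    /** Renders each field as a JSON array, since every record value is a list. */
    static String toJson(Map<String, List<Object>> rec) {
        StringBuilder out = new StringBuilder("{");
        boolean firstField = true;
        for (Map.Entry<String, List<Object>> field : rec.entrySet()) {
            if (!firstField) {
                out.append(',');
            }
            firstField = false;
            out.append(quote(field.getKey())).append(":[");
            boolean firstValue = true;
            for (Object value : field.getValue()) {
                if (!firstValue) {
                    out.append(',');
                }
                firstValue = false;
                // Integers, longs, and booleans pass through; everything else becomes a string.
                if (value instanceof Integer || value instanceof Long || value instanceof Boolean) {
                    out.append(value);
                } else {
                    out.append(quote(String.valueOf(value)));
                }
            }
            out.append(']');
        }
        return out.append('}').toString();
    }

    /** Wraps a string in double quotes, escaping quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, List<Object>> rec = new LinkedHashMap<>();
        rec.put("title", Arrays.<Object>asList("hello"));
        rec.put("count", Arrays.<Object>asList(3));
        System.out.println(toJson(rec));
    }
}
```

Binary values such as the byte[] event body would need their own encoding decision (for example Base64), which this sketch deliberately leaves out.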

Wolfgang.

On Nov 11, 2013, at 4:00 PM, Hari Shreedharan wrote:

> Hi Otis,  
> 
> I don’t mind doing any of that - but the problem is that such a change could impact backward compatibility - so we’d need to keep the stubs around even though the actual functionality might be elsewhere.   
> 
> 
> Thanks,
> Hari


Re: Questions about Morphline Solr Sink structure

Posted by Hari Shreedharan <hs...@cloudera.com>.
Hi Otis,  

I don’t mind doing any of that - but the problem is that such a change could impact backward compatibility - so we’d need to keep the stubs around even though the actual functionality might be elsewhere.   
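
One way to read "keep the stubs around" is sketched below in Java: the real logic lives in a generically named class, and the old name survives as an empty, deprecated subclass so that existing configurations referencing it keep resolving. The nested classes here only borrow the names MorphlineSink and MorphlineSolrSink for illustration; this is an assumption about how such a stub could look, not the actual Flume source.

```java
// Hypothetical stand-ins for illustration; not the actual Flume classes.
class DeprecationSketch {

    /** The generically named class that would hold the real logic. */
    static class MorphlineSink {
        String describe() {
            return "runs the configured morphline";
        }
    }

    /**
     * Stub kept under the old name so existing references keep working.
     * It adds no behavior of its own.
     *
     * @deprecated use {@link MorphlineSink} instead.
     */
    @Deprecated
    static class MorphlineSolrSink extends MorphlineSink {
    }

    public static void main(String[] args) {
        // Old name, new implementation.
        MorphlineSink legacy = new MorphlineSolrSink();
        System.out.println(legacy.describe());
        System.out.println(MorphlineSolrSink.class.isAnnotationPresent(Deprecated.class));
    }
}
```

With this shape the old fully qualified class name stays loadable, and the stub could be removed in a later release once users have had time to migrate.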


Thanks,
Hari


On Monday, November 11, 2013 at 3:54 PM, Otis Gospodnetic wrote:

> Hi,
>  
> Thanks for the info, everyone.
> Yes, I noticed after my email that Blob* classes were in the process
> of being moved.
> Here is what I feel should really be done:
>  
> * get rid of ....solr.morphline package and move the code to
> ...morphline package
> * get rid of any Solr-specific code (I guess just in the tests
> Wolfgang mentioned)
> * rename the sink to MorphlineSink
>  
> Thoughts?
>  
> Re loadElasticSearch() - yes, I see Wolfgang saw I opened an issue for
> that in CDK.
>  
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/



Re: Questions about Morphline Solr Sink structure

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

Thanks for the info, everyone.
Yes, I noticed after my email that Blob* classes were in the process
of being moved.
Here is what I feel should really be done:

* get rid of ....solr.morphline package and move the code to
...morphline package
* get rid of any Solr-specific code (I guess just in the tests
Wolfgang mentioned)
* rename the sink to MorphlineSink

Thoughts?

Re loadElasticSearch() - yes, I see Wolfgang saw I opened an issue for
that in CDK.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 4:34 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> We should consider renaming the Morphline Solr Sink to Morphline Sink in the
> docs to avoid any possibility of misleading end users.

Re: Questions about Morphline Solr Sink structure

Posted by Roshan Naik <ro...@hortonworks.com>.
We should consider renaming the Morphline Solr Sink to Morphline Sink in the
docs to avoid any possibility of misleading end users.


Re: Questions about Morphline Solr Sink structure

Posted by Hari Shreedharan <hs...@cloudera.com>.
Just adding to what Wolfgang said, the BlobDeserializer and BlobHandler are going to be moved to flume-core soon (https://issues.apache.org/jira/browse/FLUME-2226 and https://issues.apache.org/jira/browse/FLUME-2227).


Thanks,
Hari


On Monday, November 11, 2013 at 11:54 AM, Wolfgang Hoschek wrote:

> Hi Otis,
> 
> You bring up a lot of very good points here, indeed. I'll try to answer as best as I can...
> 
> In the early days this Flume Sink started out as being very Solr specific. Over time I have made it more generic and reduced the dependency on Solr more and more, and at this point, there is in fact no dependency on Solr in the code left anymore (except in some tests that straddle the boundary between unit tests and integration tests). So in effect it wouldn't be technically wrong to refer to this as a Morphline Sink. The name is just a reflection of an evolutionary journey through history, and for retaining backwards compat.
> 
> You could easily use this sink to extract, transform and load data into ES (or any other app or database or storage system) without pulling in any Solr related jar. To do so you'd write a loadElasticSearch morphline command in a separate morphline maven module, and use that command instead of the loadSolr command in your morphline config files. The new loadElasticSearch command would convert a morphline record to a data structure appropriate for ES, e.g. ES JSON/Smile, and send that to ES. That's all there is to it, really.
> 
> A morphline record is essentially a hash table where the keys are strings and the values are a list of arbitrary Java objects. Those Java objects are typically Strings and Integers, but they can also be InputStreams or byte[] BLOBs, Avro objects, etc. This data model corresponds exactly to the features of the Lucene data model. It can also be seen as a superset of the Flume event data model - the Flume body is a byte[] value in the morphline _attachment_body field. The data model also maps well to the relational model. It also can be used for hierarchical data considering that the values in a morphline record field can be Avro, JSON, XML, protobufs, or any other custom complex data structure.
> 
> Wolfgang.
> 
> On Nov 10, 2013, at 4:42 PM, Otis Gospodnetic wrote:
> 
> > Hello,
> > 
> > One more "proactive" question.
> > 
> > Isn't all code under the .... solr/morphline package not really about
> > Morphline *Solr* Sink, but really more about *Morphline* Sink?
> > In other words, if where Morphline actually outputs is dictated by the
> > Morphline command in Morphline config (e.g. loadSolr()), then as far
> > as Flume is concerned, isn't that really just *Morphline* Sink?
> > 
> > For example, if I wanted to get Flume to pass events through Morphline
> > and have Morphline output to Elasticsearch, I wouldn't really want to
> > add a whole new Elasticsearch Morphline Sink. I should really just be
> > able to use the existing (misnamed?) Morphline Solr Sink and just
> > point it to a Morphline config that has loadElasticsearch() instead of
> > loadSolr().
> > 
> > (please ignore the fact Morphline doesn't actually have
> > loadElasticsearch() yet - I think this is a Morphline issue, not a
> > Flume issue)
> > 
> > Is the above correct?
> > 
> > Thanks,
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> > 
> > 
> > On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
> > <otis.gospodnetic@gmail.com> wrote:
> > > Hello,
> > > 
> > > Warning: I've got a Flume NG and Morphlines newbie status
> > > 
> > > I was looking at Morphline Solr Sink to see how one could write an
> > > equivalent Morphline Elasticsearch Sink, but after looking at the
> > > code, I'm a bit confused. Here are my Qs:
> > > 
> > > 1) interface MorphlineHandler mentions Solr in N places, but it
> > > doesn't seem to be Solr-specific. Couldn't one reuse this interface
> > > for a Morphline ES Sink?
> > > 
> > > 2) In general, couldn't/shouldn't a few classes from the
> > > org.apache.flume.sink.solr.morphline package really live outside
> > > anything solr-specific? e.g. org.apache.flume.sink.morphline for
> > > those that are Morphline-specific?
> > > 
> > > 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
> > > Morphline-specific. Shouldn't they be elsewhere?
> > > 
> > > 4) I was expecting to see SolrJ (Solr Java client library) being used
> > > in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
> > > but there is no trace of SolrJ there. How exactly does this load
> > > Flume events into Solr then?
> > > Ooooh, is that because when using this sink one is supposed to provide
> > > a Morphline config and this config has a hard-coded loadSolr()
> > > command?
> > > 
> > > 5) Would it make sense to refactor any of the current Morphline Solr
> > > Sink code to make it easier to add things like a Morphline Elasticsearch
> > > Sink? If so, any guidance you could provide would be very helpful.
> > > 
> > > Thanks,
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/



Re: Questions about Morphline Solr Sink structure

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Hi Otis,

You bring up a lot of very good points here, indeed. I'll try to answer as best as I can...

In the early days this Flume Sink started out very Solr-specific. Over time I have made it more generic and steadily reduced its dependency on Solr, and at this point there is in fact no dependency on Solr left in the code (except in some tests that straddle the boundary between unit tests and integration tests). So in effect it wouldn't be technically wrong to refer to this as a Morphline Sink. The name simply reflects that history, and it has been kept for backwards compatibility.

You could easily use this sink to extract, transform and load data into ES (or any other app, database or storage system) without pulling in any Solr-related jar. To do so you'd write a loadElasticSearch morphline command in a separate morphline Maven module, and use that command instead of the loadSolr command in your morphline config files. The new loadElasticSearch command would convert a morphline record to a data structure appropriate for ES, e.g. ES JSON/Smile, and send that to ES. That's all there is to it, really.
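
To make the conversion step concrete, here is a minimal sketch in plain Java of turning a record into a JSON document. It is an illustration only and rests on several assumptions: the record is modeled as a plain Map<String, List<Object>> rather than the real morphline Record class, the class and method names (RecordToJsonSketch, toJson) are hypothetical, the JSON is built by hand instead of with an ES client or JSON library, and single-valued fields are emitted as scalars while multi-valued fields become arrays. Actually sending the document to ES, and wiring the logic into a morphline command, is left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of Flume, morphlines, or any ES client.
final class RecordToJsonSketch {

    // Serialize a record (field name -> list of values) as one JSON object.
    static String toJson(Map<String, List<Object>> record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, List<Object>> e : record.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            List<Object> values = e.getValue();
            if (values.size() == 1) {
                // Assumption: a single value is written as a scalar.
                sb.append(jsonValue(values.get(0)));
            } else {
                sb.append('[');
                for (int i = 0; i < values.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(jsonValue(values.get(i)));
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    // Integers, longs and booleans stay unquoted; everything else is
    // written as a JSON string via toString().
    static String jsonValue(Object v) {
        if (v == null) return "null";
        if (v instanceof Integer || v instanceof Long || v instanceof Boolean) {
            return v.toString();
        }
        return quote(v.toString());
    }

    // Minimal JSON string escaping.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, List<Object>> record = new LinkedHashMap<>();
        record.put("host", Arrays.<Object>asList("web01"));
        record.put("status", Arrays.<Object>asList(200));
        record.put("tags", Arrays.<Object>asList("flume", "morphline"));
        // prints {"host":"web01","status":200,"tags":["flume","morphline"]}
        System.out.println(toJson(record));
    }
}
```

A real command would more likely hand the record to a JSON library or the ES client's own document builder. The hand-rolled serializer is used here only to keep the sketch free of dependencies.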

A morphline record is essentially a hash table where the keys are strings and each value is a list of arbitrary Java objects. Those Java objects are typically Strings and Integers, but they can also be InputStreams or byte[] BLOBs, Avro objects, etc. This data model corresponds exactly to the features of the Lucene data model. It can also be seen as a superset of the Flume event data model - the Flume body is a byte[] value in the morphline _attachment_body field. The data model also maps well to the relational model. It can also be used for hierarchical data, considering that the values in a morphline record field can be Avro, JSON, XML, protobufs, or any other custom complex data structure.
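
The record model described above can be sketched with nothing but java.util collections. This is an illustration under stated assumptions, not the actual morphline Record class: the names RecordModelSketch, fromEvent and put are hypothetical, and copying each Flume header into a record field of the same name is an assumption made for the example. Only the _attachment_body field name for the event body comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the data model: field name -> list of arbitrary objects.
final class RecordModelSketch {

    // Field that carries the Flume event body as a byte[] value.
    static final String ATTACHMENT_BODY = "_attachment_body";

    // Build a record from the two parts of a Flume event.
    // Assumption: every header becomes a single-valued field.
    static Map<String, List<Object>> fromEvent(Map<String, String> headers, byte[] body) {
        Map<String, List<Object>> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> h : headers.entrySet()) {
            put(record, h.getKey(), h.getValue());
        }
        put(record, ATTACHMENT_BODY, body);
        return record;
    }

    // Append a value to a field, creating the field's list on first use.
    static void put(Map<String, List<Object>> record, String field, Object value) {
        record.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");
        Map<String, List<Object>> record =
                fromEvent(headers, "hello".getBytes(StandardCharsets.UTF_8));

        // A field may hold several values of different Java types.
        put(record, "tags", "apache");
        put(record, "tags", 200);

        System.out.println(record.keySet());    // prints [host, _attachment_body, tags]
        System.out.println(record.get("tags")); // prints [apache, 200]
    }
}
```

Because every value is a list, multi-valued fields need no special casing, which is what lets the same structure line up with multi-valued Lucene fields as well as with a Flume event's headers and body.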

Wolfgang.

On Nov 10, 2013, at 4:42 PM, Otis Gospodnetic wrote:

> Hello,
> 
> One more "proactive" question.
> 
> Isn't all code under the .... solr/morphline package not really about
> Morphline *Solr* Sink, but really more about *Morphline* Sink?
> In other words, if where Morphline actually outputs is dictated by the
> Morphline command in Morphline config (e.g. loadSolr()), then as far
> as Flume is concerned, isn't that really just *Morphline* Sink?
> 
> For example, if I wanted to get Flume to pass events through Morphline
> and have Morphline output to Elasticsearch, I wouldn't really want to
> add a whole new Elasticsearch Morphline Sink.  I should really just be
> able to use the existing (misnamed?) Morphline Solr Sink and just
> point it to a Morphline config that has loadElasticsearch() instead of
> loadSolr().
> 
> (please ignore the fact Morphline doesn't actually have
> loadElasticsearch() yet - I think this is a Morphline issue, not a
> Flume issue)
> 
> Is the above correct?
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
> <ot...@gmail.com> wrote:
>> Hello,
>> 
>> Warning: I've got a Flume NG and Morphlines newbie status
>> 
>> I was looking at Morphline Solr Sink to see how one could write an
>> equivalent Morphline Elasticsearch Sink, but after looking at the
>> code, I'm a bit confused.  Here are my Qs:
>> 
>> 1)  interface MorphlineHandler mentions Solr in N places, but it
>> doesn't seem to be Solr-specific.  Couldn't one reuse this interface
>> for a Morphline ES Sink?
>> 
>> 2) In general, couldn't/shouldn't a few classes from the
>> org.apache.flume.sink.solr.morphline package really live outside
>> anything solr-specific? e.g.  org.apache.flume.sink.morphline for
>> those that are Morphline-specific?
>> 
>> 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
>> Morphline-specific.  Shouldn't they be elsewhere?
>> 
>> 4) I was expecting to see SolrJ (Solr Java client library) being used
>> in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
>> but there is no trace of SolrJ there.  How exactly does this load
>> Flume events into Solr then?
>> Ooooh, is that because when using this sink one is supposed to provide
>> a Morphline config and this config has a hard-coded loadSolr()
>> command?
>> 
>> 5) Would it make sense to refactor any of the current Morphline Solr
>> Sink code to make it easier to add things like a Morphline Elasticsearch
>> Sink?  If so, any guidance you could provide would be very helpful.
>> 
>> Thanks,
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/


Re: Questions about Morphline Solr Sink structure

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hello,

One more "proactive" question.

Isn't all code under the .... solr/morphline package not really about
Morphline *Solr* Sink, but really more about *Morphline* Sink?
In other words, if where Morphline actually outputs is dictated by the
Morphline command in Morphline config (e.g. loadSolr()), then as far
as Flume is concerned, isn't that really just *Morphline* Sink?

For example, if I wanted to get Flume to pass events through Morphline
and have Morphline output to Elasticsearch, I wouldn't really want to
add a whole new Elasticsearch Morphline Sink.  I should really just be
able to use the existing (misnamed?) Morphline Solr Sink and just
point it to a Morphline config that has loadElasticsearch() instead of
loadSolr().

(please ignore the fact Morphline doesn't actually have
loadElasticsearch() yet - I think this is a Morphline issue, not a
Flume issue)

Is the above correct?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Nov 10, 2013 at 7:29 PM, Otis Gospodnetic
<ot...@gmail.com> wrote:
> Hello,
>
> Warning: I've got a Flume NG and Morphlines newbie status
>
> I was looking at Morphline Solr Sink to see how one could write an
> equivalent Morphline Elasticsearch Sink, but after looking at the
> code, I'm a bit confused.  Here are my Qs:
>
> 1)  interface MorphlineHandler mentions Solr in N places, but it
> doesn't seem to be Solr-specific.  Couldn't one reuse this interface
> for a Morphline ES Sink?
>
> 2) In general, couldn't/shouldn't a few classes from
> the org.apache.flume.sink.solr.morphline package really live outside
> anything solr-specific? e.g.  org.apache.flume.sink.morphline for
> those that are Morphline-specific?
>
> 3) Similarly, BlobDeserializer and BlobHandler don't seem to be even
> Morphline-specific.  Shouldn't they be elsewhere?
>
> 4) I was expecting to see SolrJ (Solr Java client library) being used
> in MorphlineHandlerImpl or MorphlineSolrSink to send events to Solr,
> but there is no trace of SolrJ there.  How exactly does this load
> Flume events into Solr then?
> Ooooh, is that because when using this sink one is supposed to provide
> a Morphline config and this config has a hard-coded loadSolr()
> command?
>
> 5) Would it make sense to refactor any of the current Morphline Solr
> Sink code to make it easier to add things like a Morphline Elasticsearch
> Sink?  If so, any guidance you could provide would be very helpful.
>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/