Posted to dev@metron.apache.org by Ryan Merriman <me...@gmail.com> on 2018/05/03 18:35:53 UTC

[DISCUSS] Pcap panel architecture

We are planning on adding the pcap query feature to the Alerts UI.  Before
we start this work, I think it is important to get community buy-in on the
architectural approach.  There are a few different options.

One option is to leverage the existing metron-api module that exposes pcap
queries through a REST service.  The upsides are:

   - some work has already been done
   - it's part of our build so we know unit and integration tests pass

The downsides are:

   - It hasn't been used in a while and will need some end-to-end testing
   to make sure it still functions properly
   - It is synchronous and will block the UI, using up the limited number
   of concurrent connections available in a browser
   - It will require significant MPack work to properly set it up on install
   - It is a legacy module from OpenSOC and coding style is significantly
   different

Another option would be moving to a micro-services architecture.  We have
experimented with a proof of concept and found it was too hard to add this
feature into our existing REST services because of all the dependencies
that must coexist in the same application.  The upsides are:

   - Would provide a platform for future Batch/MR/YARN type features
   - There would be fewer technical compromises since we are building it
   from the ground up

The downsides are:

   - Will require the most effort and will likely take a long time to plan
   and implement
   - Like the previous option, will require significant MPack work

A third option would be to add an endpoint to our existing REST service
that delegates to the pcap_query.sh script through the Java Process class.
The upsides to this approach are:

   - We know the pcap_query.sh script works and would require minimal
   changes
   - Minimal MPack work is required since our REST service is already
   included

The downsides are:

   - Does not set us up to easily add other batch-oriented features in the
   future
   - OS-level security becomes a concern since we are delegating to a
   script in a separate process

I feel like ultimately we want to transition to a micro-services
architecture because it will provide more flexibility and make it easier to
grow our set of features.  But in the meantime, wrapping the pcap_query.sh
script would allow us to add this feature with less work and fewer lines of
code.  If and when we decide to deploy a separate REST application for
batch features, the UI portion would require minimal changes.
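
A minimal sketch of the third option, written in Python for brevity rather
than with the Java Process class the REST service would use. It shows the one
detail that bears on the OS-level security concern: the script is started
from an argument list with no shell involved, so user-supplied values are
never interpreted as shell syntax. The name run_query_script and the stand-in
command ECHO_ARGS are hypothetical; a real endpoint would pass the path to
pcap_query.sh and its documented options instead.

```python
import subprocess
import sys
from typing import Sequence


def run_query_script(command: Sequence[str], args: Sequence[str],
                     timeout_s: float = 60.0) -> str:
    """Run a script as a child process and return its trimmed stdout.

    The command is passed as a list and no shell is involved, so a value
    such as "; rm -rf /" reaches the script as a plain argument string.
    """
    completed = subprocess.run(
        [*command, *args],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
        check=True,           # raises CalledProcessError on a non-zero exit
    )
    return completed.stdout.strip()


# Hypothetical stand-in for pcap_query.sh: a child process that prints the
# arguments it received, so the sketch runs without a Metron install.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

if __name__ == "__main__":
    print(run_query_script(ECHO_ARGS, ["fixed", "; rm -rf /"]))
```

The same idea carries over to Java by building the command as a list of
strings rather than a single command line.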

What does everyone think?

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Yes, completely agreed. We're on the same page.

On Thu, May 3, 2018 at 7:50 PM, Otto Fowler <ot...@gmail.com> wrote:

> I think my point is that maybe we should have a discussion about:
>
> * PCAP UI, goals etc
> * Where it would live and why, what that would mean etc
> * Backend ( this original mail )
>
>
>
> On May 3, 2018 at 18:34:00, Michael Miklavcic (michael.miklavcic@gmail.com)
> wrote:
>
> Otto, what are you and your customers finding useful and/or difficult from
> a split management/alerts UI perspective? It might help us to restate the
> original scope and intent around maintaining separate management and alert
> UI's, to your point about "contrary to previous direction." I personally
> don't have a strong position on this other than 1) management is a
> different feature set from drilling into threat intel, yet many apps still
> have their management UI combined with the end user experience and 2) we
> should probably consider pcap in context of a workflow with alerts.
>
> On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> > alerts ui anymore.
> >
> > It is becoming more of a “composite” app, with multiple feature ui’s
> > together. I didn’t think that was what we were going for, thus the
> > config ui and the alert ui.
> >
> > Just adding disparate things as ‘new tabs’ to a ui may be expedient, but
> > it seems contrary to our previous direction.
> >
> > There are a few things to consider if we are going to start moving
> > everything into the Alerts UI, aren’t there?
> >
> > It may be a better road to bring it in on its own like the alerts ui
> > effort, so it can be released with ‘qualifiers’ and tested with
> > the right expectations without affecting the Alerts UI.
> >
> >
> >
> > On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > Otto,
> >
> > I'm assuming just adding it to the Alerts UI is less work but I wouldn't
> > be strongly opposed to it being its own UI. What are the reasons for
> > doing that?
> >
> > Mike,
> >
> > On using metron-api:
> >
> > 1. I'm making an assumption about it not being used much. Maybe it
> > still works without issue. I agree, we'll have to test anything we build
> > so this is a minor issue.
> > 2. Updating metron-api to be asynchronous is a requirement in my opinion
> > 3. The MPack work is the major drawback for me. We're essentially
> > creating a brand new Metron component. There are a lot of examples we
> > can draw from but it's going to be a large chunk of new MPack code to
> > maintain and MPack development has been painful in the past. I think it
> > will include:
> >    1. Creating a start script
> >    2. Creating master.py and commands.py scripts for managing the
> >    application lifecycle, service checks, etc
> >    3. Creating an -env.xml file for exposing properties in Ambari
> >    4. Adding the component to the various MPack files
> >    (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> > 4. Our Storm topologies are completely different use cases and much more
> > complex so I don't understand the comparison. But if you prefer this
> > coding style then I think this is a minor issue as well.
> >
> > On micro-services:
> >
> > 1. Our REST service already includes a lot of dependencies and is
> > difficult to manage in its current state. I just went through this on
> > https://github.com/apache/metron/pull/1008. It was painful. When we
> > tried to include mapreduce and yarn dependencies it became what seemed
> > like an endless stream of NoSuchMethod, NoClassDef and similar errors.
> > Even if we can get it to work it's going to make managing our REST
> > service that much harder than it already is. I think the shaded jars
> > are the source of all this trouble and I agree it would be nice to
> > improve our architecture in this area. However I don't think it's a
> > simple fix and now we're getting into the "will likely take a long time
> > to plan and implement" concern. If anyone has ideas on how to solve our
> > shaded jar challenge I would be all for it.
> > 2. All the MPack work listed above would also be required here. A
> > micro-services pattern is a significant shift and I can't even give you
> > concrete examples of what exactly we would have to do. We would need to
> > go through extensive design and planning to even get to that point.
> > 3. It would be a brand new component. See above plus any new
> > infrastructure we would need (web server/proxy, service discovery, etc)
> >
> > On pcap-query:
> >
> > 1. I don't recall any users or customers directly using metron-api but
> > if you say so I believe you :)
> > 2. As I understand it the pcap topology and pcap query are somewhat
> > decoupled. Maybe location of pcap files would be shared? MPack work here
> > is likely to include adding a couple properties and moving some around
> > so they can be shared. Deciding between Ambari and global config would
> > be similar to properties we add to any component.
> >
> > I think you may be underestimating how difficult it's going to be to
> > solve our dependency problem. Or maybe it's me that is overestimating
> > it :) It could be something we experiment with before we start on the
> > pcap work. There is major upside and it would benefit the whole project.
> > But until then we can't fit any more screwdrivers in the toolbox. For me
> > the only reasonable options are to use the existing metron-api as its
> > own separate service or call out to the pcap_query.sh script from our
> > existing REST app. I could go either way really. I'm just not excited
> > about all the MPack code we have to write for a new component. Maybe it
> > won't be that bad.
> >
> > On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > First thought is why the Alerts-UI and Not a dedicated Query UI?
> > >
> >
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
That sounds fine - I'd imagine we'd be looking to hit the classpath related
problems asap when merging the modules.

For the module, we just have a pom that supplies external dependencies.
Rather than every metron module depending on Guava or Jackson directly, or
via transitive dependencies, we specify the version ourselves and depend on
the new module. We can then shade and relocate as needed, which should help
eliminate problems. We did something similar with metron-elasticsearch to
get around conflicts between elasticsearch and storm.
https://github.com/apache/metron/blob/master/metron-platform/metron-elasticsearch/pom.xml#L261

Mike


On Mon, May 7, 2018 at 7:39 AM, Ryan Merriman <me...@gmail.com> wrote:

> Otto, your use case makes sense to me.  We'll have to think about how to
> manage the user to job relationships.  I'm assuming YARN jobs will be
> submitted as the metron service user so YARN won't keep track of this for
> us.  Is that assumption correct?  Do you have any ideas for doing that?
>
> Mike, I can start a feature branch and experiment with merging metron-api
> into metron-rest.  That should allow us to collaborate on any issues or
> challenges.   Also, can you expand on your idea to manage external
> dependencies as a special module?  That seems like a very attractive option
> to me.
>
> On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > From my response on the other thread, but applicable to the backend
> > stuff:
> >
> > "The PCAP Query seems more like PCAP Report to me.  You are generating a
> > report based on parameters.
> > That report is something that takes some time and external process to
> > generate… ie you have to wait for it.
> >
> > I can almost imagine a flow where you:
> >
> > * Are in the AlertUI
> > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > possibly picking from one or more report ‘templates’
> > that have query options etc
> > * The report request is ‘queued’, that is dispatched to be
> > executed/generated
> > * You as a user have a ‘queue’ of your report results, and when the
> > report is done it is queued there
> > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > info/meta has the yarn details )
> > * You can select the report from your queue and view it either in a new
> > UI or custom component
> > * You can then apply a different ‘view’ to the report or work with the
> > report data
> > * You can print / save etc
> > * You can associate the report with the alerts ( again in the report info
> > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> >
> >
> > We can introduce extensibility into the report templates, report views (
> > things that work with the json data of the report )
> >
> > Something like that.”
> >
> > Maybe we can do :
> >
> > template -> query parameters -> script => yarn info
> > yarn info + query info + alert context + yarn status => report info ->
> > stored in a user’s ‘report queue’
> > report persistence added to report info
> > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> >
> >
> > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > I started a separate thread on Pcap UI considerations and user
> > requirements at Otto's request. This should help us keep these two
> > related but separate discussions focused.
> >
> > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <mi...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > >
> > >
> > > (Youhouuu my first reply on this kind of mail chain^^)
> > >
> > >
> > >
> > > If I may, I would like to share my view on the following 3 points.
> > >
> > > - Backend:
> > >
> > > The current metron-api is totally separate; it would be logical to me
> > > to have it in the same place as the other REST APIs. Especially when
> > > more security is added, the job will not need to be done twice.
> > > The current implementation sends back a pcap object which still needs
> > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > frontend. It would be good to have this decoding happen directly on the
> > > backend so as not to create load on the frontend. An option would be to
> > > install tshark on the REST server and use it to convert the pcap to XML
> > > and then to JSON that will be sent to the frontend.
> > >
> > > I tried to start the map/reduce job that searches over all the pcap
> > > data directly from the REST server and, as Ryan mentioned, we had
> > > trouble. I will try to find the error again.
> > >
> > > Then in the POC, we tried using the pcap_query script, and this
> > > works fine. I just modified it so that it sends back the YARN job_id
> > > directly instead of waiting for the job to finish. This allows the UI
> > > and the REST server to learn the status of the search by querying
> > > the YARN REST API, so both can be async without any blocking phase.
> > > What do you think about that?
> > >
> > >
> > >
> > > Having the job submitted directly from the code of the REST server
> > > would be perfect, but it will need a lot of investigation I think (but
> > > I'm not the expert so I might be completely wrong ^^).
> > >
> > > We know that the pcap_query script works fine, so why not call it? Is
> > > it that bad? (maybe a stupid question, but I really don’t see a lot of
> > > drawbacks)
> > >
> > >
> > >
> > > - Front end:
> > >
> > > Adding the pcap search to the Alerts UI is, I think, the easiest way
> > > to move forward. But indeed, it will then be the “Alert UI and
> > > pcapquery”. Maybe the name of the UI should just change to something
> > > like “Monitoring & Investigation UI”?
> > >
> > >
> > >
> > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > already discussed how you see the UI evolving with the new features
> > > that will come in the future?
> > >
> > >
> > >
> > > - Microservices:
> > >
> > >
> > >
> > > What do you mean exactly by microservices? Is it to separate all the
> > > features into different projects? Or something like having the
> > > different components in containers, as with Kubernetes? (again maybe a
> > > stupid question, but I don’t clearly understand what you mean :) )
> > >
> > >
> > >
> > > Michel
> > >
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Yes there will be an admin role that can read and delete all.
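
A minimal sketch of the access rule described here and in the quoted summary
below: each job is recorded against the user who submitted it, regular users
can list and delete only their own jobs, and an admin can read and delete
all. It is written in Python for brevity. JobStore, User, ADMIN_ROLE and the
in-memory dictionary are hypothetical stand-ins, not the REST application's
actual classes or data store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ADMIN_ROLE = "admin"  # hypothetical role name


@dataclass
class User:
    name: str
    roles: Set[str] = field(default_factory=set)


class JobStore:
    """Keeps the user-to-job mapping that YARN does not track when every
    job is submitted as a single service user."""

    def __init__(self) -> None:
        self._owner_by_job: Dict[str, str] = {}

    def record(self, user: User, job_id: str) -> None:
        """Remember which user submitted the job."""
        self._owner_by_job[job_id] = user.name

    def _can_access(self, user: User, job_id: str) -> bool:
        # Admins may access every job; everyone else only their own.
        return (ADMIN_ROLE in user.roles
                or self._owner_by_job.get(job_id) == user.name)

    def list_jobs(self, user: User) -> List[str]:
        """Return the ids of the jobs this user is allowed to see."""
        return sorted(j for j in self._owner_by_job
                      if self._can_access(user, j))

    def delete(self, user: User, job_id: str) -> None:
        """Remove a job record, refusing users who do not own it."""
        if job_id not in self._owner_by_job:
            raise KeyError(job_id)
        if not self._can_access(user, job_id):
            raise PermissionError(f"{user.name} may not delete {job_id}")
        del self._owner_by_job[job_id]
```

Keeping the check in one place (_can_access) would make it easier to swap in
a broader ACL-based rule later.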

On Fri, May 11, 2018 at 4:11 PM, Otto Fowler <ot...@gmail.com>
wrote:

> Do we at least require an admin/super user?  One who sees all the queues and jobs?
>
>
> On May 11, 2018 at 17:03:34, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> Thanks everyone for the input and feedback. I will attempt to summarize so
> we can come to a consensus and get this tasked out.
>
> The following endpoints will be included:
>
> - GET /api/v1/pcap/metadata?basePath - This endpoint will return
> metadata of pcap data stored in HDFS. This would include pcap size, date
> ranges (how far back can I go), etc. It would accept an optional HDFS
> basePath parameter for cases where pcap data is stored in multiple places
> and/or different from the default location.
> - POST /api/v1/pcap/fixed - This endpoint would accept a fixed pcap
> request, submit a pcap job, and return a job id. The request would be an
> object containing the options documented here for the fixed filter:
> https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility.
> A job will be associated with a user that submits it. An exception will be
> returned for violating constraints like too many queries submitted, query
> parameters out of limits, etc. A record of the user and job id will be
> persisted to a data store so a list of a user's jobs can later be
> retrieved.
> - POST /api/v1/pcap/query - This endpoint would accept a query pcap
> request, submit a pcap job, and return a job id. The request would be
> an object containing the options documented here for the query filter:
> https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility.
> A job will be associated with a user that submits it. An exception will be
> returned for violating constraints like too many queries submitted, query
> parameters out of limits, etc. A record of the user and job id will be
> persisted to a data store so a list of a user's jobs can later be
> retrieved.
> - GET /api/v1/pcap/status/<jobId> - This endpoint will return the YARN
> status of a running/completed job.
> - GET /api/v1/pcap/stop/<jobId> - This endpoint would kill a running
> pcap job. If the job has already completed this is a noop.
> - GET /api/v1/pcap/list - This endpoint will list a user's submitted
> pcap queries. Items in the list would contain job id, status (is it
> finished?), start/end time, and number of pages.
> - GET /api/v1/pcap/pdml/<jobId>/<pageNumber> - This endpoint will return
> pcap results for the given page in pdml format (
> https://wiki.wireshark.org/PDML). Are there other formats we want to
> support?
> - GET /api/v1/pcap/raw/<jobId>/<pageNumber> - This endpoint will allow a
> user to download raw pcap results for the given page.
> - DELETE /api/v1/pcap/<jobId> - This endpoint will delete pcap query
> results.
>
> With respect to security, users will only be able to see their own list
> of jobs and query results. We will also include an admin role that will
> be able to read and delete all. Jobs will be submitted as the metron
> service user and user to job relationships will be managed and persisted
> by the REST application.
>
> This is a substantial feature and compromises should be made to get an
> initial version out (baby steps). Here are some areas we will compromise
> on and enhance in the future:
>
> - Security - Initially we will rely on Spring Security for authorization
> and authentication. Eventually this feature will fit into a broader
> security strategy. This could mean an authentication strategy that is
> consistent with the rest of Metron, integration with Ranger for
> authorization, and submitting jobs as individual users rather than a
> service user. We can also explore sharing access between users and more
> fine-grained ACL-based security.
> - Priority and scheduling - Eventually we should form a strategy for
> prioritizing jobs, imposing limits, etc with YARN scheduling. Submitting
> jobs with individual users will give us even more flexibility in this
> area.
> - Job cleanup - Initially cleanup will be a manual process through
> exposed endpoints. Later we can explore automated cleanup
> strategies and introduce data retention policies.
> - Filter options - Initially we will expose the 2 filter options that
> currently exist in Metron: fixed and query. Eventually we can add more
> filters like bpf.
> - Data directory - This could be a TOC of different pcap data sets as
> Simon alluded to. This could also include a way to upload pcap data and
> register it as Otto suggested. We could also expand this to other storage
> locations besides HDFS.
>
> We haven't discussed the underlying implementation details for the various
> parts of this feature. I will leave that to whoever takes on a subtask to
> either lead a specific discussion or iterate in a PR. If anyone feels I
> haven't captured their input here, please let me know. Once we agree on
> this initial scope I will task it out in Jira and we can get started.
>
>
> On Fri, May 11, 2018 at 11:52 AM, Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
>
> > On the sharing and securing points, it seems like having the result of a
> > query run hang around in HDFS on a given URL would work, we can then use
> > Ranger policies (RBAC, or even ABAC later) to control access, this also
> > solves the page storage problem, and gives us a kind of two step
> > approach, both of which could (maybe, possibly, but probably not) be
> > large enough to need distribution, i.e. the initial search everything
> > and the subsequent sort / page through the results. Does anyone imagine
> > sorting? Maybe sub-filtering, but PCAP is heavily time based, so
> > timestamp sort ok?
> >
> > I also suspect it's worth considering the lifecycle of our stored result
> > sets as being meta-data driven. If I'm doing a speculative search I
> > don't really care if an admin cleans that up at the end of day / week /
> > disk space nervousness limit. However, if I find something good, I might
> > want to mark the result set as immune from automatic deletion.
> >
> > The other issue I would raise, which has implications for our PCAP
> > capture, and also impacts Otto's suggestion of 'self-uploaded' PCAPs is
> > how we namespace PCAP collection and retrieval. The problem here is that
> > I might have PCAPs from multiple locations which have conflicting private
> > IP ranges, so I can't logically dump them all in the same repository.
> > Solving the collection end of that is probably a separate unit of effort,
> > but this retrieval architecture should support multiple file system
> > locations.
> >
> > If we wanted to get fancy about it, we should look at using the stored
> > result sets as a kind of cache, for other queries, as people refine and
> > narrow down queries, it may make sense to be more sophisticated about
> > where our query jobs pull from (i.e. filter the subset from a previous
> > resultset, rather than scanning petabytes of source data). This may
> > imply some kind of TOC for the cache. The underlying immutability of the
> > PCAP store should make this fairly tractable.
> >
> > FYI, I've been doing a lot of thinking around data security, API and
> > configuration security and auditing recently, but I suspect that is a
> > different discuss thread. I'll kick something off shortly with a few
> > thoughts.
> >
> > I see a lot of this as long term goals to be honest, so as Jon says, we
> > can definitely take a few baby steps to start.
> >
> > Simon
> >
> > On 11 May 2018 at 15:40, Otto Fowler <ot...@gmail.com> wrote:
> >
> > > Don’t lose the use case for manually uploading PCAPS for analysis Jon.
> > >
> > >
> > > On May 11, 2018 at 10:14:02, Zeolla@GMail.com (zeolla@gmail.com) wrote:
> > >
> > > I think baby steps are fine - admin gets access to all, otherwise you
> > > only see your own pcaps, but we file a jira for a future add of API
> > > security, which more mature SOCs that align with the Metron personas
> > > will need.
> > >
> > > Jon
> > >
> > > On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:
> > >
> > > > That's a good point Jon. There are different levels of effort
> > > > associated with different options. If we want to allow pcaps to be
> > > > shared with specific users, we will need to introduce ACL security in
> > > > our REST application using something like the ACL capability that
> > > > comes with Spring Security or Ranger. This would be more complex to
> > > > design and implement. If we want something more broad like admin
> > > > roles that can see all or allowing pcap files to become public, this
> > > > would be less work. Do you think ACL security is required or would
> > > > the other options be acceptable?
> > > >
> > > > On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> > > > wrote:
> > > >
> > > > > At the very least there needs to be the ability to share downloaded
> > > > > PCAPs with other users and/or have roles that can see all pcaps. A
> > > > > platform engineer may want to clean up old pcaps after x time, or a
> > > > > manager may ask an analyst to find all of the traffic that exhibits
> > > > > xyz behavior, dump a pcap, and then point him to it so the manager
> > > > > can review. Since the pcap may be huge, we wouldn't want to try to
> > > > > push people to sending it via email, uploading to a file server,
> > > > > finding an external hard drive, etc.
> > > > >
> > > > > Jon
> > > > >
> > > > > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <merrimanr@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint
> > > > > > exposes the fixed query option which we have covered. I agree with
> > > > > > you that deprecating the metron-api module should be a goal of
> > > > > > this feature.
> > > > > >
> > > > > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > >
> > > > > > > This looks like a pretty good start Ryan. Does the metadata
> > > > > > > endpoint cover this
> > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > > > > > from the original metron-api? If so, then we would be able to
> > > > > > > deprecate the existing metron-api project. If we later go to
> > > > > > > micro-services, a pcap module would spin back into the fold, but
> > > > > > > it would probably look different from metron-api.
> > > > > > >
> > > > > > > I commented on the UI thread, but to reiterate for the purpose
> > > > > > > of backend functionality here I don't believe there is a way to
> > > > > > > "PAUSE" or "SUSPEND" jobs. That said, I think GET
> > > > > > > /api/v1/pcap/stop/<jobId> is sufficient for the job management
> > > > > > > operations.
> > > > > > >
> > > > > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <
> > > merrimanr@gmail.com>
> > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Now that we are confident we can run submit a MR job from
> our
> > > > current
> > > > > > > REST
> > > > > > > > application, is this the desired approach? Just want to
> > confirm.
> > > > > > > >
> > > > > > > > Next I think we should map out what the REST interface will
> > look
> > > > > like.
> > > > > > > > Here are the endpoints I'm thinking about:
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/metadata?basePath
> > > > > > > >
> > > > > > > > This endpoint will return metadata of pcap data stored in
> HDFS.
> > > > This
> > > > > > > would
> > > > > > > > include pcap size, date ranges (how far back can I go), etc.
> It
> > > > > would
> > > > > > > > accept an optional HDFS basePath parameter for cases where
> pcap
> > > > data
> > > > > is
> > > > > > > > stored in multiple places and/or different from the default
> > > > location.
> > > > > > > >
> > > > > > > > POST /api/v1/pcap/query
> > > > > > > >
> > > > > > > > This endpoint would accept a pcap request, submit a pcap
> query
> > > job,
> > > > > and
> > > > > > > > return a job id. The request would be an object containing
> the
> > > > > > > parameters
> > > > > > > > documented here: https://github.com/apache/
> metron/tree/master/
> > > > > > > > metron-platform/metron-pcap-backend#query-filter-utility. A
> > > > > query/job
> > > > > > > > would be associated with a user that submits it. An
> exception
> > > will
> > > > > be
> > > > > > > > returned for violating constraints like too many queries
> > > submitted,
> > > > > > query
> > > > > > > > parameters out of limits, etc.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/status/<jobId>
> > > > > > > >
> > > > > > > > This endpoint will return the status of a running job. I
> > imagine
> > > > > this
> > > > > > is
> > > > > > > > just a proxy to the YARN REST api. We can discuss the
> > > > implementation
> > > > > > > > behind these endpoints later.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/stop/<jobId>
> > > > > > > >
> > > > > > > > This endpoint would kill a running pcap job. If the job has
> > > > already
> > > > > > > > completed this is a noop.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/list
> > > > > > > >
> > > > > > > > > This endpoint will list a user's submitted pcap queries. Items in the
> > > > > > > > > list would contain job id, status (is it finished?), start/end time,
> > > > > > > > > and number of pages. Maybe there is some overlap with the status
> > > > > > > > > endpoint above and the status endpoint is not needed?
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > > > > >
> > > > > > > > > This endpoint will return pcap results for the given page in PDML
> > > > > > > > > format (https://wiki.wireshark.org/PDML). Are there other formats we
> > > > > > > > > want to support?
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > > > > >
> > > > > > > > > This endpoint will allow a user to download raw pcap results for the
> > > > > > > > > given page.
> > > > > > > >
> > > > > > > > DELETE /api/v1/pcap/<jobId>
> > > > > > > >
> > > > > > > > > This endpoint will delete pcap query results. Not sure yet how this
> > > > > > > > > fits in with our broader cleanup strategy.
> > > > > > > >
> > > > > > > > > This should get us started. What did I miss and what would you change
> > > > > > > > > about these? I did not include much detail related to security,
> > > > > > > > > cleanup strategy, or underlying implementation details but these are
> > > > > > > > > items we should discuss at some point.
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > > > > > > > > expected. Very nice.
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Finally figured it out. Commit is here:
> > > > > > > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > > > > > >
> > > > > > > > > > > It came down to figuring out the right combination of Maven
> > > > > > > > > > > dependencies and passing in the HDP version to REST as a Java system
> > > > > > > > > > > property. I also included some HDFS setup tasks. I tested this in
> > > > > > > > > > > full dev and can now successfully run a pcap query and get results.
> > > > > > > > > > > All you should have to do is generate some pcap data first.
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > @Ryan - pulled your branch and experimented with a few things. In
> > > > > > > > > > > > doing so, it dawned on me that by adding the yarn and hadoop
> > > > > > > > > > > > classpath, you probably didn't introduce a new classpath issue;
> > > > > > > > > > > > rather, you probably just moved on to the next classpath issue, i.e.
> > > > > > > > > > > > hbase, per your exception about hbase jaxb. Anyhow, I put up a branch
> > > > > > > > > > > > with some pom changes worth trying in conjunction with invoking the
> > > > > > > > > > > > rest app startup via "/usr/bin/yarn jar"
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > > > > > >
> > > > > > > > > > > Mike
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > That would be a step closer to something more like a micro-service
> > > > > > > > > > > > > architecture. However, I would want to make sure we think about the
> > > > > > > > > > > > > operational complexity and mpack implications of having another
> > > > > > > > > > > > > server installed and running somewhere on the cluster (also ssl,
> > > > > > > > > > > > > kerberos, etc. requirements for that service).
> > > > > > > > > > > >
> > > > > > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > > > > > > > > > pattern.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > > > > > > > > pattern in rest?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Moving the yarn classpath command earlier in the classpath now gives
> > > > > > > > > > > > > > > this error:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I will experiment with other combinations; I suspect we will need
> > > > > > > > > > > > > > > finer-grained control over the order.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The grep matches class names inside jar files. I use this all the
> > > > > > > > > > > > > > > time and it's really useful.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Reverse engineering the yarn jar command was the next thing I was
> > > > > > > > > > > > > > > going to try. Will let you know how it goes.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > > > > > > > > > > > > > > package stands out to me in this name
> > > > > > > > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think that find command needs a "jar tvf", otherwise you're looking
> > > > > > > > > > > > > > > > for a class name in jar file names.
> > > > > > > > > > > > > > >
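For reference, one way to check jar contents rather than jar file names is to read each archive's entry list. This is a minimal Python sketch using the standard zipfile module; the function name and the example class path in the docstring are hypothetical, and it does roughly what listing entries with "jar tvf" for each jar would do.

```python
import zipfile
from pathlib import Path


def jars_containing(root, class_path):
    """Yield jar files under root that have an entry containing class_path.

    class_path uses slashes, e.g. "org/example/Foo" (a made-up name).
    """
    for jar in Path(root).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                found = any(class_path in name for name in zf.namelist())
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives, much like 2>/dev/null.
            continue
        if found:
            yield jar
```

Calling list(jars_containing("/usr/hdp", "org/example/Foo")) would return the matching jar paths; both arguments here are placeholders.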
> > > > > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'd also look at the classpath you get when running "yarn jar" to
> > > > > > > > > > > > > > > > start the existing pcap service, per the instructions in
> > > > > > > > > > > > > > > > metron-api/README.md.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and
> > > > > > > > > > > > > > > > > running pcap queries inside our REST application, I created a simple
> > > > > > > > > > > > > > > > > test here:
> > > > > > > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > > > > > > > > > > > > A summary of what's included:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > > > > > > > > > > and the pcap topology
> > > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > > > > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > > > > > > > > > > already running full dev by building and deploying the metron-rest
> > > > > > > > > > > > > > > > > jar. I did not rebuild full dev with this change but I would still
> > > > > > > > > > > > > > > > > expect it to work. Let me know if it doesn't.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Run the REST application with the YARN jar command since this is
> > > > > > > > > > > > > > > > > how all our other YARN/MR-related applications are started
> > > > > > > > > > > > > > > > > (metron-api, MAAS, pcap query, etc). I wouldn't expect this to work
> > > > > > > > > > > > > > > > > since we have runtime dependencies on our shaded elasticsearch and
> > > > > > > > > > > > > > > > > parser jars and I'm not aware of a way to add additional jars to the
> > > > > > > > > > > > > > > > > classpath with the YARN jar command (is there a way?). Either way I
> > > > > > > > > > > > > > > > > get this error:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir
> > > > > > > > > > > > > > > > > using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar.
> > > > > > > > > > > > > > > > > skipping.
> > > > > > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > > > > > > > > > > > > > > > > script). I get this error:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - I searched for the class in the previous attempt but could not
> > > > > > > > > > > > > > > > > find it in full dev:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > > > > > > > > > > org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
> > > > > > > > > > > > > > > > > 2>/dev/null
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Further up in the stack trace I see the error happens when
> > > > > > > > > > > > > > > > > initializing the org.apache.hadoop.yarn.util.timeline.TimelineUtils
> > > > > > > > > > > > > > > > > class. I tried setting "yarn.timeline-service.enabled" in Ambari to
> > > > > > > > > > > > > > > > > false and then I get this error:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as
> > > > > > > > > > > > > > > > > a URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > > > > > > > > > > > > > > > > dependencies without any success:
> > > > > > > > > > > > > > > > >   - hadoop-yarn-client
> > > > > > > > > > > > > > > > >   - hadoop-yarn-common
> > > > > > > > > > > > > > > > >   - hadoop-mapreduce-client-core
> > > > > > > > > > > > > > > > >   - hadoop-yarn-server-common
> > > > > > > > > > > > > > > > >   - hadoop-yarn-api
> > > > > > > > > > > > > > > > >   - hbase-server
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I will keep exploring other possible solutions. Let me know if anyone
> > > > > > > > > > > > > > > > > has any ideas.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I can imagine a new generic service(s) capability whose job (pun
> > > > > > > > > > > > > > > > > > intended) is to abstract the submittal, tracking, and storage of
> > > > > > > > > > > > > > > > > > results to yarn.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It would be extended with storage providers, queue providers, and
> > > > > > > > > > > > > > > > > > possibly some set of policies or rather strategies.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The pcap ‘report’ would be a client to that service, one that
> > > > > > > > > > > > > > > > > > specializes the service operation for the way we want pcap to work.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We can then re-use the generic service for other long running yarn
> > > > > > > > > > > > > > > > > > things…
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The submittal and tracking can associate the submitter with the yarn
> > > > > > > > > > > > > > > > > > job and track that, regardless of the yarn credentials.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I.e., if all submittals and monitoring are by the same yarn user
> > > > > > > > > > > > > > > > > > (Metron) from a single or co-operative set of services, that service
> > > > > > > > > > > > > > > > > > can maintain the mapping.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think about how
> > > > > > > > > > > > > > > > > > to manage the user to job relationships. I'm assuming YARN jobs will
> > > > > > > > > > > > > > > > > > be submitted as the metron service user so YARN won't keep track of
> > > > > > > > > > > > > > > > > > this for us. Is that assumption correct? Do you have any ideas for
> > > > > > > > > > > > > > > > > > doing that?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Mike, I can start a feature branch and experiment with merging
> > > > > > > > > > > > > > > > > > metron-api into metron-rest. That should allow us to collaborate on
> > > > > > > > > > > > > > > > > > any issues or challenges. Also, can you expand on your idea to manage
> > > > > > > > > > > > > > > > > > external dependencies as a special module? That seems like a very
> > > > > > > > > > > > > > > > > > attractive option to me.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating
> > > > > > > > > > > > > > > > > > > a report based on parameters.
> > > > > > > > > > > > > > > > > > > That report is something that takes some time and an external process
> > > > > > > > > > > > > > > > > > > to generate… i.e. you have to wait for it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > > > > > > > > > > > > alerts/meta-alert, possibly picking from one or more report
> > > > > > > > > > > > > > > > > > > ‘templates’ that have query options etc
> > > > > > > > > > > > > > > > > > > * The report request is ‘queued’, that is, dispatched to be
> > > > > > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the
> > > > > > > > > > > > > > > > > > > report is done it is queued there
> > > > > > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest
> > > > > > > > > > > > > > > > > > > (report info/meta has the yarn details)
> > > > > > > > > > > > > > > > > > > * You can select the report from your queue and view it either in a
> > > > > > > > > > > > > > > > > > > new UI or custom component
> > > > > > > > > > > > > > > > > > > * You can then apply a different ‘view’ to the report or work with
> > > > > > > > > > > > > > > > > > > the report data
> > > > > > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > > > > > * You can associate the report with the alerts (again in the report
> > > > > > > > > > > > > > > > > > > info) with… a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > We can introduce extensibility into the report templates and report
> > > > > > > > > > > > > > > > > > > views (things that work with the json data of the report)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Maybe we can do:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > > > > > > > > yarn info + query info + alert context + yarn status => report info
> > > > > > > > > > > > > > > > > > > -> stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read results (page), etc.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > > > > > > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > > > > > > > > > > > > related but separate discussions focused.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsumbul@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to me
> > > > > > > > > > > > > > > > > > > > to have it in the same place as the other REST APIs. Especially when
> > > > > > > > > > > > > > > > > > > > more security is added, the work will not need to be done twice.
> > > > > > > > > > > > > > > > > > > > The current implementation sends back a pcap object which still needs
> > > > > > > > > > > > > > > > > > > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > > > > > > > > frontend. It would be good to have this decoding happen directly on
> > > > > > > > > > > > > > > > > > > > the backend so as not to create load on the frontend. An option would
> > > > > > > > > > > > > > > > > > > > be to install tshark on the REST server and use it to convert the
> > > > > > > > > > > > > > > > > > > > pcap to XML and then to JSON that will be sent to the frontend.
> > > > > > > > > > > > > > > > > > >
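The pcap to XML to JSON step could look roughly like the sketch below. It assumes the XML is PDML as produced by "tshark -r <file> -T pdml", and it flattens each packet into its protocols with name/show pairs. The function name and the output shape are assumptions for illustration, not an existing Metron format.

```python
import json
import xml.etree.ElementTree as ET


def pdml_to_json(pdml_text):
    """Flatten PDML into a JSON list of packets, each a list of protocols."""
    packets = []
    for packet in ET.fromstring(pdml_text).iter("packet"):
        protos = []
        for proto in packet.findall("proto"):
            # Keep only the field name and its human-readable "show" value.
            fields = {f.get("name"): f.get("show") for f in proto.iter("field")}
            protos.append({"name": proto.get("name"), "fields": fields})
        packets.append(protos)
    return json.dumps(packets)
```

Doing this on the REST server keeps the browser from having to parse XML, which matches the goal of moving decoding load off the frontend.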
> > > > > > > > > > > > > > > > > > > > I tried to start the map/reduce job directly from the REST server to
> > > > > > > > > > > > > > > > > > > > search over all the pcap data and, as Ryan mentioned, we had trouble.
> > > > > > > > > > > > > > > > > > > > I will try to find the error again.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Then in the POC, what we tried was to use the pcap_query script, and
> > > > > > > > > > > > > > > > > > > > this works fine. I just modified it so that it sends back the YARN
> > > > > > > > > > > > > > > > > > > > job_id directly instead of waiting for the job to finish. That allows
> > > > > > > > > > > > > > > > > > > > the UI and the REST server to learn the status of the search by
> > > > > > > > > > > > > > > > > > > > querying the YARN REST API. It also allows the UI and the REST server
> > > > > > > > > > > > > > > > > > > > to be async without any blocking phase. What do you think about that?
> > > > > > > > > > > > > > > > > > >
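A rough sketch of the polling side, assuming the status comes from the YARN ResourceManager REST endpoint /ws/v1/cluster/apps/{appid} and that the MapReduce job id maps to an application id by swapping the prefix. The function names and the injectable fetch parameter are hypothetical, added so the logic can be exercised without a cluster.

```python
import json
from urllib.request import urlopen


def job_id_to_app_id(job_id):
    # Assumption: job_<clusterTs>_<seq> and application_<clusterTs>_<seq>
    # share the same numeric suffix.
    prefix, rest = job_id.split("_", 1)
    if prefix != "job":
        raise ValueError(f"not a MapReduce job id: {job_id}")
    return "application_" + rest


def fetch_json(url):
    # Plain HTTP GET; a secured cluster would need authentication on top.
    with urlopen(url) as resp:
        return json.load(resp)


def job_status(rm_url, job_id, fetch=fetch_json):
    """Return (state, finalStatus) for a job from the ResourceManager."""
    app_id = job_id_to_app_id(job_id)
    body = fetch(f"{rm_url}/ws/v1/cluster/apps/{app_id}")
    app = body["app"]
    return app["state"], app["finalStatus"]
```

A UI or REST layer could call job_status on a timer and stop polling once the state is FINISHED, FAILED, or KILLED.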
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Submitting the job directly from the REST server code would be perfect, but
> > > > > > > > > > > > > > > > > > > > > I think it will need a lot of investigation (but I'm not the expert so I
> > > > > > > > > > > > > > > > > > > > > might be completely wrong ^^).
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We know that the pcap_query script works fine, so why not call it? Is it
> > > > > > > > > > > > > > > > > > > > > that bad? (Maybe a stupid question, but I really don't see a lot of
> > > > > > > > > > > > > > > > > > > > > drawbacks.)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Adding the pcap search to the Alerts UI is, I think, the easiest way to
> > > > > > > > > > > > > > > > > > > > > move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> > > > > > > > > > > > > > > > > > > > > Maybe the name of the UI should just change to something like “Monitoring &
> > > > > > > > > > > > > > > > > > > > > Investigation UI”?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > > > > > > > > > > > > > > > > > > > already discussed how you see the UI evolving with the new features that
> > > > > > > > > > > > > > > > > > > > > will come in the future?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > What do you mean exactly by microservices? Is it to separate all the
> > > > > > > > > > > > > > > > > > > > > features into different projects? Or something like having the different
> > > > > > > > > > > > > > > > > > > > > components in containers, as with Kubernetes? (Again, maybe a stupid question,
> > > > > > > > > > > > > > > > > > > > > but I don't clearly understand what you mean :) )
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > --
> > > > > > > > > > > > simon elliston ball
> > > > > > > > > > > > @sireb
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > >
> > > > > Jon
> > > > >
> > > >
> > > --
> > >
> > > Jon
> > >
> >
> >
> >
> > --
> > --
> > simon elliston ball
> > @sireb
> >
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
Do we at least require an admin/super user, one who sees all the queues and jobs?


On May 11, 2018 at 17:03:34, Ryan Merriman (merrimanr@gmail.com) wrote:

Thanks everyone for the input and feedback. I will attempt to summarize so
we can come to a consensus and get this tasked out.

The following endpoints will be included:

- GET /api/v1/pcap/metadata?basePath - This endpoint will return
metadata of pcap data stored in HDFS. This would include pcap size, date
ranges (how far back can I go), etc. It would accept an optional HDFS
basePath parameter for cases where pcap data is stored in multiple places
and/or different from the default location.
- POST /api/v1/pcap/fixed - This endpoint would accept a fixed pcap
request, submit a pcap job, and return a job id. The request would be an
object containing the options documented here for the fixed filter:
https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
A job will be associated with a user that submits it. An exception will be
returned for violating constraints like too many queries submitted, query
parameters out of limits, etc. A record of the user and job id will be
persisted to a data store so a list of a user's jobs can later be
retrieved.
- POST /api/v1/pcap/query - This endpoint would accept a query pcap
request, submit a pcap job, and return a job id. The request would be
an object containing the options documented here for the query filter:
https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
A job will be associated with a user that submits it. An exception will be
returned for violating constraints like too many queries submitted, query
parameters out of limits, etc. A record of the user and job id will be
persisted to a data store so a list of a user's jobs can later be
retrieved.
- GET /api/v1/pcap/status/<jobId> - This endpoint will return the YARN
status of a running/completed job.
- GET /api/v1/pcap/stop/<jobId> - This endpoint would kill a running
pcap job. If the job has already completed this is a noop.
- GET /api/v1/pcap/list - This endpoint will list a user's submitted
pcap queries. Items in the list would contain job id, status (is it
finished?), start/end time, and number of pages.
- GET /api/v1/pcap/pdml/<jobId>/<pageNumber> - This endpoint will return
pcap results for the given page in PDML format
(https://wiki.wireshark.org/PDML). Are there other formats we want to
support?
- GET /api/v1/pcap/raw/<jobId>/<pageNumber> - This endpoint will allow a
user to download raw pcap results for the given page.
- DELETE /api/v1/pcap/<jobId> - This endpoint will delete pcap query
results.
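
As a rough illustration of the job lifecycle these endpoints imply, here is a
minimal in-memory sketch in Python. Nothing in it is Metron code: the
PcapJobStore class, its method names, the max_running_per_user limit, the
status strings and the request contents are all assumptions made for the
example. A real implementation would submit to YARN and persist the user/job
records in a data store.

```python
import itertools


class JobLimitExceeded(Exception):
    """Raised when a user already has too many running jobs (hypothetical constraint)."""


class PcapJobStore:
    """In-memory stand-in for the REST layer's user-to-job bookkeeping."""

    def __init__(self, max_running_per_user=2):
        self.max_running_per_user = max_running_per_user
        self._jobs = {}                 # job id -> job record
        self._ids = itertools.count(1)  # fake ids; the cluster would supply real ones

    def submit(self, user, request):
        """Models POST /api/v1/pcap/fixed or /query: record the job, return its id."""
        running = [j for j in self._jobs.values()
                   if j["user"] == user and j["status"] == "RUNNING"]
        if len(running) >= self.max_running_per_user:
            raise JobLimitExceeded(user)
        job_id = "job_%d" % next(self._ids)
        self._jobs[job_id] = {"id": job_id, "user": user,
                              "request": request, "status": "RUNNING"}
        return job_id

    def status(self, job_id):
        """Models GET /api/v1/pcap/status/<jobId>."""
        return self._jobs[job_id]["status"]

    def finish(self, job_id):
        """Helper standing in for the job completing on the cluster."""
        self._jobs[job_id]["status"] = "SUCCEEDED"

    def stop(self, job_id):
        """Models GET /api/v1/pcap/stop/<jobId>: kill if running, otherwise a no-op."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"

    def list_jobs(self, user):
        """Models GET /api/v1/pcap/list: only the caller's own jobs."""
        return [j["id"] for j in self._jobs.values() if j["user"] == user]

    def delete(self, job_id):
        """Models DELETE /api/v1/pcap/<jobId>: drop the record."""
        del self._jobs[job_id]
```

The point of the sketch is that submit returns immediately with a job id, so
the UI can poll status or list rather than hold a connection open while the
job runs.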

With respect to security, users will only be able to see their list of jobs
and query results. We will also include an admin role that will be able to
read and delete all. Jobs will be submitted as the metron service user and
user to job relationships will be managed and persisted by the REST
application.
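
The visibility rule described here could be sketched as below. This is only
an illustration in Python: the ADMIN role name and both function names are
made up for the example, and in the REST application the check would sit
behind Spring Security rather than in hand-written code.

```python
ADMIN_ROLE = "ADMIN"  # hypothetical role name, not taken from Metron


def can_access(requesting_user, roles, job_owner):
    """Return True if the requester may read or delete the job's results."""
    return ADMIN_ROLE in roles or requesting_user == job_owner


def visible_jobs(requesting_user, roles, jobs):
    """Filter (job_id, owner) pairs down to the ids the requester may see."""
    return [job_id for job_id, owner in jobs
            if can_access(requesting_user, roles, owner)]
```

Because jobs run as the metron service user, the owner recorded by the REST
application is the only thing tying a result set to a person, which is why
the check compares against the stored owner rather than anything YARN knows.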

This is a substantial feature and compromises should be made to get an
initial version out (baby steps). Here are some areas we will compromise
on and enhance in the future:

- Security - Initially we will rely on Spring Security for authorization
and authentication. Eventually this feature will fit into a broader
security strategy. This could mean an authentication strategy that is
consistent with the rest of Metron, integration with Ranger for
authorization, and submitting jobs as individual users rather than a
service user. We can also explore sharing access between users and more
fine-grained ACL-based security.
- Priority and scheduling - Eventually we should form a strategy for
prioritizing jobs, imposing limits, etc with YARN scheduling. Submitting
jobs with individual users will give us even more flexibility in this area.
- Job cleanup - Initially cleanup will be a manual process through the
exposed endpoints. Later we can explore automated cleanup strategies and
introduce data retention policies.
- Filter options - Initially we will expose the 2 filter options that
currently exist in Metron: fixed and query. Eventually we can add more
filters like BPF.
- Data directory - This could be a TOC of different pcap data sets as
Simon alluded to. This could also include a way to upload pcap data and
register it as Otto suggested. We could also expand this to other storage
locations besides HDFS.
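
To make the future automated cleanup idea a little more concrete, here is one
possible shape for a retention check, again in Python and purely as a sketch.
The record fields (id, finished, pinned) and the function name are
assumptions. The pinned flag follows Simon's suggestion, quoted below, that a
user might want to mark a result set as immune from automatic deletion.

```python
from datetime import datetime, timedelta


def jobs_to_clean(jobs, now, max_age):
    """Return ids of finished, unpinned jobs whose results are older than max_age."""
    return [job["id"] for job in jobs
            if not job.get("pinned", False)       # pinned results are never auto-deleted
            and job["finished"] is not None       # still-running jobs are left alone
            and now - job["finished"] > max_age]
```

A scheduled task could pass the returned ids to the same delete path the
DELETE endpoint uses, so manual and automated cleanup share one code path.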

We haven't discussed the underlying implementation details for the various
parts of this feature. I will leave that to whoever takes on a subtask to
either lead a specific discussion or iterate in a PR. If anyone feels I
haven't captured their input here, please let me know. Once we agree on
this initial scope I will task it out in Jira and we can get started.


On Fri, May 11, 2018 at 11:52 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> On the sharing and securing points, it seems like having the result of a
> query run hang around in HDFS on a given URL would work. We can then use
> Ranger policies (RBAC, or even ABAC later) to control access. This also
> solves the page storage problem, and gives us a kind of two-step approach,
> both of which could (maybe, possibly, but probably not) be large enough to
> need distribution, i.e. the initial search of everything and the subsequent
> sort / page through the results. Does anyone imagine sorting? Maybe
> sub-filtering, but PCAP is heavily time based, so timestamp sort ok?
>
> I also suspect it's worth considering the lifecycle of our stored result
> sets as being meta-data driven. If I'm doing a speculative search I don't
> really care if an admin cleans that up at the end of the day / week /
> disk space nervousness limit. However, if I find something good, I might
> want to mark the result set as immune from automatic deletion.
>
> The other issue I would raise, which has implications for our PCAP capture,
> and also impacts Otto's suggestion of 'self-uploaded' PCAPs, is how we
> namespace PCAP collection and retrieval. The problem here is that I might
> have PCAPs from multiple locations which have conflicting private IP
> ranges, so I can't logically dump them all in the same repository. Solving
> the collection end of that is probably a separate unit of effort, but this
> retrieval architecture should support multiple file system locations.
>
> If we wanted to get fancy about it, we should look at using the stored
> result sets as a kind of cache for other queries. As people refine and
> narrow down queries, it may make sense to be more sophisticated about where
> our query jobs pull from (i.e. filter the subset from a previous result set,
> rather than scanning petabytes of source data). This may imply some kind of
> TOC for the cache. The underlying immutability of the PCAP store should
> make this fairly tractable.
>
> FYI, I've been doing a lot of thinking around data security, API and
> configuration security and auditing recently, but I suspect that is a
> different discuss thread. I'll kick something off shortly with a few
> thoughts.
>
> I see a lot of this as long term goals to be honest, so as Jon says, we can
> definitely take a few baby steps to start.
>
> Simon
>
> On 11 May 2018 at 15:40, Otto Fowler <ot...@gmail.com> wrote:
>
> > Don’t lose the use case for manually uploading PCAPS for analysis Jon.
> >
> >
> > On May 11, 2018 at 10:14:02, Zeolla@GMail.com (zeolla@gmail.com) wrote:
> >
> > > I think baby steps are fine - admin gets access to all, otherwise you only
> > > see your own pcaps, but we file a jira for a future add of API security,
> > > which more mature SOCs that align with the Metron personas will need.
> >
> > Jon
> >
> > On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:
> >
> > > > That's a good point Jon. There are different levels of effort associated
> > > > with different options. If we want to allow pcaps to be shared with
> > > > specific users, we will need to introduce ACL security in our REST
> > > > application using something like the ACL capability that comes with Spring
> > > > Security or Ranger. This would be more complex to design and implement.
> > > > If we want something more broad like admin roles that can see all or
> > > > allowing pcap files to become public, this would be less work. Do you
> > > > think ACL security is required or would the other options be acceptable?
> > >
> > > On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> > > wrote:
> > >
> > > > > At the very least there needs to be the ability to share downloaded PCAPs
> > > > > with other users and/or have roles that can see all pcaps. A platform
> > > > > engineer may want to clean up old pcaps after x time, or a manager may ask
> > > > > an analyst to find all of the traffic that exhibits xyz behavior, dump a
> > > > > pcap, and then point him to it so the manager can review. Since the
> > > > > pcap may be huge, we wouldn't want to try to push people to sending it via
> > > > > email, uploading to a file server, finding an external hard drive, etc.
> > > >
> > > > Jon
> > > >
> > > > > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes the
> > > > > > fixed query option which we have covered. I agree with you that
> > > > > > deprecating the metron-api module should be a goal of this feature.
> > > > >
> > > > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > > This looks like a pretty good start Ryan. Does the metadata endpoint cover
> > > > > > > this
> > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > > > > > from the original metron-api? If so, then we would be able to deprecate the
> > > > > > > existing metron-api project. If we later go to micro-services, a pcap
> > > > > > > module would spin back into the fold, but it would probably look different
> > > > > > > from metron-api.
> > > > > >
> > > > > > > I commented on the UI thread, but to reiterate for the purpose of backend
> > > > > > > functionality here: I don't believe there is a way to "PAUSE" or "SUSPEND"
> > > > > > > jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is sufficient for
> > > > > > > the job management operations.
> > > > > > >
> > > > > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > Now that we are confident we can submit an MR job from our current REST
> > > > > > > > application, is this the desired approach? Just want to confirm.
> > > > > > > >
> > > > > > > > Next I think we should map out what the REST interface will look like.
> > > > > > > > Here are the endpoints I'm thinking about:
> > > > > > >
> > > > > > > GET /api/v1/pcap/metadata?basePath
> > > > > > >
> > > > > > > > This endpoint will return metadata of pcap data stored in HDFS. This would
> > > > > > > > include pcap size, date ranges (how far back can I go), etc. It would
> > > > > > > > accept an optional HDFS basePath parameter for cases where pcap data is
> > > > > > > > stored in multiple places and/or different from the default location.
> > > > > > >
> > > > > > > POST /api/v1/pcap/query
> > > > > > >
> > > > > > > > This endpoint would accept a pcap request, submit a pcap query job, and
> > > > > > > > return a job id. The request would be an object containing the parameters
> > > > > > > > documented here:
> > > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> > > > > > > > A query/job would be associated with a user that submits it. An exception
> > > > > > > > will be returned for violating constraints like too many queries submitted,
> > > > > > > > query parameters out of limits, etc.
> > > > > > >
> > > > > > > GET /api/v1/pcap/status/<jobId>
> > > > > > >
> > > > > > > > This endpoint will return the status of a running job. I imagine this is
> > > > > > > > just a proxy to the YARN REST api. We can discuss the implementation
> > > > > > > > behind these endpoints later.
> > > > > > >
> > > > > > > GET /api/v1/pcap/stop/<jobId>
> > > > > > >
> > > > > > > > This endpoint would kill a running pcap job. If the job has already
> > > > > > > > completed this is a noop.
> > > > > > >
> > > > > > > GET /api/v1/pcap/list
> > > > > > >
> > > > > > > > This endpoint will list a user's submitted pcap queries. Items in the list
> > > > > > > > would contain job id, status (is it finished?), start/end time, and number
> > > > > > > > of pages. Maybe there is some overlap with the status endpoint above and
> > > > > > > > the status endpoint is not needed?
> > > > > > >
> > > > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > > > >
> > > > > > > > This endpoint will return pcap results for the given page in pdml format
> > > > > > > > (https://wiki.wireshark.org/PDML). Are there other formats we want to
> > > > > > > > support?
> > > > > > >
> > > > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > > > >
> > > > > > > > This endpoint will allow a user to download raw pcap results for the given
> > > > > > > > page.
> > > > > > >
> > > > > > > DELETE /api/v1/pcap/<jobId>
> > > > > > >
> > > > > > > > This endpoint will delete pcap query results. Not sure yet how this fits
> > > > > > > > in with our broader cleanup strategy.
> > > > > > > >
> > > > > > > > This should get us started. What did I miss and what would you change
> > > > > > > > about these? I did not include much detail related to security, cleanup
> > > > > > > > strategy, or underlying implementation details but these are items we
> > > > > > > > should discuss at some point.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > > > > > > > expected. Very nice.
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > Finally figured it out. Commit is here:
> > > > > > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > > > > >
> > > > > > > > > > It came down to figuring out the right combination of maven dependencies
> > > > > > > > > > and passing in the HDP version to REST as a Java system property. I also
> > > > > > > > > > included some HDFS setup tasks. I tested this in full dev and can now
> > > > > > > > > > successfully run a pcap query and get results. All you should have to do
> > > > > > > > > > is generate some pcap data first.
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > @Ryan - pulled your branch and experimented with a few things. In doing so,
> > > > > > > > > > > it dawned on me that by adding the yarn and hadoop classpath, you probably
> > > > > > > > > > > didn't introduce a new classpath issue, rather you probably just moved onto
> > > > > > > > > > > the next classpath issue, i.e. hbase per your exception about hbase jaxb.
> > > > > > > > > > > Anyhow, I put up a branch with some pom changes worth trying in conjunction
> > > > > > > > > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > > > > >
> > > > > > > > > > Mike
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > That would be a step closer to something more like a micro-service
> > > > > > > > > > > > architecture. However, I would want to make sure we think about the
> > > > > > > > > > > > operational complexity, and mpack implications of having another server
> > > > > > > > > > > > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > > > > > > > > > > > etc requirements for that service).
> > > > > > > > > > >
> > > > > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > > > > > > > > pattern.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > > > > > > > pattern in rest?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > > > > > > > > > > > error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will experiment with other combinations, I suspect we will need
> > > > > > > > > > > > > > finer-grain control over the order.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The grep matches class names inside jar files. I use this all the time and
> > > > > > > > > > > > > > it's really useful.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Reverse engineering the yarn jar command was the next thing I was going to
> > > > > > > > > > > > > > try. Will let you know how it goes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded" package
> > > > > > > > > > > > > > > stands out to me in this name
> > > > > > > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think that find command needs a "jar tvf", otherwise you're looking for a
> > > > > > > > > > > > > > > class name in jar file names.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd also look at the classpath you get when running "yarn jar" to start the
> > > > > > > > > > > > > > > existing pcap service, per the instructions in metron-api/README.md.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > > > > > > > > > > > > summary of what's included:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the
> > > > > > > > > > > > > > > > pcap topology
> > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > > > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > > > > > > > > > already running full dev by building and deploying the metron-rest jar. I
> > > > > > > > > > > > > > > > did not rebuild full dev with this change but I would still expect it to
> > > > > > > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Run the REST application with the YARN jar command since this is how
> > > > > > > > > > > > > > > > all our other YARN/MR-related applications are started (metron-api,
> > > > > > > > > > > > > > > > MAAS, pcap query, etc). I wouldn't expect this to work since we have runtime
> > > > > > > > > > > > > > > > dependencies on our shaded elasticsearch and parser jars and I'm not aware
> > > > > > > > > > > > > > > > of a way to add additional jars to the classpath with the YARN jar command
> > > > > > > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > > > > > > > > > > > > get this error:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - I searched for the class in the previous attempt but could not find it
> > > > > > > > > > > > > > > > in full dev:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > > > > > > > > > org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
> > > > > > > > > > > > > > > > 2>/dev/null
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Further up in the stack trace I see the error happens when initiating
> > > > > > > > > > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > > > > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I get
> > > > > > > > > > > > > > > > this error:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - I've tried adding different hadoop, hbase,
> yarn
> > > and
> > > > > > > > mapreduce
> > > > > > > > > > > Maven
> > > > > > > > > > > > > > > dependencies without any success
> > > > > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > > > > - hbase-server
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I will keep exploring other possible
solutions.
> > Let
> > > > me
> > > > > > know
> > > > > > > > if
> > > > > > > > > > > anyone
> > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > any ideas.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can imagine a new generic service(s) capability whose job ( pun
> > > > > > > > > > > > > > > > > intended ) is to abstract the submittal, tracking, and storage of
> > > > > > > > > > > > > > > > > results to yarn.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It would be extended with storage providers, queue provider, possibly
> > > > > > > > > > > > > > > > > some set of policies or rather strategies.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The pcap ‘report’ would be a client to that service, that specializes
> > > > > > > > > > > > > > > > > the service operation for the way we want pcap to work.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We can then re-use the generic service for other long running yarn
> > > > > > > > > > > > > > > > > things…..
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The submittal and tracking can associate the submitter with the yarn
> > > > > > > > > > > > > > > > > job and track that, regardless of the yarn credentials.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > IE> if all submittals and monitoring are by the same yarn user
> > > > > > > > > > > > > > > > > ( Metron ) from a single or co-operative set of services, that service
> > > > > > > > > > > > > > > > > can maintain the mapping.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think about how to
> > > > > > > > > > > > > > > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > > > > > > > > > > > > > > submitted as the metron service user so YARN won't keep track of this
> > > > > > > > > > > > > > > > > for us. Is that assumption correct? Do you have any ideas for doing
> > > > > > > > > > > > > > > > > that?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Mike, I can start a feature branch and experiment with merging
> > > > > > > > > > > > > > > > > metron-api into metron-rest. That should allow us to collaborate on any
> > > > > > > > > > > > > > > > > issues or challenges. Also, can you expand on your idea to manage
> > > > > > > > > > > > > > > > > external dependencies as a special module? That seems like a very
> > > > > > > > > > > > > > > > > attractive option to me.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating
> > > > > > > > > > > > > > > > > > a report based on parameters.
> > > > > > > > > > > > > > > > > > That report is something that takes some time and external process to
> > > > > > > > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > > > > > > > > > > > alerts/meta-alert, possibly picking from one or more report
> > > > > > > > > > > > > > > > > > ‘templates’ that have query options etc
> > > > > > > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the
> > > > > > > > > > > > > > > > > > report is done it is queued there
> > > > > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest
> > > > > > > > > > > > > > > > > > ( report info/meta has the yarn details )
> > > > > > > > > > > > > > > > > > * You can select the report from your queue and view it either in a
> > > > > > > > > > > > > > > > > > new UI or custom component
> > > > > > > > > > > > > > > > > > * You can then apply a different ‘view’ to the report or work with
> > > > > > > > > > > > > > > > > > the report data
> > > > > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > > > > * You can associate the report with the alerts ( again in the report
> > > > > > > > > > > > > > > > > > info ) with…. a ‘case’ or ‘ticket’ or investigation something or
> > > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We can introduce extensibility into the report templates, report
> > > > > > > > > > > > > > > > > > views ( things that work with the json data of the report )
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > > > > > > > yarn info + query info + alert context + yarn status => report info
> > > > > > > > > > > > > > > > > > -> stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read results ( page ), etc
> > > > > > > > > > > > > > > > > > etc
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > > > > > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > > > > > > > > > > > related but separate discussions focused.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsumbul@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain ^^)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to me
> > > > > > > > > > > > > > > > > > > to have it in the same place as the other REST APIs. Especially when
> > > > > > > > > > > > > > > > > > > more security is added, the job will not need to be done twice.
> > > > > > > > > > > > > > > > > > > The current implementation sends back a pcap object which still needs
> > > > > > > > > > > > > > > > > > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > > > > > > > frontend. It would be good to have this decoding happen directly on
> > > > > > > > > > > > > > > > > > > the backend so as not to create load on the frontend. An option would
> > > > > > > > > > > > > > > > > > > be to install tshark on the REST server and use it to convert the
> > > > > > > > > > > > > > > > > > > pcap to XML and then to JSON that will be sent to the frontend.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I tried to start the map/reduce job that searches over all the pcap
> > > > > > > > > > > > > > > > > > > data directly from the REST server and, as Ryan mentioned, we had
> > > > > > > > > > > > > > > > > > > trouble. I will try to find the error again.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Then in the POC, what we tried was to use the pcap_query script, and
> > > > > > > > > > > > > > > > > > > this works fine. I just modified it so that it sends back the YARN
> > > > > > > > > > > > > > > > > > > job_id directly instead of waiting for the job to finish. This allows
> > > > > > > > > > > > > > > > > > > the UI and the REST server to know the status of the search by
> > > > > > > > > > > > > > > > > > > querying the YARN REST API, so the UI and the REST server can be
> > > > > > > > > > > > > > > > > > > async without any blocking phase. What do you think about that?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Having the job submitted directly from the code of the REST server
> > > > > > > > > > > > > > > > > > > would be perfect, but it will need a lot of investigation I think
> > > > > > > > > > > > > > > > > > > (but I'm not the expert so I might be completely wrong ^^).
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > We know that the pcap_query script works fine, so why not call it?
> > > > > > > > > > > > > > > > > > > Is it that bad? (maybe a stupid question, but I really don’t see a
> > > > > > > > > > > > > > > > > > > lot of drawbacks)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Adding the pcap search to the Alerts UI is, I think, the easiest way
> > > > > > > > > > > > > > > > > > > to move forward. But indeed, it will then be the “Alert UI and
> > > > > > > > > > > > > > > > > > > pcapquery”. Maybe the name of the UI should just change to something
> > > > > > > > > > > > > > > > > > > like “Monitoring & Investigation UI”?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > > > > > > > > > > > > > > > > > already discussed how you see the UI evolving with the new features
> > > > > > > > > > > > > > > > > > > that will come in the future?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > What do you mean exactly by microservices? Is it to separate all the
> > > > > > > > > > > > > > > > > > > features into different projects? Or something like having the
> > > > > > > > > > > > > > > > > > > different components in containers, as with Kubernetes? (again maybe
> > > > > > > > > > > > > > > > > > > a stupid question, but I don’t clearly understand what you mean :) )
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > --
> > > > > > > > > > > simon elliston ball
> > > > > > > > > > > @sireb
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > --
> > > >
> > > > Jon
> > > >
> > >
> > --
> >
> > Jon
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Thanks everyone for the input and feedback.  I will attempt to summarize so
we can come to a consensus and get this tasked out.

The following endpoints will be included:

   - GET /api/v1/pcap/metadata?basePath - This endpoint will return
   metadata of pcap data stored in HDFS.  This would include pcap size, date
   ranges (how far back can I go), etc.  It would accept an optional HDFS
   basePath parameter for cases where pcap data is stored in multiple places
   and/or different from the default location.
   - POST /api/v1/pcap/fixed - This endpoint would accept a fixed pcap
   request, submit a pcap job, and return a job id.  The request would be an
   object containing the options documented here for the fixed filter:
   <https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility>.
   A job will be associated with a user that submits it.  An exception will be
   returned for violating constraints like too many queries submitted, query
   parameters out of limits, etc.  A record of the user and job id will be
   persisted to a data store so a list of a user's jobs can later be retrieved.
   - POST /api/v1/pcap/query - This endpoint would accept a query pcap
   request, submit a pcap job, and return a job id.  The request would be
   an object containing the options documented here for the query filter:
   <https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility>.
   A job will be associated with a user that submits it.  An exception will be
   returned for violating constraints like too many queries submitted, query
   parameters out of limits, etc.  A record of the user and job id will be
   persisted to a data store so a list of a user's jobs can later be retrieved.
   - GET /api/v1/pcap/status/<jobId> - This endpoint will return the YARN
   status of a running/completed job.
   - GET /api/v1/pcap/stop/<jobId> - This endpoint would kill a running
   pcap job.  If the job has already completed this is a noop.
   - GET /api/v1/pcap/list - This endpoint will list a user's submitted
   pcap queries.  Items in the list would contain job id, status (is it
   finished?), start/end time, and number of pages.
   - GET /api/v1/pcap/pdml/<jobId>/<pageNumber> - This endpoint will return
   pcap results for the given page in PDML format
   (https://wiki.wireshark.org/PDML).  Are
   there other formats we want to support?
   - GET /api/v1/pcap/raw/<jobId>/<pageNumber> - This endpoint will allow a
   user to download raw pcap results for the given page.
   - DELETE /api/v1/pcap/<jobId> - This endpoint will delete pcap query
   results.
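
To make the bookkeeping behind these endpoints concrete, here is a minimal
sketch of how the user to job record could behave.  It is an in-memory model,
not the REST implementation: the class and method names, the per-user limit
and the status strings are all assumptions made for illustration, and nothing
here talks to YARN or HDFS.

```python
import itertools


class ConstraintViolation(Exception):
    """Raised for limits such as too many active queries per user."""


class PcapJobRegistry:
    """In-memory stand-in for the data store that maps users to job ids."""

    def __init__(self, max_active_per_user=2):  # hypothetical limit
        self.max_active_per_user = max_active_per_user
        self._jobs = {}  # job id -> {"user", "filter", "status"}
        self._ids = itertools.count(1)

    def submit(self, user, filter_type):
        """Models POST /api/v1/pcap/fixed and /query; returns a job id."""
        if filter_type not in ("fixed", "query"):
            raise ConstraintViolation("unknown filter: " + filter_type)
        active = sum(1 for j in self._jobs.values()
                     if j["user"] == user and j["status"] == "RUNNING")
        if active >= self.max_active_per_user:
            raise ConstraintViolation("too many queries submitted")
        job_id = "job_%d" % next(self._ids)
        self._jobs[job_id] = {"user": user, "filter": filter_type,
                              "status": "RUNNING"}
        return job_id

    def stop(self, job_id):
        """Models /stop/<jobId>; a no-op once the job is no longer running."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"

    def list_jobs(self, user):
        """Models GET /api/v1/pcap/list; only the caller's jobs come back."""
        return {jid: j["status"] for jid, j in self._jobs.items()
                if j["user"] == user}
```

In a real service the status would come from YARN rather than a stored
string, and the record would live in a persistent store.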

With respect to security, users will only be able to see their own jobs and
query results.  We will also include an admin role that will be able to
read and delete all.  Jobs will be submitted as the metron service user and
user to job relationships will be managed and persisted by the REST
application.
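
As a rough illustration of that rule (owner or admin, nothing finer grained),
the check could look like the sketch below.  The role name and function names
are hypothetical; in the REST application this would more likely be expressed
through Spring Security than hand-written.

```python
ADMIN_ROLE = "ROLE_ADMIN"  # hypothetical role name


def can_access(requester, roles, job_owner):
    """Admins may read and delete everything; others only their own jobs."""
    return ADMIN_ROLE in roles or requester == job_owner


def visible_jobs(requester, roles, owners_by_job):
    """Filter a {job id: owner} mapping down to what the requester may see."""
    return sorted(job_id for job_id, owner in owners_by_job.items()
                  if can_access(requester, roles, owner))
```

Because jobs run as the metron service user, the owner here has to come from
the mapping the REST application persists, not from YARN.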

This is a substantial feature and compromises should be made to get an
initial version out (baby steps).  Here are some areas we will compromise
on and enhance in the future:

   - Security - Initially we will rely on Spring Security for authorization
   and authentication.  Eventually this feature will fit into a broader
   security strategy.  This could mean an authentication strategy that is
   consistent with the rest of Metron, integration with Ranger for
   authorization, and submitting jobs as individual users rather than a
   service user.  We can also explore sharing access between users and more
   fine-grained ACL-based security.
   - Priority and scheduling - Eventually we should form a strategy for
   prioritizing jobs, imposing limits, etc with YARN scheduling.  Submitting
   jobs with individual users will give us even more flexibility in this area.
   - Job cleanup - Initially cleanup will be a manual process through the
   exposed endpoints.  Later we can explore automated cleanup
   strategies and introduce data retention policies.
   - Filter options - Initially we will expose the 2 filter options that
   currently exist in Metron:  fixed and query.  Eventually we can add more
   filters like bpf.
   - Data directory - This could be a TOC of different pcap data sets as
   Simon alluded to.  This could also include a way to upload pcap data and
   register it as Otto suggested.  We could also expand this to other storage
   locations besides HDFS.
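
For the job cleanup item, one possible shape for a later retention policy is
sketched below.  This is only an assumption about how it might work: the
tuple layout, the "pinned" flag (a result set marked as exempt from automatic
deletion) and the 7 day window are all made up for the example.

```python
from datetime import datetime, timedelta


def expired_results(jobs, now, max_age):
    """Return ids of finished, unpinned jobs whose results exceed max_age.

    jobs: iterable of (job_id, finished_at, pinned); finished_at is None
    while the job is still running.
    """
    return [job_id for job_id, finished_at, pinned in jobs
            if finished_at is not None and not pinned
            and now - finished_at > max_age]


# Example data, purely illustrative.
now = datetime(2018, 5, 14)
jobs = [
    ("job_1", datetime(2018, 5, 1), False),   # old, eligible for cleanup
    ("job_2", datetime(2018, 5, 1), True),    # old but pinned by the user
    ("job_3", datetime(2018, 5, 13), False),  # recent
    ("job_4", None, False),                   # still running
]
stale = expired_results(jobs, now, timedelta(days=7))
```

An automated cleanup task could then pass each id in stale to the same code
path as the DELETE endpoint.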

We haven't discussed the underlying implementation details for the various
parts of this feature.  I will leave that to whoever takes on a subtask to
either lead a specific discussion or iterate in a PR.  If anyone feels I
haven't captured their input here, please let me know.  Once we agree on
this initial scope I will task it out in Jira and we can get started.


On Fri, May 11, 2018 at 11:52 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> On the sharing and securing points, it seems like having the result of a
> query run hang around in HDFS on a given URL would work, we can then use
> Ranger policies (RBAC, or even ABAC later) to control access, this also
> solves the page storage problem, and gives us a kind of two step approach,
> both of which could (maybe, possibly, but probably not) be large enough to
> need distribution, i.e. the initial search everything and the subsequent
> sort / page through the results. Does anyone imagine sorting? Maybe
> sub-filtering, but PCAP is heavily time based, so timestamp sort ok?
>
> I also suspect it's worth considering the lifecycle of our stored result
> sets as being meta-data driven. If I'm doing a speculative search I don't
> really care if an admin cleans that up at the end of the day / week /
> disk space nervousness limit. However, if I find something good, I might
> want to mark the result set as immune from automatic deletion.
>
> The other issue I would raise, which has implications for our PCAP capture,
> and also impacts Otto's suggestion of 'self-uploaded' PCAPs is how we
> namespace PCAP collection and retrieval. The problem here is that I might
> have PCAPs from multiple locations which have conflicting private IP
> ranges, so I can't logically dump them all in the same repository. Solving
> the collection end of that is probably a separate unit of effort, but this
> retrieval architecture should support multiple file system locations.
>
> If we wanted to get fancy about it, we should look at using the stored
> result sets as a kind of cache, for other queries, as people refine and
> narrow down queries, it may make sense to be more sophisticated about where
> our query jobs pull from (i.e. filter the subset from a previous resultset,
> rather than scanning petabytes of source data). This may imply some kind of
> TOC for the cache. The underlying immutability of the PCAP store should
> make this fairly tractable.
>
> FYI, I've been doing a lot of thinking around data security, API and
> configuration security and auditing recently, but I suspect that is a
> different discuss thread. I'll kick something off shortly with a few
> thoughts.
>
> I see a lot of this as long term goals to be honest, so as Jon says, we can
> definitely take a few baby steps to start.
>
> Simon
>
> On 11 May 2018 at 15:40, Otto Fowler <ot...@gmail.com> wrote:
>
> > Don’t lose the use case for manually uploading PCAPS for analysis Jon.
> >
> >
> > On May 11, 2018 at 10:14:02, Zeolla@GMail.com (zeolla@gmail.com) wrote:
> >
> > I think baby steps are fine - admin gets access to all, otherwise you only
> > see your own pcaps, but we file a jira for a future add of API security,
> > which more mature SOCs that align with the Metron personas will need.
> >
> > Jon
> >
> > On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:
> >
> > > > That's a good point Jon. There are different levels of effort associated
> > > > with different options. If we want to allow pcaps to be shared with
> > > > specific users, we will need to introduce ACL security in our REST
> > > > application using something like the ACL capability that comes with
> > > > Spring Security or Ranger. This would be more complex to design and
> > > > implement. If we want something more broad like admin roles that can see
> > > > all or allowing pcap files to become public, this would be less work. Do
> > > > you think ACL security is required or would the other options be
> > > > acceptable?
> > >
> > > On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> > > wrote:
> > >
> > > > > At the very least there needs to be the ability to share downloaded
> > > > > PCAPs with other users and/or have roles that can see all pcaps. A
> > > > > platform engineer may want to clean up old pcaps after x time, or a
> > > > > manager may ask an analyst to find all of the traffic that exhibits xyz
> > > > > behavior, dump a pcap, and then point him to it so the manager can
> > > > > review. Since the pcap may be huge, we wouldn't want to try to push
> > > > > people to sending it via email, uploading to a file server, finding an
> > > > > external hard drive, etc.
> > > >
> > > > Jon
> > > >
> > > > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> > > > wrote:
> > > >
> > > > > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes
> > > > > > the fixed query option which we have covered. I agree with you that
> > > > > > deprecating the metron-api module should be a goal of this feature.
> > > > >
> > > > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > > This looks like a pretty good start Ryan. Does the metadata endpoint
> > > > > > > cover this
> > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > > > > > from the original metron-api? If so, then we would be able to
> > > > > > > deprecate the existing metron-api project. If we later go to
> > > > > > > micro-services, a pcap module would spin back into the fold, but it
> > > > > > > would probably look different from metron-api.
> > > > > > >
> > > > > > > I commented on the UI thread, but to reiterate for the purpose of
> > > > > > > backend functionality here I don't believe there is a way to "PAUSE" or
> > > > > > > "SUSPEND" jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is
> > > > > > > sufficient for the job management operations.
> > > > > >
> > > > > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Now that we are confident we can submit an MR job from our current
> > > > > > > > REST application, is this the desired approach? Just want to confirm.
> > > > > > > >
> > > > > > > > Next I think we should map out what the REST interface will look
> > > > > > > > like. Here are the endpoints I'm thinking about:
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/metadata?basePath
> > > > > > > >
> > > > > > > > This endpoint will return metadata of pcap data stored in HDFS. This
> > > > > > > > would include pcap size, date ranges (how far back can I go), etc. It
> > > > > > > > would accept an optional HDFS basePath parameter for cases where pcap
> > > > > > > > data is stored in multiple places and/or different from the default
> > > > > > > > location.
> > > > > > > >
> > > > > > > > POST /api/v1/pcap/query
> > > > > > > >
> > > > > > > > This endpoint would accept a pcap request, submit a pcap query job,
> > > > > > > > and return a job id. The request would be an object containing the
> > > > > > > > parameters documented here:
> > > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> > > > > > > > A query/job would be associated with a user that submits it. An
> > > > > > > > exception will be returned for violating constraints like too many
> > > > > > > > queries submitted, query parameters out of limits, etc.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/status/<jobId>
> > > > > > > >
> > > > > > > > This endpoint will return the status of a running job. I imagine this
> > > > > > > > is just a proxy to the YARN REST api. We can discuss the
> > > > > > > > implementation behind these endpoints later.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/stop/<jobId>
> > > > > > > >
> > > > > > > > This endpoint would kill a running pcap job. If the job has already
> > > > > > > > completed this is a noop.
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/list
> > > > > > > >
> > > > > > > > This endpoint will list a user's submitted pcap queries. Items in the
> > > > > > > > list would contain job id, status (is it finished?), start/end time,
> > > > > > > > and number of pages. Maybe there is some overlap with the status
> > > > > > > > endpoint above and the status endpoint is not needed?
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > > > > >
> > > > > > > > This endpoint will return pcap results for the given page in pdml
> > > > > > > > format (https://wiki.wireshark.org/PDML). Are there other formats we
> > > > > > > > want to support?
> > > > > > > >
> > > > > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > > > > >
> > > > > > > > This endpoint will allow a user to download raw pcap results for the
> > > > > > > > given page.
> > > > > > > >
> > > > > > > > DELETE /api/v1/pcap/<jobId>
> > > > > > > >
> > > > > > > > This endpoint will delete pcap query results. Not sure yet how this
> > > > > > > > fits in with our broader cleanup strategy.
> > > > > > > >
> > > > > > > > This should get us started. What did I miss and what would you change
> > > > > > > > about these? I did not include much detail related to security,
> > > > > > > > cleanup strategy, or underlying implementation details but these are
> > items
> > > we
> > > > > > > should discuss at some point.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > Sweet! That's great news. The pom changes are a lot simpler
> > than
> > > I
> > > > > > > > expected. Very nice.
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <
> > > merrimanr@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Finally figured it out. Commit is here:
> > > > > > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > > > >
> > > > > > > > > It came down to figuring out the right combination of maven
> > > > > > > dependencies
> > > > > > > > > and passing in the HDP version to REST as a Java system
> > > property.
> > > > > I
> > > > > > > also
> > > > > > > > > included some HDFS setup tasks. I tested this in full dev
> and
> > > > can
> > > > > > now
> > > > > > > > > successfully run a pcap query and get results. All you
> should
> > > > have
> > > > > > to
> > > > > > > do
> > > > > > > > > is generate some pcap data first.
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > @Ryan - pulled your branch and experimented with a few
> > > things.
> > > > In
> > > > > > > doing
> > > > > > > > > so,
> > > > > > > > > > it dawned on me that by adding the yarn and hadoop
> > classpath,
> > > > you
> > > > > > > > > probably
> > > > > > > > > > didn't introduce a new classpath issue, rather you
> probably
> > > > just
> > > > > > > moved
> > > > > > > > > onto
> > > > > > > > > > the next classpath issue, ie hbase per your exception
> about
> > > > hbase
> > > > > > > jaxb.
> > > > > > > > > > Anyhow, I put up a branch with some pom changes worth
> > trying
> > > in
> > > > > > > > > conjunction
> > > > > > > > > > with invoking the rest app startup via "/usr/bin/yarn
> jar"
> > > > > > > > > >
> > > > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > > > >
> > > > > > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > > > > >
> > > > > > > > > > Mike
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > That would be a step closer to something more like a
> > > > > > micro-service
> > > > > > > > > > > architecture. However, I would want to make sure we
> think
> > > > about
> > > > > > the
> > > > > > > > > > > operational complexity, and mpack implications of
> having
> > > > > another
> > > > > > > > server
> > > > > > > > > > > installed and running somewhere on the cluster (also,
> > ssl,
> > > > > > > kerberos,
> > > > > > > > > etc
> > > > > > > > > > > etc requirements for that service).
> > > > > > > > > > >
> > > > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <
> > merrimanr@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > +1 to having metron-api as its own service and using a gateway-type
> > > > > > > > > > > > > pattern.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> > > > > > > > ottobackwards@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > > > > > > > pattern in REST?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (
> > > > > > merrimanr@gmail.com
> > > > > > > )
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > > > > > > > > > > > error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will experiment with other combinations. I suspect we will need
> > > > > > > > > > > > > > finer-grained control over the order.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The grep matches class names inside jar files. I
> use
> > > this
> > > > > all
> > > > > > > the
> > > > > > > > > > time
> > > > > > > > > > > > and
> > > > > > > > > > > > > it's really useful.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Reverse engineering the yarn jar command was the
> next
> > > > > thing I
> > > > > > > was
> > > > > > > > > > going
> > > > > > > > > > > > to
> > > > > > > > > > > > > try. Will let you know how it goes.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic
> <
> > > > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > In what order did you add the hadoop and yarn classpaths? The "shaded"
> > > > > > > > > > > > > > > package stands out to me in this name:
> > > > > > > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider".
> > > > > > > > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think that find command needs a "jar tvf",
> > > otherwise
> > > > > > you're
> > > > > > > > > > looking
> > > > > > > > > > > > > for a
> > > > > > > > > > > > > > class name in jar file names.
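The suggestion above (listing the entries of each jar, as "jar tvf" does, rather than matching on file names) can be sketched in a few lines. This is only an illustrative sketch: the function name `find_class_in_jars` and its parameters are hypothetical and not part of Metron, and it relies only on the fact that a jar is a zip archive.

```python
import os
import zipfile


def find_class_in_jars(root, class_path):
    """Yield jar files under `root` that contain an entry matching `class_path`.

    `class_path` is a slash-separated fragment such as
    "org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider".
    Both names are hypothetical helpers for illustration.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, name)
            try:
                # A jar is a zip archive, so namelist() is the moral
                # equivalent of the entry names printed by "jar tvf".
                with zipfile.ZipFile(jar_path) as jar:
                    if any(class_path in entry for entry in jar.namelist()):
                        yield jar_path
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives instead of aborting.
                continue
```

For example, `list(find_class_in_jars("/usr/hdp", "jaxrs/JacksonJaxbJsonProvider"))` would return the matching jar paths, if any exist on that host.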
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'd also look at the classpath you get when
> running
> > > > "yarn
> > > > > > > jar"
> > > > > > > > to
> > > > > > > > > > > start
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > existing pcap service, per the instructions in
> > > > > > > > > > metron-api/README.md.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > > > > > > > > merrimanr@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > > > > > > > > > > > A summary of what's included:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest
> > > > pom.xml
> > > > > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > > > > - Added a pcap query service that runs a
> simple,
> > > > > > hardcoded
> > > > > > > > > query
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the
> > > > > > > > > > > > > > > > pcap topology
> > > > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > > > After this initial setup there should be data
> in
> > > HDFS
> > > > > at
> > > > > > > > > > > > > > > "/apps/metron/pcap". I believe this should be
> > > enough
> > > > to
> > > > > > > > > exercise
> > > > > > > > > > > the
> > > > > > > > > > > > > > > issue. Just hit the endpoint referenced above.
> I
> > > > tested
> > > > > > > this
> > > > > > > > in
> > > > > > > > > > an
> > > > > > > > > > > > > > > already running full dev by building and
> > deploying
> > > > the
> > > > > > > > > > metron-rest
> > > > > > > > > > > > > jar.
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > did not rebuild full dev with this change but I
> > > would
> > > > > > still
> > > > > > > > > > expect
> > > > > > > > > > > it
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The first error I see when I hit this endpoint
> > is:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Run the REST application with the YARN jar
> > > command
> > > > > > since
> > > > > > > > this
> > > > > > > > > > is
> > > > > > > > > > > > how
> > > > > > > > > > > > > > > all our other YARN/MR-related applications are
> > > > started
> > > > > > > > > > (metron-api,
> > > > > > > > > > > > > > > MAAS,
> > > > > > > > > > > > > > > pcap query, etc). I wouldn't expect this to
> work
> > > > since
> > > > > we
> > > > > > > > have
> > > > > > > > > > > > > > runtime
> > > > > > > > > > > > > > > dependencies on our shaded elasticsearch and
> > parser
> > > > > jars
> > > > > > > and
> > > > > > > > > I'm
> > > > > > > > > > > not
> > > > > > > > > > > > > > > aware
> > > > > > > > > > > > > > > of a way to add additional jars to the
> classpath
> > > with
> > > > > the
> > > > > > > > YARN
> > > > > > > > > > jar
> > > > > > > > > > > > > > > command
> > > > > > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > > > > > > > > > > > > get this error:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - I searched for the class in the previous
> > attempt
> > > > but
> > > > > > > could
> > > > > > > > > not
> > > > > > > > > > > find
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > in full dev:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Further up in the stack trace I see the error happens when initializing
> > > > > > > > > > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > > > > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I get
> > > > > > > > > > > > > > > > this error:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - I've tried adding different hadoop, hbase,
> yarn
> > > and
> > > > > > > > mapreduce
> > > > > > > > > > > Maven
> > > > > > > > > > > > > > > dependencies without any success
> > > > > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > > > > - hbase-server
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I will keep exploring other possible solutions.
> > Let
> > > > me
> > > > > > know
> > > > > > > > if
> > > > > > > > > > > anyone
> > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > any ideas.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > > > > > > > > ottobackwards@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can imagine a new generic service(s)
> > capability
> > > > > whose
> > > > > > > > job (
> > > > > > > > > > pun
> > > > > > > > > > > > > > > intended
> > > > > > > > > > > > > > > > ) is to
> > > > > > > > > > > > > > > > abstract the submittal, tracking, and storage
> > of
> > > > > > results
> > > > > > > to
> > > > > > > > > > yarn.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It would be extended with storage providers,
> > > queue
> > > > > > > > provider,
> > > > > > > > > > > > > possibly
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The pcap ‘report’ would be a client to that service, one that specializes
> > > > > > > > > > > > > > > > > the service operation for the way we want pcap to work.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We can then re-use the generic service for
> > other
> > > > long
> > > > > > > > running
> > > > > > > > > > > yarn
> > > > > > > > > > > > > > > > things…..
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > > > > > > > > ottobackwards@gmail.com
> > > > > > > > > > > )
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The submittal and tracking can associate the
> > > > > submitter
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > yarn
> > > > > > > > > > > > > > job
> > > > > > > > > > > > > > > > and track that,
> > > > > > > > > > > > > > > > regardless of the yarn credentials.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I.e., if all submittals and monitoring are by the same yarn user (Metron)
> > > > > > > > > > > > > > > > > from a single or co-operative set of services, that service can maintain
> > > > > > > > > > > > > > > > > the mapping.
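The mapping described above, where the service rather than YARN remembers which submitter owns which job, could look roughly like the following. This is a minimal in-memory sketch under stated assumptions: the class `JobRegistry` and its method names are hypothetical, and a real service would presumably persist the mapping somewhere durable.

```python
from collections import defaultdict


class JobRegistry:
    """Hypothetical in-memory map between submitting users and YARN job ids."""

    def __init__(self):
        self._jobs_by_user = defaultdict(list)
        self._owner_by_job = {}

    def record(self, user, job_id):
        """Remember that `user` submitted `job_id`, whatever YARN user ran it."""
        if job_id in self._owner_by_job:
            raise ValueError(f"job {job_id} is already registered")
        self._jobs_by_user[user].append(job_id)
        self._owner_by_job[job_id] = user

    def jobs_for(self, user):
        """Return the job ids submitted by `user`, in submission order."""
        return list(self._jobs_by_user.get(user, []))

    def owner_of(self, job_id):
        """Return the submitting user for `job_id`, or None if unknown."""
        return self._owner_by_job.get(job_id)
```

With something like this in place, every job can be submitted as the single service user while listing and access checks are still done per submitter.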
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
> > > > > > > > > merrimanr@gmail.com
> > > > > > > > > > )
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll
> > have
> > > > to
> > > > > > > think
> > > > > > > > > > about
> > > > > > > > > > > > how
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > manage the user to job relationships. I'm
> > > assuming
> > > > > YARN
> > > > > > > > jobs
> > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > > submitted as the metron service user so YARN
> > > won't
> > > > > keep
> > > > > > > > track
> > > > > > > > > > of
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > us. Is that assumption correct? Do you have
> any
> > > > ideas
> > > > > > for
> > > > > > > > > doing
> > > > > > > > > > > > > that?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Mike, I can start a feature branch and
> > experiment
> > > > > with
> > > > > > > > > merging
> > > > > > > > > > > > > > metron-api
> > > > > > > > > > > > > > > > into metron-rest. That should allow us to
> > > > collaborate
> > > > > > on
> > > > > > > > any
> > > > > > > > > > > issues
> > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > challenges. Also, can you expand on your idea
> > to
> > > > > manage
> > > > > > > > > > external
> > > > > > > > > > > > > > > > dependencies as a special module? That seems
> > > like a
> > > > > > very
> > > > > > > > > > > attractive
> > > > > > > > > > > > > > > option
> > > > > > > > > > > > > > > > to me.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > > > > > > > > ottobackwards@gmail.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > From my response on the other thread, but
> > > > > applicable
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > > > backend
> > > > > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report
> > to
> > > > me.
> > > > > > You
> > > > > > > > are
> > > > > > > > > > > > > > generating a
> > > > > > > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > > > > > > That report is something that takes some
> time
> > > and
> > > > > > > > external
> > > > > > > > > > > > process
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > > > > > > > > > > > > > > > > > possibly picking from one or more report ‘templates’
> > > > > > > > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your
> report
> > > > > > results,
> > > > > > > > and
> > > > > > > > > > when
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > report
> > > > > > > > > > > > > > > > > is done it is queued there
> > > > > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest (report
> > > > > > > > > > > > > > > > > > info/meta has the yarn details)
> > > > > > > > > > > > > > > > > * You can select the report from your queue
> > and
> > > > > view
> > > > > > it
> > > > > > > > > > either
> > > > > > > > > > > in
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > UI
> > > > > > > > > > > > > > > > > or custom component
> > > > > > > > > > > > > > > > > * You can then apply a different ‘view’ to
> > the
> > > > > report
> > > > > > > or
> > > > > > > > > work
> > > > > > > > > > > > with
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > report data
> > > > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > > > * You can associate the report with the
> > alerts
> > > (
> > > > > > again
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > report
> > > > > > > > > > > > > > > info
> > > > > > > > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or
> > investigation
> > > > > > > something
> > > > > > > > or
> > > > > > > > > > > other
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We can introduce extensibility into the report templates, report views
> > > > > > > > > > > > > > > > > > (things that work with the json data of the report)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > template -> query parameters -> script =>
> > yarn
> > > > info
> > > > > > > > > > > > > > > > > yarn info + query info + alert context +
> yarn
> > > > > status
> > > > > > =>
> > > > > > > > > > report
> > > > > > > > > > > > > info
> > > > > > > > > > > > > > ->
> > > > > > > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > > > metron-rest -> api to monitor the queue,
> read
> > > > > > results (
> > > > > > > > > page
> > > > > > > > > > ),
> > > > > > > > > > > > > etc
> > > > > > > > > > > > > > etc
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > > > > > > > > merrimanr@gmail.com
> > > > > > > > > > > )
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I started a separate thread on Pcap UI
> > > > > considerations
> > > > > > > and
> > > > > > > > > > user
> > > > > > > > > > > > > > > > > requirements
> > > > > > > > > > > > > > > > > at Otto's request. This should help us keep
> > > these
> > > > > two
> > > > > > > > > related
> > > > > > > > > > > but
> > > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > discussions focused.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel
> Sumbul
> > <
> > > > > > > > > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of
> > mail
> > > > > > > chain^^)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If I may, I would like to share my view
> on
> > > the
> > > > > > > > following
> > > > > > > > > 3
> > > > > > > > > > > > > points.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to me
> > > > > > > > > > > > > > > > > > > to have it in the same place as the other REST APIs. Especially when
> > > > > > > > > > > > > > > > > > > more security is added, the work would not need to be done twice.
> > > > > > > > > > > > > > > > > > > The current implementation sends back a pcap object which still needs
> > > > > > > > > > > > > > > > > > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > > > > > > > frontend. It would be good to have this decoding happen directly on
> > > > > > > > > > > > > > > > > > > the backend so as not to create load on the frontend. An option would
> > > > > > > > > > > > > > > > > > > be to install tshark on the REST server and use it to convert the pcap
> > > > > > > > > > > > > > > > > > > to XML and then to JSON that is sent to the frontend.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I tried to start the map/reduce job that searches over all the pcap
> > > > > > > > > > > > > > > > > > > data directly from the REST server and, as Ryan mentioned, we had
> > > > > > > > > > > > > > > > > > > trouble. I will try to find the error again.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Then in the POC, what we tried was to use the pcap_query script, and
> > > > > > > > > > > > > > > > > > > this works fine. I just modified it so that it sends back the YARN
> > > > > > > > > > > > > > > > > > > job_id directly instead of waiting for the job to finish. That allows
> > > > > > > > > > > > > > > > > > > the UI and the REST server to know the status of the search by
> > > > > > > > > > > > > > > > > > > querying the YARN REST API. This lets the UI and the REST server be
> > > > > > > > > > > > > > > > > > > async without any blocking phase. What do you think about that?
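The non-blocking flow described above (return an id right away, then ask YARN for the status) can be sketched as a small polling loop. This is a hedged sketch, not the POC code: `fetch_app_state` and `wait_for_job` are hypothetical names, the ResourceManager URL is an assumption about the deployment, and the YARN REST API is keyed by application id (application_...), so a MapReduce job id would need to be mapped to it first.

```python
import json
import time
from urllib.request import urlopen

# Application states after which the YARN ResourceManager reports no change.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def fetch_app_state(rm_url, app_id):
    """Read an application's state from the ResourceManager REST API.

    `rm_url` is an assumed base URL such as "http://node1:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}") as resp:
        return json.load(resp)["app"]["state"]


def wait_for_job(app_id, fetch_state, poll_seconds=5.0, max_polls=120):
    """Poll `fetch_state(app_id)` until a terminal state or the poll limit.

    Returns the terminal state, or None if the limit is reached first.
    """
    for _ in range(max_polls):
        state = fetch_state(app_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    return None
```

A caller could pass `lambda app_id: fetch_app_state("http://node1:8088", app_id)` as `fetch_state`. In a UI the same check would more likely run once per status request rather than in a loop, but the state handling is the same.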
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Having the job submitted directly from the code of the REST server
> > > > > > > > > > > > > > > > > > > would be perfect, but I think it would need a lot of investigation
> > > > > > > > > > > > > > > > > > > (but I'm not the expert so I might be completely wrong ^^).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > We know that the pcap_query script works fine, so why not call it? Is
> > > > > > > > > > > > > > > > > > > it that bad? (maybe a stupid question, but I really don’t see a lot of
> > > > > > > > > > > > > > > > > > > drawbacks)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Adding the pcap search to the Alerts UI is, I think, the easiest way
> > > > > > > > > > > > > > > > > > > to move forward. But indeed, it will then be the “Alert UI and
> > > > > > > > > > > > > > > > > > > pcapquery”. Maybe the name of the UI should just change to something
> > > > > > > > > > > > > > > > > > > like “Monitoring & Investigation UI”?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I
> > > > > > > > > > > > > > > > > > > > mean, have you already had a discussion on how you see
> > > > > > > > > > > > > > > > > > > > the UI evolving with the new features that will come
> > > > > > > > > > > > > > > > > > > > in the future?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > > > > > > > > > > > > > > > > > > separate all the features into different projects? Or
> > > > > > > > > > > > > > > > > > > > something like having the different components in
> > > > > > > > > > > > > > > > > > > > containers, like Kubernetes? (again maybe a stupid
> > > > > > > > > > > > > > > > > > > > question, but I don’t clearly understand what you
> > > > > > > > > > > > > > > > > > > > mean :) )
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > --
> > > > > > > > > > > simon elliston ball
> > > > > > > > > > > @sireb
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > --
> > > >
> > > > Jon
> > > >
> > >
> > --
> >
> > Jon
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [DISCUSS] Pcap panel architecture

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
On the sharing and securing points, it seems like having the result of a
query run hang around in HDFS at a given URL would work. We can then use
Ranger policies (RBAC, or even ABAC later) to control access. This also
solves the page storage problem and gives us a kind of two-step approach:
the initial search of everything, and the subsequent sort / page through
the results. Either step could (maybe, possibly, but probably not) be large
enough to need distribution. Does anyone imagine sorting? Maybe
sub-filtering, but PCAP is heavily time based, so is a timestamp sort OK?

I also suspect it's worth considering the lifecycle of our stored result
sets as being meta-data driven. If I'm doing a speculative search I don't
really care if an admin cleans that up at the end of the day / week /
disk space nervousness limit. However, if I find something good, I might
want to mark the result set as immune from automatic deletion.
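
To make the metadata idea a bit more concrete, here is a rough Python sketch.
Everything in it is an assumption for illustration only: the ResultSetMeta
record, its field names, and the "pinned" flag do not exist in Metron today.
It just shows a cleanup pass that skips anything a user has marked as immune.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ResultSetMeta:
    """Hypothetical metadata record stored next to each pcap result set."""
    job_id: str
    owner: str
    created: datetime
    pinned: bool = False  # True means "immune from automatic deletion"


def expired(result_sets, now, max_age):
    """Return the result sets an automatic cleanup would be allowed to delete."""
    return [
        rs for rs in result_sets
        if not rs.pinned and now - rs.created > max_age
    ]


now = datetime(2018, 5, 11, 16, 0)
sets = [
    ResultSetMeta("job-1", "simon", now - timedelta(days=10)),
    ResultSetMeta("job-2", "simon", now - timedelta(days=10), pinned=True),
    ResultSetMeta("job-3", "jon", now - timedelta(hours=2)),
]
# Only the old, unpinned result set is eligible for cleanup.
print([rs.job_id for rs in expired(sets, now, timedelta(days=7))])
```

The retention limit (a week here) would presumably be an admin setting; the
point is only that the decision is driven by metadata rather than by path.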

The other issue I would raise, which has implications for our PCAP capture,
and also impacts Otto's suggestion of 'self-uploaded' PCAPs, is how we
namespace PCAP collection and retrieval. The problem here is that I might
have PCAPs from multiple locations which have conflicting private IP
ranges, so I can't logically dump them all in the same repository. Solving
the collection end of that is probably a separate unit of effort, but this
retrieval architecture should support multiple file system locations.
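
One possible shape for this, sketched in Python under loud assumptions: the
namespace names and every path other than the default "/apps/metron/pcap"
are made up, and resolve_base_paths is a hypothetical helper, not an existing
Metron function. The idea is that a query names one or more namespaces and
the retrieval side turns those into the file system locations to scan.

```python
# Hypothetical mapping from a capture namespace to its HDFS base path.
# Only the "default" path appears elsewhere in this thread; the rest are
# placeholders for illustration.
PCAP_NAMESPACES = {
    "default": "/apps/metron/pcap",
    "site-a": "/apps/metron/pcap/site-a",
    "uploads": "/apps/metron/pcap/uploads",
}


def resolve_base_paths(namespaces=None):
    """Return the base paths a query job should scan for the given namespaces."""
    if not namespaces:
        return [PCAP_NAMESPACES["default"]]
    unknown = sorted(set(namespaces) - set(PCAP_NAMESPACES))
    if unknown:
        raise ValueError(f"unknown pcap namespace(s): {unknown}")
    return [PCAP_NAMESPACES[name] for name in namespaces]


print(resolve_base_paths())
print(resolve_base_paths(["site-a", "uploads"]))
```

Keeping the namespaces separate all the way through means two sites with
overlapping private IP ranges never end up in the same result set by accident.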

If we wanted to get fancy about it, we could look at using the stored
result sets as a kind of cache for other queries. As people refine and
narrow down queries, it may make sense to be more sophisticated about where
our query jobs pull from (i.e. filter the subset from a previous result set,
rather than scanning petabytes of source data). This may imply some kind of
TOC for the cache. The underlying immutability of the PCAP store should
make this fairly tractable.
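
A minimal Python sketch of what such a TOC lookup might look like, assuming a
query can be reduced to a time range plus a set of exact-match filters. The
Query type, the result paths, and the field names used in the example are
illustrative assumptions, not an existing Metron structure. A cached result
set can serve a new query only if it covers the new time range and was built
with a subset of the new query's filters (i.e. it is less restrictive).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical normalized query: time range plus exact-match filters."""
    start: int
    end: int
    filters: frozenset  # of (field, value) pairs


def covers(cached, new):
    """True if every packet matching `new` is also in `cached`'s results."""
    return (
        cached.start <= new.start
        and cached.end >= new.end
        and cached.filters <= new.filters
    )


def pick_source(toc, new, default="/apps/metron/pcap"):
    """Pick the narrowest cached result set that can serve `new`, else the raw store."""
    candidates = [(q, path) for q, path in toc.items() if covers(q, new)]
    if not candidates:
        return default
    # Narrowest first: most filters, then shortest time range.
    _, path = max(
        candidates,
        key=lambda c: (len(c[0].filters), -(c[0].end - c[0].start)),
    )
    return path


toc = {
    Query(0, 1000, frozenset()): "/tmp/results/job-0",
    Query(0, 1000, frozenset({("ip_src_addr", "10.0.0.1")})): "/tmp/results/job-1",
}
refined = Query(100, 200, frozenset({("ip_src_addr", "10.0.0.1"), ("ip_dst_port", "443")}))
print(pick_source(toc, refined))
```

Because the underlying store is immutable, a cached result set never goes
stale; it only needs to be dropped from the TOC when it is deleted.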

FYI, I've been doing a lot of thinking around data security, API and
configuration security and auditing recently, but I suspect that is a
different discuss thread. I'll kick something off shortly with a few
thoughts.

I see a lot of this as long term goals to be honest, so as Jon says, we can
definitely take a few baby steps to start.

Simon

On 11 May 2018 at 15:40, Otto Fowler <ot...@gmail.com> wrote:

> Don’t lose the use case for manually uploading PCAPS for analysis Jon.
>
>
> On May 11, 2018 at 10:14:02, Zeolla@GMail.com (zeolla@gmail.com) wrote:
>
> I think baby steps are fine - admin gets access to all, otherwise you only
> see your own pcaps, but we file a jira for a future add of API security,
> which more mature SOCs that align with the Metron personas will need.
>
> Jon
>
> On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:
>
> > That's a good point Jon. There are different levels of effort associated
> > with different options. If we want to allow pcaps to be shared with
> > specific users, we will need to introduce ACL security in our REST
> > application using something like the ACL capability that comes with
> Spring
> > Security or Ranger. This would be more complex to design and implement.
> > If we want something more broad like admin roles that can see all or
> > allowing pcap files to become public, this would be less work. Do you
> > think ACL security is required or would the other options be acceptable?
> >
> > On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> > wrote:
> >
> > > At the very least there needs to be the ability to share downloaded
> PCAPs
> > > with other users and/or have roles that can see all pcaps. A platform
> > > engineer may want to clean up old pcaps after x time, or a manager may
> ask
> > > an analyst to find all of the traffic that exhibits xyz behavior, dump
> a
> > > pcap, and then point him to it so the manager can review. Since the
> > > pcap may be huge, we wouldn't want to try to push people to sending it
> > via
> > > email, uploading to a file server, finding an external hard drive, etc.
> > >
> > > Jon
> > >
> > > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> > > wrote:
> > >
> > > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint
> exposes
> > > the
> > > > fixed query option which we have covered. I agree with you that
> > > > deprecating the metron-api module should be a goal of this feature.
> > > >
> > > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > > michael.miklavcic@gmail.com> wrote:
> > > >
> > > > > This looks like a pretty good start Ryan. Does the metadata endpoint cover
> > > > > this
> > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > > > from the original metron-api? If so, then we would be able to deprecate the
> > > > > existing metron-api project. If we later go to micro-services, a pcap
> > > > > module would spin back into the fold, but it would probably look different
> > > > > from metron-api.
> > > > >
> > > > > I commented on the UI thread, but to reiterate for the purpose of
> > > backend
> > > > > functionality here I don't believe there is a way to "PAUSE" or
> > > "SUSPEND"
> > > > > jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is
> sufficient
> > > for
> > > > > the job management operations.
> > > > >
> > > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <
> merrimanr@gmail.com>
>
> > > > > wrote:
> > > > >
> > > > > > Now that we are confident we can submit an MR job from our
> > current
> > > > > REST
> > > > > > application, is this the desired approach? Just want to confirm.
> > > > > >
> > > > > > Next I think we should map out what the REST interface will look
> > > like.
> > > > > > Here are the endpoints I'm thinking about:
> > > > > >
> > > > > > GET /api/v1/pcap/metadata?basePath
> > > > > >
> > > > > > This endpoint will return metadata of pcap data stored in HDFS.
> > This
> > > > > would
> > > > > > include pcap size, date ranges (how far back can I go), etc. It
> > > would
> > > > > > accept an optional HDFS basePath parameter for cases where pcap
> > data
> > > is
> > > > > > stored in multiple places and/or different from the default
> > location.
> > > > > >
> > > > > > POST /api/v1/pcap/query
> > > > > >
> > > > > > This endpoint would accept a pcap request, submit a pcap query job, and
> > > > > > return a job id. The request would be an object containing the parameters
> > > > > > documented here:
> > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility.
> > > > > > A query/job would be associated with a user that submits it. An exception
> > > > > > will be returned for violating constraints like too many queries submitted,
> > > > > > query parameters out of limits, etc.
> > > > > >
> > > > > > GET /api/v1/pcap/status/<jobId>
> > > > > >
> > > > > > This endpoint will return the status of a running job. I imagine
> > > this
> > > > is
> > > > > > just a proxy to the YARN REST api. We can discuss the
> > implementation
> > > > > > behind these endpoints later.
> > > > > >
> > > > > > GET /api/v1/pcap/stop/<jobId>
> > > > > >
> > > > > > This endpoint would kill a running pcap job. If the job has
> > already
> > > > > > completed this is a noop.
> > > > > >
> > > > > > GET /api/v1/pcap/list
> > > > > >
> > > > > > This endpoint will list a user's submitted pcap queries. Items in
> > > the
> > > > > list
> > > > > > would contain job id, status (is it finished?), start/end time,
> and
> > > > > number
> > > > > > of pages. Maybe there is some overlap with the status endpoint
> > above
> > > > and
> > > > > > the status endpoint is not needed?
> > > > > >
> > > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > > >
> > > > > > This endpoint will return pcap results for the given page in pdml
> > > > format
> > > > > (
> > > > > > https://wiki.wireshark.org/PDML). Are there other formats we
> want
> > > to
> > > > > > support?
> > > > > >
> > > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > > >
> > > > > > This endpoint will allow a user to download raw pcap results for
> > the
> > > > > given
> > > > > > page.
> > > > > >
> > > > > > DELETE /api/v1/pcap/<jobId>
> > > > > >
> > > > > > This endpoint will delete pcap query results. Not sure yet how
> > this
> > > > fits
> > > > > > in with our broader cleanup strategy.
> > > > > >
> > > > > > This should get us started. What did I miss and what would you
> > > change
> > > > > > about these? I did not include much detail related to security,
> > > > cleanup
> > > > > > strategy, or underlying implementation details but these are
> items
> > we
> > > > > > should discuss at some point.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > >
> > > > > > > Sweet! That's great news. The pom changes are a lot simpler
> than
> > I
> > > > > > > expected. Very nice.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <
> > merrimanr@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Finally figured it out. Commit is here:
> > > > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > > >
> > > > > > > > It came down to figuring out the right combination of maven
> > > > > > dependencies
> > > > > > > > and passing in the HDP version to REST as a Java system
> > property.
> > > > I
> > > > > > also
> > > > > > > > included some HDFS setup tasks. I tested this in full dev and
> > > can
> > > > > now
> > > > > > > > successfully run a pcap query and get results. All you should
> > > have
> > > > > to
> > > > > > do
> > > > > > > > is generate some pcap data first.
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > @Ryan - pulled your branch and experimented with a few
> > things.
> > > In
> > > > > > doing
> > > > > > > > so,
> > > > > > > > > it dawned on me that by adding the yarn and hadoop
> classpath,
> > > you
> > > > > > > > probably
> > > > > > > > > didn't introduce a new classpath issue, rather you probably
> > > just
> > > > > > moved
> > > > > > > > onto
> > > > > > > > > the next classpath issue, ie hbase per your exception about
> > > hbase
> > > > > > jaxb.
> > > > > > > > > Anyhow, I put up a branch with some pom changes worth
> trying
> > in
> > > > > > > > conjunction
> > > > > > > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > > > > >
> > > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > > >
> > > > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > > > >
> > > > > > > > > Mike
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > > >
> > > > > > > > > > That would be a step closer to something more like a
> > > > > micro-service
> > > > > > > > > > architecture. However, I would want to make sure we think
> > > about
> > > > > the
> > > > > > > > > > operational complexity, and mpack implications of having
> > > > another
> > > > > > > server
> > > > > > > > > > installed and running somewhere on the cluster (also,
> ssl,
> > > > > > kerberos,
> > > > > > > > etc
> > > > > > > > > > etc requirements for that service).
> > > > > > > > > >
> > > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <
> merrimanr@gmail.com
> > >
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 to having metron-api as it's own service and using a
> > > > gateway
> > > > > > > type
> > > > > > > > > > > pattern.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> > > > > > > ottobackwards@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Why not have metron-api as it’s own service and use a
> > > > > ‘gateway’
> > > > > > > > type
> > > > > > > > > > > > pattern in rest?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (
> > > > > merrimanr@gmail.com
> > > > > > )
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Moving the yarn classpath command earlier in the
> > > classpath
> > > > > now
> > > > > > > > gives
> > > > > > > > > > this
> > > > > > > > > > > > error:
> > > > > > > > > > > >
> > > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > > > >
> > > > > > > > > > > > I will experiment with other combinations, I suspect
> we
> > > > will
> > > > > > need
> > > > > > > > > > > > finer-grain control over the order.
> > > > > > > > > > > >
> > > > > > > > > > > > The grep matches class names inside jar files. I use
> > this
> > > > all
> > > > > > the
> > > > > > > > > time
> > > > > > > > > > > and
> > > > > > > > > > > > it's really useful.
> > > > > > > > > > > >
> > > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > > >
> > > > > > > > > > > > Reverse engineering the yarn jar command was the next
> > > > thing I
> > > > > > was
> > > > > > > > > going
> > > > > > > > > > > to
> > > > > > > > > > > > try. Will let you know how it goes.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > What order did you add the hadoop or yarn
> classpath?
> > > The
> > > > > > > "shaded"
> > > > > > > > > > > > package
> > > > > > > > > > > > > stands out to me in this name
> > > "org.apache.hadoop.hbase.*
> > > > > > > shaded*
> > > > > > > > > > > > >
> .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > Maybe
> > > > > > > try
> > > > > > > > > > adding
> > > > > > > > > > > > > those packages earlier on the classpath.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think that find command needs a "jar tvf",
> > otherwise
> > > > > you're
> > > > > > > > > looking
> > > > > > > > > > > > for a
> > > > > > > > > > > > > class name in jar file names.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'd also look at the classpath you get when running
> > > "yarn
> > > > > > jar"
> > > > > > > to
> > > > > > > > > > start
> > > > > > > > > > > > the
> > > > > > > > > > > > > existing pcap service, per the instructions in
> > > > > > > > > metron-api/README.md.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > > > > > > > merrimanr@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > > > > > > > > > > summary of what's included:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest
> > > pom.xml
> > > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > > > - Added a pcap query service that runs a simple,
> > > > > hardcoded
> > > > > > > > query
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > > > > > > > and the pcap topology
> > > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > > After this initial setup there should be data in
> > HDFS
> > > > at
> > > > > > > > > > > > > > "/apps/metron/pcap". I believe this should be
> > enough
> > > to
> > > > > > > > exercise
> > > > > > > > > > the
> > > > > > > > > > > > > > issue. Just hit the endpoint referenced above. I
> > > tested
> > > > > > this
> > > > > > > in
> > > > > > > > > an
> > > > > > > > > > > > > > already running full dev by building and
> deploying
> > > the
> > > > > > > > > metron-rest
> > > > > > > > > > > > jar.
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > did not rebuild full dev with this change but I
> > would
> > > > > still
> > > > > > > > > expect
> > > > > > > > > > it
> > > > > > > > > > > > to
> > > > > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The first error I see when I hit this endpoint
> is:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Run the REST application with the YARN jar
> > command
> > > > > since
> > > > > > > this
> > > > > > > > > is
> > > > > > > > > > > how
> > > > > > > > > > > > > > all our other YARN/MR-related applications are
> > > started
> > > > > > > > > (metron-api,
> > > > > > > > > > > > > > MAAS,
> > > > > > > > > > > > > > pcap query, etc). I wouldn't expect this to work
> > > since
> > > > we
> > > > > > > have
> > > > > > > > > > > > > runtime
> > > > > > > > > > > > > > dependencies on our shaded elasticsearch and
> parser
> > > > jars
> > > > > > and
> > > > > > > > I'm
> > > > > > > > > > not
> > > > > > > > > > > > > > aware
> > > > > > > > > > > > > > of a way to add additional jars to the classpath
> > with
> > > > the
> > > > > > > YARN
> > > > > > > > > jar
> > > > > > > > > > > > > > command
> > > > > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections:
> > could
> > > > not
> > > > > > > > create
> > > > > > > > > > Dir
> > > > > > > > > > > > > using
> > > > > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/
> > > > > > > > hadoop/lib/ojdbc6.jar.
> > > > > > > > > > > > > skipping.
> > > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop
> > > > classpath`
> > > > > to
> > > > > > > the
> > > > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-
> rest.sh
> > > (REST
> > > > > > > start
> > > > > > > > > > > > > > script). I
> > > > > > > > > > > > > > get this error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - I searched for the class in the previous
> attempt
> > > but
> > > > > > could
> > > > > > > > not
> > > > > > > > > > find
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > in full dev:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Further up in the stack trace I see the error
> > > happens
> > > > > > when
> > > > > > > > > > > > > initiating
> > > > > > > > > > > > > > the org.apache.hadoop.yarn.util.
> > > timeline.TimelineUtils
> > > > > > > class.
> > > > > > > > I
> > > > > > > > > > > > > tried
> > > > > > > > > > > > > > setting "yarn.timeline-service.enabled" in
> Ambari
> > to
> > > > > false
> > > > > > > and
> > > > > > > > > > then
> > > > > > > > > > > I
> > > > > > > > > > > > > > get
> > > > > > > > > > > > > > this error:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn
> > and
> > > > > > > mapreduce
> > > > > > > > > > Maven
> > > > > > > > > > > > > > dependencies without any success
> > > > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > > > - hbase-server
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will keep exploring other possible solutions.
> Let
> > > me
> > > > > know
> > > > > > > if
> > > > > > > > > > anyone
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > any ideas.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > > > > > > > ottobackwards@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can imagine a new generic service(s)
> capability
> > > > whose
> > > > > > > job (
> > > > > > > > > pun
> > > > > > > > > > > > > > intended
> > > > > > > > > > > > > > > ) is to
> > > > > > > > > > > > > > > abstract the submittal, tracking, and storage
> of
> > > > > results
> > > > > > to
> > > > > > > > > yarn.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be extended with storage providers,
> > queue
> > > > > > > provider,
> > > > > > > > > > > > possibly
> > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The pcap ‘report’ would be a client to that
> > > service,
> > > > > the
> > > > > > > > > > > specializes
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > service operation for the way we want pcap to
> > work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We can then re-use the generic service for
> other
> > > long
> > > > > > > running
> > > > > > > > > > yarn
> > > > > > > > > > > > > > > things…..
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > > > > > > > ottobackwards@gmail.com
> > > > > > > > > > )
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The submittal and tracking can associate the
> > > > submitter
> > > > > > with
> > > > > > > > the
> > > > > > > > > > > yarn
> > > > > > > > > > > > > job
> > > > > > > > > > > > > > > and track that,
> > > > > > > > > > > > > > > regardless of the yarn credentials.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IE> if all submittals and monitoring are by the
> > > same
> > > > > yarn
> > > > > > > > user
> > > > > > > > > (
> > > > > > > > > > > > > Metron )
> > > > > > > > > > > > > > > from a single or
> > > > > > > > > > > > > > > co-operative set of services, that service can
> > > > maintain
> > > > > > the
> > > > > > > > > > > mapping.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
> > > > > > > > merrimanr@gmail.com
> > > > > > > > > )
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll
> have
> > > to
> > > > > > think
> > > > > > > > > about
> > > > > > > > > > > how
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > manage the user to job relationships. I'm
> > assuming
> > > > YARN
> > > > > > > jobs
> > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > > > > submitted as the metron service user so YARN
> > won't
> > > > keep
> > > > > > > track
> > > > > > > > > of
> > > > > > > > > > > > this
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > us. Is that assumption correct? Do you have any
> > > ideas
> > > > > for
> > > > > > > > doing
> > > > > > > > > > > > that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Mike, I can start a feature branch and
> experiment
> > > > with
> > > > > > > > merging
> > > > > > > > > > > > > metron-api
> > > > > > > > > > > > > > > into metron-rest. That should allow us to
> > > collaborate
> > > > > on
> > > > > > > any
> > > > > > > > > > issues
> > > > > > > > > > > > or
> > > > > > > > > > > > > > > challenges. Also, can you expand on your idea
> to
> > > > manage
> > > > > > > > > external
> > > > > > > > > > > > > > > dependencies as a special module? That seems
> > like a
> > > > > very
> > > > > > > > > > attractive
> > > > > > > > > > > > > > option
> > > > > > > > > > > > > > > to me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > > > > > > > ottobackwards@gmail.com>
> > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From my response on the other thread, but
> > > > applicable
> > > > > to
> > > > > > > the
> > > > > > > > > > > > backend
> > > > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report
> to
> > > me.
> > > > > You
> > > > > > > are
> > > > > > > > > > > > > generating a
> > > > > > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > > > > > That report is something that takes some time
> > and
> > > > > > > external
> > > > > > > > > > > process
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some
> > > > > selected
> > > > > > > > > > > > > > alerts/meta-alert,
> > > > > > > > > > > > > > > > possibly picking from one or more report
> > > ‘templates’
> > > > > > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report
> > > > > results,
> > > > > > > and
> > > > > > > > > when
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > report
> > > > > > > > > > > > > > > > is done it is queued there
> > > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > > > > > > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > > > > > > * You can select the report from your queue
> and
> > > > view
> > > > > it
> > > > > > > > > either
> > > > > > > > > > in
> > > > > > > > > > > > a
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > UI
> > > > > > > > > > > > > > > > or custom component
> > > > > > > > > > > > > > > > * You can then apply a different ‘view’ to
> the
> > > > report
> > > > > > or
> > > > > > > > work
> > > > > > > > > > > with
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > report data
> > > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > > * You can associate the report with the
> alerts
> > (
> > > > > again
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > > report
> > > > > > > > > > > > > > info
> > > > > > > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or
> investigation
> > > > > > something
> > > > > > > or
> > > > > > > > > > other
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We can introduce extensibility into the
> report
> > > > > > templates,
> > > > > > > > > > report
> > > > > > > > > > > > > views
> > > > > > > > > > > > > > (
> > > > > > > > > > > > > > > > things that work with the json data of the
> > > report )
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > template -> query parameters -> script =>
> yarn
> > > info
> > > > > > > > > > > > > > > > yarn info + query info + alert context + yarn
> > > > status
> > > > > =>
> > > > > > > > > report
> > > > > > > > > > > > info
> > > > > > > > > > > > > ->
> > > > > > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read
> > > > > results (
> > > > > > > > page
> > > > > > > > > ),
> > > > > > > > > > > > etc
> > > > > > > > > > > > > etc
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > > > > > > > merrimanr@gmail.com
> > > > > > > > > > )
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I started a separate thread on Pcap UI
> > > > considerations
> > > > > > and
> > > > > > > > > user
> > > > > > > > > > > > > > > > requirements
> > > > > > > > > > > > > > > > at Otto's request. This should help us keep
> > these
> > > > two
> > > > > > > > related
> > > > > > > > > > but
> > > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > discussions focused.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul
> <
> > > > > > > > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of
> mail
> > > > > > chain^^)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If I may, I would like to share my view on
> > the
> > > > > > > following
> > > > > > > > 3
> > > > > > > > > > > > points.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The current metron-api is totally separate,
> > it
> > > > will
> > > > > > be
> > > > > > > > > logic
> > > > > > > > > > > for
> > > > > > > > > > > > me
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > it at the same place as the other rest
> api.
> > > > > > Especially
> > > > > > > > > when
> > > > > > > > > > > > more
> > > > > > > > > > > > > > > > security
> > > > > > > > > > > > > > > > > will be added, it will not be needed to do
> > the
> > > > job
> > > > > > > twice.
> > > > > > > > > > > > > > > > > The current implementation send back a pcap
> > > > object
> > > > > > > which
> > > > > > > > > > still
> > > > > > > > > > > > need
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > decoded. In the opensoc, the decoding was
> > done
> > > > with
> > > > > > > > tshark
> > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > frontend.
> > > > > > > > > > > > > > > > > It will be good to have this decoding
> > happening
> > > > > > > directly
> > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > > > > backend
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > not create a load on frontend. An option
> will
> > > be
> > > > to
> > > > > > > > install
> > > > > > > > > > > > tshark
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > rest server and to use to convert the pcap
> to
> > > xml
> > > > > and
> > > > > > > > then
> > > > > > > > > > to a
> > > > > > > > > > > > > json
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > will be sent to the frontend.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I tried to start directly the map/reduce
> job
> > to
> > > > > > search
> > > > > > > > over
> > > > > > > > > > all
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > pcap
> > > > > > > > > > > > > > > > > data from the rest server and as Ryan
> mention
> > > it,
> > > > > we
> > > > > > > had
> > > > > > > > > > > > trouble. I
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > try to find back the error.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Then in the POC, what we tried is to use
> the
> > > > > > pcap_query
> > > > > > > > > > script
> > > > > > > > > > > > and
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > works fine. I just modified it so that it sends
> > > back
> > > > > > > directly
> > > > > > > > > the
> > > > > > > > > > > > > job_id
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > yarn and not waiting that the job is
> > finished.
> > > > Then
> > > > > > it
> > > > > > > > will
> > > > > > > > > > > > allow
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > UI
> > > > > > > > > > > > > > > > > and the rest server to know what the status
> > of
> > > > the
> > > > > > > > > search
> > > > > > > > > > by
> > > > > > > > > > > > > > querying
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > yarn rest api. This will allow the UI and
> the
> > > > rest
> > > > > > > server
> > > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > > > async
> > > > > > > > > > > > > > > > > without any blocking phase. What do you
> think
> > > > about
> > > > > > > that?
> > > > > > > > > > > > > > > > >
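[Illustrative sketch, not part of the original message.] The asynchronous flow described above (return the YARN job id immediately, then poll the YARN REST API for status) could look roughly like this on the REST-server side. The YARN ResourceManager exposes application reports at `/ws/v1/cluster/apps/{appid}`, with `state`, `finalStatus` and `progress` fields; the function name and the shape of the returned summary are assumptions made for illustration only.

```python
# Terminal YARN application states; any other state means the job is still in flight.
TERMINAL_STATES = ("FINISHED", "FAILED", "KILLED")


def summarize_yarn_app(response):
    """Reduce a YARN application report (already parsed from JSON) to the
    few fields a UI would need while polling. The output keys are hypothetical."""
    app = response["app"]
    state = app["state"]
    done = state in TERMINAL_STATES
    return {
        "jobId": app["id"],
        "state": state,
        "done": done,
        # finalStatus is only meaningful once the application has finished.
        "succeeded": done and app.get("finalStatus") == "SUCCEEDED",
        "percentComplete": float(app.get("progress", 0.0)),
    }
```

Because the caller only ever fetches a small status document, neither the UI nor the REST server has to hold a connection open while the MapReduce job runs.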
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Having the job submitted directly from the
> > code
> > > > of
> > > > > > the
> > > > > > > > rest
> > > > > > > > > > > > server
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > perfect, but it will need a lot of
> > > investigation
> > > > I
> > > > > > > think
> > > > > > > > > (but
> > > > > > > > > > > > I'm
> > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We know that the pcap_query script works
> fine
> > so
> > > > why
> > > > > > not
> > > > > > > > > > calling
> > > > > > > > > > > > it?
> > > > > > > > > > > > > > Is
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > that bad? (maybe stupid question, but I
> > really
> > > > > don’t
> > > > > > > see
> > > > > > > > a
> > > > > > > > > > lot
> > > > > > > > > > > > of
> > > > > > > > > > > > > > > > drawback)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Adding the pcap search to the alert UI
> > is,
> > > I
> > > > > > think,
> > > > > > > > the
> > > > > > > > > > > > easiest
> > > > > > > > > > > > > > way
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > move forward. But indeed, it will then be
> the
> > > > > “Alert
> > > > > > UI
> > > > > > > > and
> > > > > > > > > > > > > > pcapquery”.
> > > > > > > > > > > > > > > > > Maybe the name of the UI should just change
> > to
> > > > > > > something
> > > > > > > > > like
> > > > > > > > > > > > > > > > “Monitoring &
> > > > > > > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is there any roadmap or plan for the
> > different
> > > > UI?
> > > > > I
> > > > > > > mean
> > > > > > > > > did
> > > > > > > > > > > > you
> > > > > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > > had discussion on how you see the ui
> evolving
> > > > with
> > > > > > the
> > > > > > > > new
> > > > > > > > > > > > feature
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > will come in the future?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What do you mean exactly by microservices?
> Is
> > > it
> > > > to
> > > > > > > > > separate
> > > > > > > > > > > all
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > features in different projects? Or
> something
> > > like
> > > > > > > having
> > > > > > > > > the
> > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > components in containers like Kubernetes?
> (again
> > > > maybe
> > > > > > > > stupid
> > > > > > > > > > > > > question,
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > don’t clearly understand what you mean :) )
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > --
> > > > > > > > > > simon elliston ball
> > > > > > > > > > @sireb
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > --
> > >
> > > Jon
> > >
> >
> --
>
> Jon
>



-- 
--
simon elliston ball
@sireb

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
Don’t lose the use case for manually uploading PCAPs for analysis, Jon.


On May 11, 2018 at 10:14:02, Zeolla@GMail.com (zeolla@gmail.com) wrote:

I think baby steps are fine - admin gets access to all, otherwise you only
see your own pcaps, but we file a jira for a future add of API security,
which more mature SOCs that align with the Metron personas will need.

Jon

On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:

> That's a good point Jon. There are different levels of effort associated
> with different options. If we want to allow pcaps to be shared with
> specific users, we will need to introduce ACL security in our REST
> application using something like the ACL capability that comes with
Spring
> Security or Ranger. This would be more complex to design and implement.
> If we want something more broad like admin roles that can see all or
> allowing pcap files to become public, this would be less work. Do you
> think ACL security is required or would the other options be acceptable?
>
> On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> wrote:
>
> > At the very least there needs to be the ability to share downloaded
PCAPs
> > with other users and/or have roles that can see all pcaps. A platform
> > engineer may want to clean up old pcaps after x time, or a manager may
ask
> > an analyst to find all of the traffic that exhibits xyz behavior, dump
a
> > pcap, and then point him to it so the manager can review. Since the
> > pcap may be huge, we wouldn't want to try to push people to sending it
> via
> > email, uploading to a file server, finding an external hard drive, etc.
> >
> > Jon
> >
> > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> > wrote:
> >
> > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint
exposes
> > the
> > > fixed query option which we have covered. I agree with you that
> > > deprecating the metron-api module should be a goal of this feature.
> > >
> > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > This looks like a pretty good start Ryan. Does the metadata
endpoint
> > > cover
> > > > this https://github.com/apache/metron/tree/master/
> > > > metron-platform/metron-api#the-pcapgettergetpcapsbyidentifier
> > s-endpoint
> > > > from the original metron-api? If so, then we would be able to
> deprecate
> > > the
> > > > existing metron-api project. If we later go to micro-services, a
pcap
> > > > module would spin back into the fold, but it would probably look
> > > different
> > > > from metron-api.
> > > >
> > > > I commented on the UI thread, but to reiterate for the purpose of
> > backend
> > > > functionality here I don't believe there is a way to "PAUSE" or
> > "SUSPEND"
> > > > jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is
sufficient
> > for
> > > > the job management operations.
> > > >
> > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com>

> > > > wrote:
> > > >
> > > > > > Now that we are confident we can submit an MR job from our
> current
> > > > REST
> > > > > application, is this the desired approach? Just want to confirm.
> > > > >
> > > > > Next I think we should map out what the REST interface will look
> > like.
> > > > > Here are the endpoints I'm thinking about:
> > > > >
> > > > > GET /api/v1/pcap/metadata?basePath
> > > > >
> > > > > This endpoint will return metadata of pcap data stored in HDFS.
> This
> > > > would
> > > > > include pcap size, date ranges (how far back can I go), etc. It
> > would
> > > > > accept an optional HDFS basePath parameter for cases where pcap
> data
> > is
> > > > > stored in multiple places and/or different from the default
> location.
> > > > >
> > > > > POST /api/v1/pcap/query
> > > > >
> > > > > This endpoint would accept a pcap request, submit a pcap query
job,
> > and
> > > > > return a job id. The request would be an object containing the
> > > > parameters
> > > > > documented here: https://github.com/apache/metron/tree/master/
> > > > > metron-platform/metron-pcap-backend#query-filter-utility. A
> > query/job
> > > > > would be associated with a user that submits it. An exception
will
> > be
> > > > > returned for violating constraints like too many queries
submitted,
> > > query
> > > > > parameters out of limits, etc.
> > > > >
> > > > > GET /api/v1/pcap/status/<jobId>
> > > > >
> > > > > This endpoint will return the status of a running job. I imagine
> > this
> > > is
> > > > > just a proxy to the YARN REST api. We can discuss the
> implementation
> > > > > behind these endpoints later.
> > > > >
> > > > > GET /api/v1/pcap/stop/<jobId>
> > > > >
> > > > > This endpoint would kill a running pcap job. If the job has
> already
> > > > > completed this is a noop.
> > > > >
> > > > > GET /api/v1/pcap/list
> > > > >
> > > > > This endpoint will list a user's submitted pcap queries. Items in
> > the
> > > > list
> > > > > would contain job id, status (is it finished?), start/end time,
and
> > > > number
> > > > > of pages. Maybe there is some overlap with the status endpoint
> above
> > > and
> > > > > the status endpoint is not needed?
> > > > >
> > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > >
> > > > > This endpoint will return pcap results for the given page in pdml
> > > format
> > > > (
> > > > > https://wiki.wireshark.org/PDML). Are there other formats we want
> > to
> > > > > support?
> > > > >
> > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > >
> > > > > This endpoint will allow a user to download raw pcap results for
> the
> > > > given
> > > > > page.
> > > > >
> > > > > DELETE /api/v1/pcap/<jobId>
> > > > >
> > > > > This endpoint will delete pcap query results. Not sure yet how
> this
> > > fits
> > > > > in with our broader cleanup strategy.
> > > > >
> > > > > This should get us started. What did I miss and what would you
> > change
> > > > > about these? I did not include much detail related to security,
> > > cleanup
> > > > > strategy, or underlying implementation details but these are
items
> we
> > > > > should discuss at some point.
> > > > >
> > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > Sweet! That's great news. The pom changes are a lot simpler
than
> I
> > > > > > expected. Very nice.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <
> merrimanr@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Finally figured it out. Commit is here:
> > > > > > > https://github.com/merrimanr/incubator-metron/commit/
> > > > > > > 22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > >
> > > > > > > It came down to figuring out the right combination of maven
> > > > > dependencies
> > > > > > > and passing in the HDP version to REST as a Java system
> property.
> > > I
> > > > > also
> > > > > > > included some HDFS setup tasks. I tested this in full dev and
> > can
> > > > now
> > > > > > > successfully run a pcap query and get results. All you should
> > have
> > > > to
> > > > > do
> > > > > > > is generate some pcap data first.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > @Ryan - pulled your branch and experimented with a few
> things.
> > In
> > > > > doing
> > > > > > > so,
> > > > > > > > it dawned on me that by adding the yarn and hadoop
classpath,
> > you
> > > > > > > probably
> > > > > > > > didn't introduce a new classpath issue, rather you probably
> > just
> > > > > moved
> > > > > > > onto
> > > > > > > > the next classpath issue, ie hbase per your exception about
> > hbase
> > > > > jaxb.
> > > > > > > > Anyhow, I put up a branch with some pom changes worth
trying
> in
> > > > > > > conjunction
> > > > > > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > > > >
> > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > >
> > > > > > > > https://github.com/mmiklavc/metron/commit/
> > > > > > 5ca23580fc6e043fafae2327c80b65
> > > > > > > > b20ca1c0c9
> > > > > > > >
> > > > > > > > Mike
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > >
> > > > > > > > > That would be a step closer to something more like a
> > > > micro-service
> > > > > > > > > architecture. However, I would want to make sure we think
> > about
> > > > the
> > > > > > > > > operational complexity, and mpack implications of having
> > > another
> > > > > > server
> > > > > > > > > installed and running somewhere on the cluster (also,
ssl,
> > > > > kerberos,
> > > > > > > etc
> > > > > > > > > etc requirements for that service).
> > > > > > > > >
> > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <merrimanr@gmail.com
> >
> > > > wrote:
> > > > > > > > >
> > > > > > > > > > > +1 to having metron-api as its own service and using a
> > > gateway
> > > > > > type
> > > > > > > > > > pattern.
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> > > > > > ottobackwards@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > Why not have metron-api as its own service and use a
> > > > ‘gateway’
> > > > > > > type
> > > > > > > > > > > pattern in rest?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (
> > > > merrimanr@gmail.com
> > > > > )
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Moving the yarn classpath command earlier in the
> > classpath
> > > > now
> > > > > > > gives
> > > > > > > > > this
> > > > > > > > > > > error:
> > > > > > > > > > >
> > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > javax.servlet.ServletContext.
> > getVirtualServerName()Ljava/
> > > > > > > > lang/String;
> > > > > > > > > > >
> > > > > > > > > > > I will experiment with other combinations, I suspect
we
> > > will
> > > > > need
> > > > > > > > > > > finer-grain control over the order.
> > > > > > > > > > >
> > > > > > > > > > > The grep matches class names inside jar files. I use
> this
> > > all
> > > > > the
> > > > > > > > time
> > > > > > > > > > and
> > > > > > > > > > > it's really useful.
> > > > > > > > > > >
> > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > >
> > > > > > > > > > > Reverse engineering the yarn jar command was the next
> > > thing I
> > > > > was
> > > > > > > > going
> > > > > > > > > > to
> > > > > > > > > > > try. Will let you know how it goes.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > What order did you add the hadoop or yarn
classpath?
> > The
> > > > > > "shaded"
> > > > > > > > > > > package
> > > > > > > > > > > > stands out to me in this name
> > "org.apache.hadoop.hbase.*
> > > > > > shaded*
> > > > > > > > > > > >
.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > Maybe
> > > > > > try
> > > > > > > > > adding
> > > > > > > > > > > > those packages earlier on the classpath.
> > > > > > > > > > > >
> > > > > > > > > > > > I think that find command needs a "jar tvf",
> otherwise
> > > > you're
> > > > > > > > looking
> > > > > > > > > > > for a
> > > > > > > > > > > > class name in jar file names.
> > > > > > > > > > > >
> > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > >
> > > > > > > > > > > > I'd also look at the classpath you get when running
> > "yarn
> > > > > jar"
> > > > > > to
> > > > > > > > > start
> > > > > > > > > > > the
> > > > > > > > > > > > existing pcap service, per the instructions in
> > > > > > > > metron-api/README.md.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > > > > > > merrimanr@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > To explore the idea of merging metron-api into
> > > > metron-rest
> > > > > > and
> > > > > > > > > > running
> > > > > > > > > > > > pcap
> > > > > > > > > > > > > queries inside our REST application, I created a
> > simple
> > > > > test
> > > > > > > > here:
> > > > > > > > > > > > >
> > > https://github.com/merrimanr/incubator-metron/tree/pcap-
> > > > > > > > rest-test.
> > > > > > > > > A
> > > > > > > > > > > > > summary of what's included:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest
> > pom.xml
> > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > http://node1:8082/swagger-ui.
> > > > html#!/pcap-query-controller/
> > > > > > > > > > > > queryUsingGET
> > > > > > > > > > > > > - Added a pcap query service that runs a simple,
> > > > hardcoded
> > > > > > > query
> > > > > > > > > > > > >
> > > > > > > > > > > > > Generate some pcap data using pycapa (
> > > > > > > > > > > > >
> https://github.com/apache/metron/tree/master/metron-
> > > > > > > > sensors/pycapa
> > > > > > > > > )
> > > > > > > > > > > and
> > > > > > > > > > > > > the
> > > > > > > > > > > > > pcap topology (
> > > > > > > > > > > > >
> https://github.com/apache/metron/tree/master/metron-
> > > > > > > > > > > > >
> platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > After this initial setup there should be data in
> HDFS
> > > at
> > > > > > > > > > > > > "/apps/metron/pcap". I believe this should be
> enough
> > to
> > > > > > > exercise
> > > > > > > > > the
> > > > > > > > > > > > > issue. Just hit the endpoint referenced above. I
> > tested
> > > > > this
> > > > > > in
> > > > > > > > an
> > > > > > > > > > > > > already running full dev by building and
deploying
> > the
> > > > > > > > metron-rest
> > > > > > > > > > > jar.
> > > > > > > > > > > > I
> > > > > > > > > > > > > did not rebuild full dev with this change but I
> would
> > > > still
> > > > > > > > expect
> > > > > > > > > it
> > > > > > > > > > > to
> > > > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The first error I see when I hit this endpoint
is:
> > > > > > > > > > > > >
> > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/
> > > > YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Run the REST application with the YARN jar
> command
> > > > since
> > > > > > this
> > > > > > > > is
> > > > > > > > > > how
> > > > > > > > > > > > > all our other YARN/MR-related applications are
> > started
> > > > > > > > (metron-api,
> > > > > > > > > > > > > MAAS,
> > > > > > > > > > > > > pcap query, etc). I wouldn't expect this to work
> > since
> > > we
> > > > > > have
> > > > > > > > > > > > runtime
> > > > > > > > > > > > > dependencies on our shaded elasticsearch and
parser
> > > jars
> > > > > and
> > > > > > > I'm
> > > > > > > > > not
> > > > > > > > > > > > > aware
> > > > > > > > > > > > > of a way to add additional jars to the classpath
> with
> > > the
> > > > > > YARN
> > > > > > > > jar
> > > > > > > > > > > > > command
> > > > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections:
> could
> > > not
> > > > > > > create
> > > > > > > > > Dir
> > > > > > > > > > > > using
> > > > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/
> > > > > > > hadoop/lib/ojdbc6.jar.
> > > > > > > > > > > > skipping.
> > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop
> > > classpath`
> > > > to
> > > > > > the
> > > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh
> > (REST
> > > > > > start
> > > > > > > > > > > > > script). I
> > > > > > > > > > > > > get this error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > >
> org.apache.hadoop.hbase.shaded.org.codehaus.jackson.
> > > > > > > > > > > > > jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I searched for the class in the previous
attempt
> > but
> > > > > could
> > > > > > > not
> > > > > > > > > find
> > > > > > > > > > > > it
> > > > > > > > > > > > > in full dev:
> > > > > > > > > > > > >
> > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > > > > > >
> org/apache/hadoop/hbase/shaded/org/codehaus/jackson/
> > > > > > > > > > > > > jaxrs/JacksonJaxbJsonProvider
> > > > > > > > > > > > > 2>/dev/null
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Further up in the stack trace I see the error
> > happens
> > > > > when
> > > > > > > > > > > > initiating
> > > > > > > > > > > > > the org.apache.hadoop.yarn.util.
> > timeline.TimelineUtils
> > > > > > class.
> > > > > > > I
> > > > > > > > > > > > tried
> > > > > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari
> to
> > > > false
> > > > > > and
> > > > > > > > > then
> > > > > > > > > > I
> > > > > > > > > > > > > get
> > > > > > > > > > > > > this error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > >
> > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-
> > > > > > > > framework'
> > > > > > > > > > as
> > > > > > > > > > > a
> > > > > > > > > > > > > URI, check the setting for mapreduce.application.
> > > > > > > framework.path
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn
> and
> > > > > > mapreduce
> > > > > > > > > Maven
> > > > > > > > > > > > > dependencies without any success
> > > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > > - hbase-server
> > > > > > > > > > > > >
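[Illustrative sketch, not part of the original message.] The class search in the list above greps the raw bytes of every jar. An alternative that reads each jar's entry table directly, so that only real archive entries are matched, is sketched below using Python's standard `zipfile` module. The function name and the prefix-matching rule are assumptions made for illustration.

```python
import os
import zipfile


def find_class_in_jars(root, class_path):
    """Return the sorted paths of all .jar files under `root` that contain
    an entry whose name starts with `class_path`
    (for example "org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider")."""
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            jar = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar) as zf:
                    if any(entry.startswith(class_path) for entry in zf.namelist()):
                        matches.append(jar)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives rather than aborting the scan.
                continue
    return sorted(matches)
```

An empty result for a given root suggests the class is not packaged in any jar under it, which in turn points to a missing dependency rather than a classpath ordering problem.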
> > > > > > > > > > > > > I will keep exploring other possible solutions.
Let
> > me
> > > > know
> > > > > > if
> > > > > > > > > anyone
> > > > > > > > > > > > has
> > > > > > > > > > > > > any ideas.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > > > > > > ottobackwards@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I can imagine a new generic service(s)
capability
> > > whose
> > > > > > job (
> > > > > > > > pun
> > > > > > > > > > > > > intended
> > > > > > > > > > > > > > ) is to
> > > > > > > > > > > > > > abstract the submittal, tracking, and storage
of
> > > > results
> > > > > to
> > > > > > > > yarn.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be extended with storage providers,
> queue
> > > > > > provider,
> > > > > > > > > > > possibly
> > > > > > > > > > > > > some
> > > > > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The pcap ‘report’ would be a client to that
> > service,
> > > > > that
> > > > > > > > > > specializes
> > > > > > > > > > > > the
> > > > > > > > > > > > > > service operation for the way we want pcap to
> work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can then re-use the generic service for
other
> > long
> > > > > > running
> > > > > > > > > yarn
> > > > > > > > > > > > > > things…..
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > > > > > > ottobackwards@gmail.com
> > > > > > > > > )
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The submittal and tracking can associate the
> > > submitter
> > > > > with
> > > > > > > the
> > > > > > > > > > yarn
> > > > > > > > > > > > job
> > > > > > > > > > > > > > and track that,
> > > > > > > > > > > > > > regardless of the yarn credentials.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IE> if all submittals and monitoring are by the
> > same
> > > > yarn
> > > > > > > user
> > > > > > > > (
> > > > > > > > > > > > Metron )
> > > > > > > > > > > > > > from a single or
> > > > > > > > > > > > > > co-operative set of services, that service can
> > > maintain
> > > > > the
> > > > > > > > > > mapping.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com)
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think
> > > > > > > > > > > > > > > about how to manage the user to job relationships. I'm assuming
> > > > > > > > > > > > > > > YARN jobs will be submitted as the metron service user so YARN
> > > > > > > > > > > > > > > won't keep track of this for us. Is that assumption correct? Do
> > > > > > > > > > > > > > > you have any ideas for doing that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Mike, I can start a feature branch and experiment with merging
> > > > > > > > > > > > > > > metron-api into metron-rest. That should allow us to
> > > > > > > > > > > > > > > collaborate on any issues or challenges. Also, can you expand
> > > > > > > > > > > > > > > on your idea to manage external dependencies as a special
> > > > > > > > > > > > > > > module? That seems like a very attractive option to me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler
> > > > > > > > > > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From my response on the other thread, but applicable to the
> > > > > > > > > > > > > > > > backend stuff:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are
> > > > > > > > > > > > > > > > generating a report based on parameters.
> > > > > > > > > > > > > > > > That report is something that takes some time and external
> > > > > > > > > > > > > > > > process to generate… ie you have to wait for it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > > > > > > > > > alerts/meta-alert, possibly picking from one or more report
> > > > > > > > > > > > > > > > ‘templates’ that have query options etc
> > > > > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and
> > > > > > > > > > > > > > > > when the report is done it is queued there
> > > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn
> > > > > > > > > > > > > > > > rest ( report info/meta has the yarn details )
> > > > > > > > > > > > > > > > * You can select the report from your queue and view it
> > > > > > > > > > > > > > > > either in a new UI or custom component
> > > > > > > > > > > > > > > > * You can then apply a different ‘view’ to the report or
> > > > > > > > > > > > > > > > work with the report data
> > > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > > * You can associate the report with the alerts ( again in
> > > > > > > > > > > > > > > > the report info ) with…. a ‘case’ or ‘ticket’ or
> > > > > > > > > > > > > > > > investigation something or other
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We can introduce extensibility into the report templates,
> > > > > > > > > > > > > > > > report views ( things that work with the json data of the
> > > > > > > > > > > > > > > > report )
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > > > > > yarn info + query info + alert context + yarn status =>
> > > > > > > > > > > > > > > > report info -> stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read results
> > > > > > > > > > > > > > > > ( page ), etc etc
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman
> > > > > > > > > > > > > > > > (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I started a separate thread on Pcap UI considerations and
> > > > > > > > > > > > > > > > user requirements at Otto's request. This should help us
> > > > > > > > > > > > > > > > keep these two related but separate discussions focused.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul
> > > > > > > > > > > > > > > > <michelsumbul@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If I may, I would like to share my view on the following
> > > > > > > > > > > > > > > > > 3 points.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The current metron-api is totally separate; it would be
> > > > > > > > > > > > > > > > > logical to me to have it in the same place as the other
> > > > > > > > > > > > > > > > > rest APIs. Especially when more security is added, it
> > > > > > > > > > > > > > > > > will not be necessary to do the job twice.
> > > > > > > > > > > > > > > > > The current implementation sends back a pcap object
> > > > > > > > > > > > > > > > > which still needs to be decoded. In OpenSOC, the
> > > > > > > > > > > > > > > > > decoding was done with tshark on the frontend. It would
> > > > > > > > > > > > > > > > > be good to have this decoding happen directly on the
> > > > > > > > > > > > > > > > > backend so as not to create a load on the frontend. An
> > > > > > > > > > > > > > > > > option would be to install tshark on the rest server and
> > > > > > > > > > > > > > > > > use it to convert the pcap to xml and then to json that
> > > > > > > > > > > > > > > > > will be sent to the frontend.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I tried to start the map/reduce job directly from the
> > > > > > > > > > > > > > > > > rest server to search over all the pcap data and, as
> > > > > > > > > > > > > > > > > Ryan mentioned, we had trouble. I will try to find the
> > > > > > > > > > > > > > > > > error again.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Then in the POC, what we tried was to use the pcap_query
> > > > > > > > > > > > > > > > > script, and this works fine. I just modified it so that
> > > > > > > > > > > > > > > > > it sends back the yarn job_id directly and does not wait
> > > > > > > > > > > > > > > > > for the job to finish. This allows the UI and the rest
> > > > > > > > > > > > > > > > > server to learn the status of the search by querying the
> > > > > > > > > > > > > > > > > yarn rest api, so the UI and the rest server can be
> > > > > > > > > > > > > > > > > async without any blocking phase. What do you think
> > > > > > > > > > > > > > > > > about that?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Having the job submitted directly from the code of the
> > > > > > > > > > > > > > > > > rest server would be perfect, but it will need a lot of
> > > > > > > > > > > > > > > > > investigation I think (but I'm not the expert so I might
> > > > > > > > > > > > > > > > > be completely wrong ^^).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We know that the pcap_query script works fine, so why
> > > > > > > > > > > > > > > > > not call it? Is it that bad? (maybe a stupid question,
> > > > > > > > > > > > > > > > > but I really don’t see a lot of drawbacks)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Adding the pcap search to the alert UI is, I think, the
> > > > > > > > > > > > > > > > > easiest way to move forward. But indeed, it will then be
> > > > > > > > > > > > > > > > > the “Alert UI and pcapquery”. Maybe the name of the UI
> > > > > > > > > > > > > > > > > should just change to something like “Monitoring &
> > > > > > > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I
> > > > > > > > > > > > > > > > > mean, have you already had discussions on how you see
> > > > > > > > > > > > > > > > > the UI evolving with the new features that will come in
> > > > > > > > > > > > > > > > > the future?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > > > > > > > > > > > > > > > separate all the features into different projects? Or
> > > > > > > > > > > > > > > > > something like having the different components in
> > > > > > > > > > > > > > > > > containers, like Kubernetes? (again maybe a stupid
> > > > > > > > > > > > > > > > > question, but I don’t clearly understand what you
> > > > > > > > > > > > > > > > > mean :) )
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > --
> > > > > > > > > simon elliston ball
> > > > > > > > > @sireb
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > --
> >
> > Jon
> >
>
-- 

Jon

Re: [DISCUSS] Pcap panel architecture

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
I think baby steps are fine - admin gets access to all, otherwise you only
see your own pcaps, but we file a jira for a future add of API security,
which more mature SOCs that align with the Metron personas will need.
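
As a minimal sketch of that baby-step rule (not an actual Metron API), the visibility check could be as small as the following. PcapJob, visible_jobs, and the "ADMIN" role name are hypothetical placeholders for whatever the REST layer ends up using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcapJob:
    job_id: str
    owner: str  # user who submitted the query


def visible_jobs(jobs, user, roles):
    """Admins see every pcap job; everyone else sees only their own."""
    if "ADMIN" in roles:
        return list(jobs)
    return [job for job in jobs if job.owner == user]
```

Finer-grained sharing (per-user ACLs) would replace the owner comparison, which is the part the future jira would cover.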

Jon

On Fri, May 11, 2018, 09:27 Ryan Merriman <me...@gmail.com> wrote:

> That's a good point Jon.  There are different levels of effort associated
> with different options.  If we want to allow pcaps to be shared with
> specific users, we will need to introduce ACL security in our REST
> application using something like the ACL capability that comes with Spring
> Security or Ranger.  This would be more complex to design and implement.
> If we want something more broad like admin roles that can see all or
> allowing pcap files to become public, this would be less work.  Do you
> think ACL security is required or would the other options be acceptable?
>
> On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com>
> wrote:
>
> > At the very least there needs to be the ability to share downloaded PCAPs
> > with other users and/or have roles that can see all pcaps.  A platform
> > engineer may want to clean up old pcaps after x time, or a manager may ask
> > an analyst to find all of the traffic that exhibits xyz behavior, dump a
> > pcap, and then point him to it so the manager can review.  Since the
> > pcap may be huge, we wouldn't want to push people toward sending it via
> > email, uploading to a file server, finding an external hard drive, etc.
> >
> > Jon
> >
> > On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> > wrote:
> >
> > > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes
> > > the fixed query option which we have covered.  I agree with you that
> > > deprecating the metron-api module should be a goal of this feature.
> > >
> > > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > This looks like a pretty good start Ryan. Does the metadata endpoint
> > > > cover this
> > > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > > from the original metron-api? If so, then we would be able to
> > > > deprecate the existing metron-api project. If we later go to
> > > > micro-services, a pcap module would spin back into the fold, but it
> > > > would probably look different from metron-api.
> > > >
> > > > I commented on the UI thread, but to reiterate for the purpose of
> > > > backend functionality here I don't believe there is a way to "PAUSE"
> > > > or "SUSPEND" jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is
> > > > sufficient for the job management operations.
> > > >
> > > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com>
> > > > wrote:
> > > >
> > > > > Now that we are confident we can submit a MR job from our current
> > > > > REST application, is this the desired approach?  Just want to
> > > > > confirm.
> > > > >
> > > > > Next I think we should map out what the REST interface will look
> > > > > like.  Here are the endpoints I'm thinking about:
> > > > >
> > > > > GET /api/v1/pcap/metadata?basePath
> > > > >
> > > > > This endpoint will return metadata of pcap data stored in HDFS.
> > > > > This would include pcap size, date ranges (how far back can I go),
> > > > > etc.  It would accept an optional HDFS basePath parameter for cases
> > > > > where pcap data is stored in multiple places and/or different from
> > > > > the default location.
> > > > >
> > > > > POST /api/v1/pcap/query
> > > > >
> > > > > This endpoint would accept a pcap request, submit a pcap query job,
> > > > > and return a job id.  The request would be an object containing the
> > > > > parameters documented here:
> > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> > > > > A query/job would be associated with a user that submits it.  An
> > > > > exception will be returned for violating constraints like too many
> > > > > queries submitted, query parameters out of limits, etc.
> > > > >
> > > > > GET /api/v1/pcap/status/<jobId>
> > > > >
> > > > > This endpoint will return the status of a running job.  I imagine
> > > > > this is just a proxy to the YARN REST api.  We can discuss the
> > > > > implementation behind these endpoints later.
> > > > >
> > > > > GET /api/v1/pcap/stop/<jobId>
> > > > >
> > > > > This endpoint would kill a running pcap job.  If the job has already
> > > > > completed this is a noop.
> > > > >
> > > > > GET /api/v1/pcap/list
> > > > >
> > > > > This endpoint will list a user's submitted pcap queries.  Items in
> > > > > the list would contain job id, status (is it finished?), start/end
> > > > > time, and number of pages.  Maybe there is some overlap with the
> > > > > status endpoint above and the status endpoint is not needed?
> > > > >
> > > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > > >
> > > > > This endpoint will return pcap results for the given page in pdml
> > > > > format (https://wiki.wireshark.org/PDML).  Are there other formats
> > > > > we want to support?
> > > > >
> > > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > > >
> > > > > This endpoint will allow a user to download raw pcap results for the
> > > > > given page.
> > > > >
> > > > > DELETE /api/v1/pcap/<jobId>
> > > > >
> > > > > This endpoint will delete pcap query results.  Not sure yet how this
> > > > > fits in with our broader cleanup strategy.
> > > > >
> > > > > This should get us started.  What did I miss and what would you
> > > > > change about these?  I did not include much detail related to
> > > > > security, cleanup strategy, or underlying implementation details but
> > > > > these are items we should discuss at some point.
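
To make the query, list and stop semantics above concrete, here is a small in-memory sketch in Python. It only models the bookkeeping behind those three endpoints (a job tied to the submitting user, a per-user limit that raises an exception, stop as a no-op for jobs that are no longer running). PcapJobRegistry, TooManyQueries, the limit of two, and the generated job ids are all hypothetical; a real implementation would submit to YARN and persist this state somewhere.

```python
import itertools


class TooManyQueries(Exception):
    """Raised when a user exceeds the allowed number of running queries."""


class PcapJobRegistry:
    def __init__(self, max_running_per_user=2):
        self._max = max_running_per_user
        self._jobs = {}                  # job id -> job record
        self._ids = itertools.count(1)   # stand-in for real YARN job ids

    def submit(self, user, query):
        """POST /api/v1/pcap/query: record the job and return its id."""
        running = [j for j in self._jobs.values()
                   if j["user"] == user and j["status"] == "RUNNING"]
        if len(running) >= self._max:
            raise TooManyQueries(f"{user} already has {len(running)} running")
        job_id = f"job_{next(self._ids)}"
        self._jobs[job_id] = {"jobId": job_id, "user": user,
                              "query": query, "status": "RUNNING"}
        return job_id

    def stop(self, job_id):
        """GET /api/v1/pcap/stop/<jobId>: no-op unless the job is running."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"
        return job["status"]

    def list_for(self, user):
        """GET /api/v1/pcap/list: only the caller's own submissions."""
        return [j for j in self._jobs.values() if j["user"] == user]
```

Because list_for already carries each job's status, it illustrates the overlap with the separate status endpoint that the proposal itself questions.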
> > > > >
> > > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > > > > expected. Very nice.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Finally figured it out.  Commit is here:
> > > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > > >
> > > > > > > It came down to figuring out the right combination of maven
> > > > > > > dependencies and passing in the HDP version to REST as a Java
> > > > > > > system property.  I also included some HDFS setup tasks.  I
> > > > > > > tested this in full dev and can now successfully run a pcap
> > > > > > > query and get results.  All you should have to do is generate
> > > > > > > some pcap data first.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > @Ryan - pulled your branch and experimented with a few things.
> > > > > > > > In doing so, it dawned on me that by adding the yarn and hadoop
> > > > > > > > classpath, you probably didn't introduce a new classpath issue,
> > > > > > > > rather you probably just moved onto the next classpath issue,
> > > > > > > > ie hbase per your exception about hbase jaxb. Anyhow, I put up
> > > > > > > > a branch with some pom changes worth trying in conjunction with
> > > > > > > > invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > > > >
> > > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > > >
> > > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > > >
> > > > > > > > Mike
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > > simon@simonellistonball.com> wrote:
> > > > > > > >
> > > > > > > > > That would be a step closer to something more like a
> > > > > > > > > micro-service architecture. However, I would want to make
> > > > > > > > > sure we think about the operational complexity, and mpack
> > > > > > > > > implications of having another server installed and running
> > > > > > > > > somewhere on the cluster (also, ssl, kerberos, etc etc
> > > > > > > > > requirements for that service).
> > > > > > > > >
> > > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <merrimanr@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > +1 to having metron-api as its own service and using a
> > > > > > > > > > gateway type pattern.
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler
> > > > > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Why not have metron-api as its own service and use a
> > > > > > > > > > > ‘gateway’ type pattern in rest?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman
> > > > > > > > > > > (merrimanr@gmail.com) wrote:
> > > > > > > > > > >
> > > > > > > > > > > Moving the yarn classpath command earlier in the
> > > > > > > > > > > classpath now gives this error:
> > > > > > > > > > >
> > > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > > >
> > > > > > > > > > > I will experiment with other combinations, I suspect we
> > > > > > > > > > > will need finer-grain control over the order.
> > > > > > > > > > >
> > > > > > > > > > > The grep matches class names inside jar files. I use this
> > > > > > > > > > > all the time and it's really useful.
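
For anyone who would rather not depend on grep matching entry names in the raw jar bytes, a sketch of the same search that reads each jar's index explicitly is below. It uses only the Python standard library (a jar is a zip archive); jars_containing is a made-up helper name, and the class is given without the ".class" suffix.

```python
import os
import zipfile


def jars_containing(root, class_name):
    """Return jars under root whose index lists the given class.

    class_name may be dotted (a.b.C) or slash-separated (a/b/C),
    in either case without the trailing ".class".
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            jar = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable or corrupt jar; skip it
    return sorted(hits)
```

An empty result for the shaded JacksonJaxbJsonProvider class would agree with the earlier finding that the class is not present on the box.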
> > > > > > > > > > >
> > > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > > >
> > > > > > > > > > > Reverse engineering the yarn jar command was the next
> > > > > > > > > > > thing I was going to try. Will let you know how it goes.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > What order did you add the hadoop or yarn classpath?
> > > > > > > > > > > > The "shaded" package stands out to me in this name
> > > > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > > > > Maybe try adding those packages earlier on the
> > > > > > > > > > > > classpath.
> > > > > > > > > > > >
> > > > > > > > > > > > I think that find command needs a "jar tvf", otherwise
> > > > > > > > > > > > you're looking for a class name in jar file names.
> > > > > > > > > > > >
> > > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > > >
> > > > > > > > > > > > I'd also look at the classpath you get when running
> > > > > > > > > > > > "yarn jar" to start the existing pcap service, per the
> > > > > > > > > > > > instructions in metron-api/README.md.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman
> > > > > > > > > > > > <merrimanr@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > To explore the idea of merging metron-api into
> > > > > > > > > > > > > metron-rest and running pcap queries inside our REST
> > > > > > > > > > > > > application, I created a simple test here:
> > > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > > > > > > > > A summary of what's included:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > > - Added a pcap query service that runs a simple,
> > > > > > > > > > > > > hardcoded query
> > > > > > > > > > > > >
> > > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > > > > > > and the pcap topology
> > > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > > After this initial setup there should be data in HDFS
> > > > > > > > > > > > > at "/apps/metron/pcap". I believe this should be
> > > > > > > > > > > > > enough to exercise the issue. Just hit the endpoint
> > > > > > > > > > > > > referenced above. I tested this in an already running
> > > > > > > > > > > > > full dev by building and deploying the metron-rest
> > > > > > > > > > > > > jar. I did not rebuild full dev with this change but
> > > > > > > > > > > > > I would still expect it to work. Let me know if it
> > > > > > > > > > > > > doesn't.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > > > > >
> > > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Run the REST application with the YARN jar command
> > > > > > > > > > > > > since this is how all our other YARN/MR-related
> > > > > > > > > > > > > applications are started (metron-api, MAAS, pcap
> > > > > > > > > > > > > query, etc). I wouldn't expect this to work since we
> > > > > > > > > > > > > have runtime dependencies on our shaded elasticsearch
> > > > > > > > > > > > > and parser jars and I'm not aware of a way to add
> > > > > > > > > > > > > additional jars to the classpath with the YARN jar
> > > > > > > > > > > > > command (is there a way?). Either way I get this
> > > > > > > > > > > > > error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could
> > > > > > > > > > > > > not create Dir using jarFile from url
> > > > > > > > > > > > > file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar.
> > > > > > > > > > > > > skipping.
> > > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop
> > > > > > > > > > > > > classpath` to the classpath in
> > > > > > > > > > > > > /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > > > > > > > > > > > > script). I get this error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I searched for the class in the previous attempt
> > > > > > > > > > > > > but could not find it in full dev:
> > > > > > > > > > > > >
> > > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Further up in the stack trace I see the error
> > > > > > > > > > > > > happens when initiating the
> > > > > > > > > > > > > org.apache.hadoop.yarn.util.timeline.TimelineUtils
> > > > > > > > > > > > > class. I tried setting
> > > > > > > > > > > > > "yarn.timeline-service.enabled" in Ambari to false
> > > > > > > > > > > > > and then I get this error:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unable to parse
> > > > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework'
> > > > > > > > > > > > > as a URI, check the setting for
> > > > > > > > > > > > > mapreduce.application.framework.path
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> > > > > > > > > > > > > mapreduce Maven dependencies without any success
> > > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > > - hbase-server
> > > > > > > > > > > > >
> > > > > > > > > > > > > I will keep exploring other possible solutions. Let
> > me
> > > > know
> > > > > > if
> > > > > > > > > anyone
> > > > > > > > > > > > has
> > > > > > > > > > > > > any ideas.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I can imagine a new generic service(s) capability whose job ( pun intended )
> > > > > > > > > > > > > > is to abstract the submittal, tracking, and storage of results to yarn.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be extended with storage providers, queue provider, possibly some
> > > > > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The pcap ‘report’ would be a client to that service, that specializes the
> > > > > > > > > > > > > > service operation for the way we want pcap to work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can then re-use the generic service for other long running yarn
> > > > > > > > > > > > > > things…..
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The submittal and tracking can associate the submitter with the yarn job
> > > > > > > > > > > > > > and track that, regardless of the yarn credentials.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IE> if all submittals and monitoring are by the same yarn user ( Metron )
> > > > > > > > > > > > > > from a single or co-operative set of services, that service can maintain
> > > > > > > > > > > > > > the mapping.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think about how to
> > > > > > > > > > > > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > > > > > > > > > > > submitted as the metron service user so YARN won't keep track of this for
> > > > > > > > > > > > > > us. Is that assumption correct? Do you have any ideas for doing that?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Mike, I can start a feature branch and experiment with merging metron-api
> > > > > > > > > > > > > > into metron-rest. That should allow us to collaborate on any issues or
> > > > > > > > > > > > > > challenges. Also, can you expand on your idea to manage external
> > > > > > > > > > > > > > dependencies as a special module? That seems like a very attractive option
> > > > > > > > > > > > > > to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating a
> > > > > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > > > > That report is something that takes some time and external process to
> > > > > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > > > > > > > > > > > > > > possibly picking from one or more report ‘templates’
> > > > > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the
> > > > > > > > > > > > > > > report is done it is queued there
> > > > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > > > > > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > > > > > * You can select the report from your queue and view it either in a new
> > > > > > > > > > > > > > > UI or custom component
> > > > > > > > > > > > > > > * You can then apply a different ‘view’ to the report or work with the
> > > > > > > > > > > > > > > report data
> > > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > > * You can associate the report with the alerts ( again in the report
> > > > > > > > > > > > > > > info ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We can introduce extensibility into the report templates, report views (
> > > > > > > > > > > > > > > things that work with the json data of the report )
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > > > > yarn info + query info + alert context + yarn status => report info ->
> > > > > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > > > > > > > > related but separate discussions focused.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsumbul@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to me to
> > > > > > > > > > > > > > > > have it in the same place as the other rest apis. Especially when more
> > > > > > > > > > > > > > > > security is added, it will not be necessary to do the job twice.
> > > > > > > > > > > > > > > > The current implementation sends back a pcap object which still needs to
> > > > > > > > > > > > > > > > be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > > > > frontend. It would be good to have this decoding happen directly on the
> > > > > > > > > > > > > > > > backend so it does not create load on the frontend. An option would be
> > > > > > > > > > > > > > > > to install tshark on the rest server and use it to convert the pcap to
> > > > > > > > > > > > > > > > xml and then to a json that will be sent to the frontend.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I tried to start the map/reduce job that searches over all the pcap
> > > > > > > > > > > > > > > > data directly from the rest server and, as Ryan mentioned, we had
> > > > > > > > > > > > > > > > trouble. I will try to find the error again.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Then in the POC, what we tried is to use the pcap_query script and this
> > > > > > > > > > > > > > > > works fine. I just modified it so that it sends back the yarn job_id
> > > > > > > > > > > > > > > > directly and does not wait for the job to finish. This allows the UI
> > > > > > > > > > > > > > > > and the rest server to know the status of the search by querying the
> > > > > > > > > > > > > > > > yarn rest api. This will allow the UI and the rest server to be async
> > > > > > > > > > > > > > > > without any blocking phase. What do you think about that?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Having the job submitted directly from the code of the rest server would
> > > > > > > > > > > > > > > > be perfect, but it will need a lot of investigation I think (but I'm not
> > > > > > > > > > > > > > > > the expert so I might be completely wrong ^^).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We know that the pcap_query script works fine so why not call it? Is it
> > > > > > > > > > > > > > > > that bad? (maybe stupid question, but I really don’t see a lot of
> > > > > > > > > > > > > > > > drawbacks)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Adding the pcap search to the alert UI is, I think, the easiest way to
> > > > > > > > > > > > > > > > move forward. But indeed, it will then be the “Alert UI and
> > > > > > > > > > > > > > > > pcapquery”. Maybe the name of the UI should just change to something
> > > > > > > > > > > > > > > > like “Monitoring & Investigation UI” ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > > > > > > > > > > > > > > already had discussions on how you see the ui evolving with the new
> > > > > > > > > > > > > > > > features that will come in the future?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What do you mean exactly by microservices? Is it to separate all the
> > > > > > > > > > > > > > > > features into different projects? Or something like having the different
> > > > > > > > > > > > > > > > components in containers, like Kubernetes? (again maybe stupid question,
> > > > > > > > > > > > > > > > but I don’t clearly understand what you mean :) )
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > --
> > > > > > > > > simon elliston ball
> > > > > > > > > @sireb
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > --
> >
> > Jon
> >
>
-- 

Jon

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
That's a good point Jon.  There are different levels of effort associated
with different options.  If we want to allow pcaps to be shared with
specific users, we will need to introduce ACL security in our REST
application using something like the ACL capability that comes with Spring
Security or Ranger.  This would be more complex to design and implement.
If we want something broader, like admin roles that can see all pcaps or
allowing pcap files to be made public, this would be less work.  Do you
think ACL security is required or would the other options be acceptable?
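
To make the difference in effort concrete, here is a rough Python sketch of
the two access models.  All names in it (PcapJob, PCAP_ADMIN, can_view_broad,
can_view_acl) are made up for illustration and are not part of Metron, Spring
Security, or Ranger; a real implementation would live in the REST application
and use whatever security framework we settle on.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical role name; the real role model is still undecided.
ADMIN_ROLE = "PCAP_ADMIN"


@dataclass
class PcapJob:
    """Minimal stand-in for a submitted pcap query and its results."""
    job_id: str
    owner: str
    public: bool = False
    # Per-user sharing list; only the ACL model needs this field.
    shared_with: Set[str] = field(default_factory=set)


def can_view_broad(job: PcapJob, user: str, roles: Set[str]) -> bool:
    """Broad model: the owner, an admin role, or anyone if the job is public."""
    return user == job.owner or ADMIN_ROLE in roles or job.public


def can_view_acl(job: PcapJob, user: str, roles: Set[str]) -> bool:
    """ACL model: the broad rules plus sharing with specific users."""
    return can_view_broad(job, user, roles) or user in job.shared_with


if __name__ == "__main__":
    job = PcapJob("job_1", owner="analyst1")
    job.shared_with.add("manager1")
    print(can_view_broad(job, "manager1", set()))
    print(can_view_acl(job, "manager1", set()))
```

The broad model only needs a role check and a flag on the job, while the ACL
model also has to store, update, and clean up a sharing list per job, which
is where the extra design and implementation work would come from.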

On Thu, May 10, 2018 at 2:47 PM, Zeolla@GMail.com <ze...@gmail.com> wrote:

> At the very least there needs to be the ability to share downloaded PCAPs
> with other users and/or have roles that can see all pcaps.  A platform
> engineer may want to clean up old pcaps after x time, or a manager may ask
> an analyst to find all of the traffic that exhibits xyz behavior, dump a
> pcap, and then point the manager to it for review.  Since the
> pcap may be huge, we wouldn't want to push people toward sending it via
> email, uploading to a file server, finding an external hard drive, etc.
>
> Jon
>
> On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com>
> wrote:
>
> > Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes the
> > fixed query option which we have covered.  I agree with you that
> > deprecating the metron-api module should be a goal of this feature.
> >
> > On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > This looks like a pretty good start Ryan. Does the metadata endpoint cover
> > > this
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > > from the original metron-api? If so, then we would be able to deprecate the
> > > existing metron-api project. If we later go to micro-services, a pcap
> > > module would spin back into the fold, but it would probably look different
> > > from metron-api.
> > >
> > > I commented on the UI thread, but to reiterate for the purpose of backend
> > > functionality here I don't believe there is a way to "PAUSE" or "SUSPEND"
> > > jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is sufficient for
> > > the job management operations.
> > >
> > > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com>
> > > wrote:
> > >
> > > > Now that we are confident we can submit an MR job from our current REST
> > > > application, is this the desired approach?  Just want to confirm.
> > > >
> > > > Next I think we should map out what the REST interface will look like.
> > > > Here are the endpoints I'm thinking about:
> > > >
> > > > GET /api/v1/pcap/metadata?basePath
> > > >
> > > > This endpoint will return metadata of pcap data stored in HDFS.  This would
> > > > include pcap size, date ranges (how far back can I go), etc.  It would
> > > > accept an optional HDFS basePath parameter for cases where pcap data is
> > > > stored in multiple places and/or different from the default location.
> > > >
> > > > POST /api/v1/pcap/query
> > > >
> > > > This endpoint would accept a pcap request, submit a pcap query job, and
> > > > return a job id.  The request would be an object containing the parameters
> > > > documented here:
> > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility.
> > > > A query/job would be associated with a user that submits it.  An exception
> > > > will be returned for violating constraints like too many queries submitted,
> > > > query parameters out of limits, etc.
> > > >
> > > > GET /api/v1/pcap/status/<jobId>
> > > >
> > > > This endpoint will return the status of a running job.  I imagine this is
> > > > just a proxy to the YARN REST api.  We can discuss the implementation
> > > > behind these endpoints later.
> > > >
> > > > GET /api/v1/pcap/stop/<jobId>
> > > >
> > > > This endpoint would kill a running pcap job.  If the job has already
> > > > completed this is a noop.
> > > >
> > > > GET /api/v1/pcap/list
> > > >
> > > > This endpoint will list a user's submitted pcap queries.  Items in the list
> > > > would contain job id, status (is it finished?), start/end time, and number
> > > > of pages.  Maybe there is some overlap with the status endpoint above and
> > > > the status endpoint is not needed?
> > > >
> > > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > > >
> > > > This endpoint will return pcap results for the given page in pdml format
> > > > (https://wiki.wireshark.org/PDML).  Are there other formats we want to
> > > > support?
> > > >
> > > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > > >
> > > > This endpoint will allow a user to download raw pcap results for the given
> > > > page.
> > > >
> > > > DELETE /api/v1/pcap/<jobId>
> > > >
> > > > This endpoint will delete pcap query results.  Not sure yet how this fits
> > > > in with our broader cleanup strategy.
> > > >
> > > > This should get us started.  What did I miss and what would you change
> > > > about these?  I did not include much detail related to security, cleanup
> > > > strategy, or underlying implementation details but these are items we
> > > > should discuss at some point.
> > > >
> > > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > > michael.miklavcic@gmail.com> wrote:
> > > >
> > > > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > > > expected. Very nice.
> > > > >
> > > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Finally figured it out.  Commit is here:
> > > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > > >
> > > > > > It came down to figuring out the right combination of maven dependencies
> > > > > > and passing in the HDP version to REST as a Java system property.  I also
> > > > > > included some HDFS setup tasks.  I tested this in full dev and can now
> > > > > > successfully run a pcap query and get results.  All you should have to do
> > > > > > is generate some pcap data first.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > >
> > > > > > > @Ryan - pulled your branch and experimented with a few things. In doing so,
> > > > > > > it dawned on me that by adding the yarn and hadoop classpath, you probably
> > > > > > > didn't introduce a new classpath issue, rather you probably just moved onto
> > > > > > > the next classpath issue, ie hbase per your exception about hbase jaxb.
> > > > > > > Anyhow, I put up a branch with some pom changes worth trying in conjunction
> > > > > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > > >
> > > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > > >
> > > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > > >
> > > > > > > Mike
> > > > > > >
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > > simon@simonellistonball.com> wrote:
> > > > > > >
> > > > > > > > That would be a step closer to something more like a micro-service
> > > > > > > > architecture. However, I would want to make sure we think about the
> > > > > > > > operational complexity, and mpack implications of having another server
> > > > > > > > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > > > > > > > etc requirements for that service).
> > > > > > > >
> > > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > > > > pattern.
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > > > pattern in rest?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > >
> > > > > > > > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > > > > > > > error:
> > > > > > > > > >
> > > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > > >
> > > > > > > > > > I will experiment with other combinations, I suspect we will need
> > > > > > > > > > finer-grain control over the order.
> > > > > > > > > >
> > > > > > > > > > The grep matches class names inside jar files. I use this all the time and
> > > > > > > > > > it's really useful.
> > > > > > > > > >
> > > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > > >
> > > > > > > > > > Reverse engineering the yarn jar command was the next thing I was going to
> > > > > > > > > > try. Will let you know how it goes.
> > > > > > > > > >
> > > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded" package
> > > > > > > > > > > stands out to me in this name
> > > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > > > >
> > > > > > > > > > > I think that find command needs a "jar tvf", otherwise you're looking for a
> > > > > > > > > > > class name in jar file names.
> > > > > > > > > > >
> > > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > > >
> > > > > > > > > > > I'd also look at the classpath you get when running "yarn jar" to start the
> > > > > > > > > > > existing pcap service, per the instructions in metron-api/README.md.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > > > > > > > > summary of what's included:
> > > > > > > > > > > >
> > > > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > query
> > > > > > > > > > > >
> > > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > > > > > > > > > > > the pcap topology
> > > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > > > > > already running full dev by building and deploying the metron-rest jar. I
> > > > > > > > > > > > did not rebuild full dev with this change but I would still expect it to
> > > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > > >
> > > > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > > > >
> > > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > > >
> > > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > > >
> > > > > > > > > > > > - Run the REST application with the YARN jar command since this is how
> > > > > > > > > > > > all our other YARN/MR-related applications are started (metron-api, MAAS,
> > > > > > > > > > > > pcap query, etc). I wouldn't expect this to work since we have runtime
> > > > > > > > > > > > dependencies on our shaded elasticsearch and parser jars and I'm not aware
> > > > > > > > > > > > of a way to add additional jars to the classpath with the YARN jar command
> > > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > > >
> > > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > > > > > > > > get this error:
> > > > > > > > > > > >
> > > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - I searched for the class in the previous attempt but could not find it
> > > > > > > > > > > > in full dev:
> > > > > > > > > > > >
> > > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Further up in the stack trace I see the error
> happens
> > > > when
> > > > > > > > > > > initiating
> > > > > > > > > > > > the org.apache.hadoop.yarn.util.
> timeline.TimelineUtils
> > > > > class.
> > > > > > I
> > > > > > > > > > > tried
> > > > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to
> > > false
> > > > > and
> > > > > > > > then
> > > > > > > > > I
> > > > > > > > > > > > get
> > > > > > > > > > > > this error:
> > > > > > > > > > > >
> > > > > > > > > > > > Unable to parse
> > > > > > > > > > > >
> > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-
> > > > > > > framework'
> > > > > > > > > as
> > > > > > > > > > a
> > > > > > > > > > > > URI, check the setting for mapreduce.application.
> > > > > > framework.path
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> > > > > mapreduce
> > > > > > > > Maven
> > > > > > > > > > > > dependencies without any success
> > > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > > - hbase-server
> > > > > > > > > > > >
> > > > > > > > > > > > I will keep exploring other possible solutions. Let
> me
> > > know
> > > > > if
> > > > > > > > anyone
> > > > > > > > > > > has
> > > > > > > > > > > > any ideas.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > > > > > ottobackwards@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I can imagine a new generic service(s) capability
> > whose
> > > > > job (
> > > > > > > pun
> > > > > > > > > > > > intended
> > > > > > > > > > > > > ) is to
> > > > > > > > > > > > > abstract the submittal, tracking, and storage of
> > > results
> > > > to
> > > > > > > yarn.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It would be extended with storage providers, queue
> > > > > provider,
> > > > > > > > > > possibly
> > > > > > > > > > > > some
> > > > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The pcap ‘report’ would be a client to that
> service,
> > > the
> > > > > > > > > specializes
> > > > > > > > > > > the
> > > > > > > > > > > > > service operation for the way we want pcap to work.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We can then re-use the generic service for other
> long
> > > > > running
> > > > > > > > yarn
> > > > > > > > > > > > > things…..
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > > > > > ottobackwards@gmail.com
> > > > > > > > )
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > > >
> > > > > > > > > > > > > The submittal and tracking can associate the
> > submitter
> > > > with
> > > > > > the
> > > > > > > > > yarn
> > > > > > > > > > > job
> > > > > > > > > > > > > and track that,
> > > > > > > > > > > > > regardless of the yarn credentials.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IE> if all submittals and monitoring are by the
> same
> > > yarn
> > > > > > user
> > > > > > > (
> > > > > > > > > > > Metron )
> > > > > > > > > > > > > from a single or
> > > > > > > > > > > > > co-operative set of services, that service can
> > maintain
> > > > the
> > > > > > > > > mapping.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
> > > > > > merrimanr@gmail.com
> > > > > > > )
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Otto, your use case makes sense to me. We'll have
> to
> > > > think
> > > > > > > about
> > > > > > > > > how
> > > > > > > > > > to
> > > > > > > > > > > > > manage the user to job relationships. I'm assuming
> > YARN
> > > > > jobs
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > > submitted as the metron service user so YARN won't
> > keep
> > > > > track
> > > > > > > of
> > > > > > > > > > this
> > > > > > > > > > > for
> > > > > > > > > > > > > us. Is that assumption correct? Do you have any
> ideas
> > > for
> > > > > > doing
> > > > > > > > > > that?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mike, I can start a feature branch and experiment
> > with
> > > > > > merging
> > > > > > > > > > > metron-api
> > > > > > > > > > > > > into metron-rest. That should allow us to
> collaborate
> > > on
> > > > > any
> > > > > > > > issues
> > > > > > > > > > or
> > > > > > > > > > > > > challenges. Also, can you expand on your idea to
> > manage
> > > > > > > external
> > > > > > > > > > > > > dependencies as a special module? That seems like a
> > > very
> > > > > > > > attractive
> > > > > > > > > > > > option
> > > > > > > > > > > > > to me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > > > > > ottobackwards@gmail.com>
> > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From my response on the other thread, but
> > applicable
> > > to
> > > > > the
> > > > > > > > > > backend
> > > > > > > > > > > > > stuff:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to
> me.
> > > You
> > > > > are
> > > > > > > > > > > generating a
> > > > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > > > That report is something that takes some time and
> > > > > external
> > > > > > > > > process
> > > > > > > > > > to
> > > > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > > * Ask to generate a PCAP report based on some
> > > selected
> > > > > > > > > > > > alerts/meta-alert,
> > > > > > > > > > > > > > possibly picking from on or more report
> ‘templates’
> > > > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > > > * The report request is ‘queued’, that is
> > dispatched
> > > to
> > > > > be
> > > > > > be
> > > > > > > > > > > > > > executed/generated
> > > > > > > > > > > > > > * You as a user have a ‘queue’ of your report
> > > results,
> > > > > and
> > > > > > > when
> > > > > > > > > > the
> > > > > > > > > > > > > report
> > > > > > > > > > > > > > is done it is queued there
> > > > > > > > > > > > > > * We ‘monitor’ the report/queue press through the
> > > yarn
> > > > > > rest (
> > > > > > > > > > report
> > > > > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > > > > * You can select the report from your queue and
> > view
> > > it
> > > > > > > either
> > > > > > > > in
> > > > > > > > > > a
> > > > > > > > > > > new
> > > > > > > > > > > > > UI
> > > > > > > > > > > > > > or custom component
> > > > > > > > > > > > > > * You can then apply a different ‘view’ to the
> > report
> > > > or
> > > > > > work
> > > > > > > > > with
> > > > > > > > > > > the
> > > > > > > > > > > > > > report data
> > > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > > * You can associate the report with the alerts (
> > > again
> > > > in
> > > > > > the
> > > > > > > > > > report
> > > > > > > > > > > > info
> > > > > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation
> > > > something
> > > > > or
> > > > > > > > other
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can introduce extensibility into the report
> > > > templates,
> > > > > > > > report
> > > > > > > > > > > views
> > > > > > > > > > > > (
> > > > > > > > > > > > > > > things that work with the json data of the
> report )
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > template -> query parameters -> script => yarn
> info
> > > > > > > > > > > > > > yarn info + query info + alert context + yarn
> > status
> > > =>
> > > > > > > report
> > > > > > > > > > info
> > > > > > > > > > > ->
> > > > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > > metron-rest -> api to monitor the queue, read
> > > results (
> > > > > > page
> > > > > > > ),
> > > > > > > > > > etc
> > > > > > > > > > > etc
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > > > > > merrimanr@gmail.com
> > > > > > > > )
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I started a separate thread on Pcap UI
> > considerations
> > > > and
> > > > > > > user
> > > > > > > > > > > > > > requirements
> > > > > > > > > > > > > > at Otto's request. This should help us keep these
> > two
> > > > > > related
> > > > > > > > but
> > > > > > > > > > > > > separate
> > > > > > > > > > > > > > discussions focused.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > > > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail
> > > > chain^^)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If I may, I would like to share my view on the
> > > > > following
> > > > > > 3
> > > > > > > > > > points.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The current metron-api is totally separate, it
> > will
> > > > be
> > > > > > > logic
> > > > > > > > > for
> > > > > > > > > > me
> > > > > > > > > > > > to
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > it at the same place as the others rest api.
> > > > Especially
> > > > > > > when
> > > > > > > > > > more
> > > > > > > > > > > > > > security
> > > > > > > > > > > > > > > will be added, it will not be needed to do the
> > job
> > > > > twice.
> > > > > > > > > > > > > > > The current implementation send back a pcap
> > object
> > > > > which
> > > > > > > > still
> > > > > > > > > > need
> > > > > > > > > > > > to
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > decoded. In the opensoc, the decoding was done
> > with
> > > > > > tshark
> > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > > > > frontend.
> > > > > > > > > > > > > > > It will be good to have this decoding happening
> > > > > directly
> > > > > > on
> > > > > > > > the
> > > > > > > > > > > > backend
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > not create a load on frontend. An option will
> be
> > to
> > > > > > install
> > > > > > > > > > tshark
> > > > > > > > > > > on
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > rest server and to use to convert the pcap to
> xml
> > > and
> > > > > > then
> > > > > > > > to a
> > > > > > > > > > > json
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > will be send to the frontend.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I tried to start directly the map/reduce job to
> > > > search
> > > > > > over
> > > > > > > > all
> > > > > > > > > > the
> > > > > > > > > > > > > pcap
> > > > > > > > > > > > > > > data from the rest server and as Ryan mention
> it,
> > > we
> > > > > had
> > > > > > > > > > trouble. I
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > try to find back the error.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Then in the POC, what we tried is to use the
> > > > pcap_query
> > > > > > > > script
> > > > > > > > > > and
> > > > > > > > > > > > this
> > > > > > > > > > > > > > > work fine. I just modified it that he sends
> back
> > > > > directly
> > > > > > > the
> > > > > > > > > > > job_id
> > > > > > > > > > > > of
> > > > > > > > > > > > > > > yarn and not waiting that the job is finished.
> > Then
> > > > it
> > > > > > will
> > > > > > > > > > allow
> > > > > > > > > > > the
> > > > > > > > > > > > > UI
> > > > > > > > > > > > > > > and the rest server to know what the status of
> > the
> > > > > > research
> > > > > > > > by
> > > > > > > > > > > > querying
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > yarn rest api. This will allow the UI and the
> > rest
> > > > > server
> > > > > > > to
> > > > > > > > be
> > > > > > > > > > > async
> > > > > > > > > > > > > > > without any blocking phase. What do you think
> > about
> > > > > that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Having the job submitted directly from the code
> > of
> > > > the
> > > > > > rest
> > > > > > > > > > server
> > > > > > > > > > > > will
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > perfect, but it will need a lot of
> investigation
> > I
> > > > > think
> > > > > > > (but
> > > > > > > > > > I'm
> > > > > > > > > > > not
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We know that the pcap_query script works fine so
> > why
> > > > not
> > > > > > > > calling
> > > > > > > > > > it?
> > > > > > > > > > > > Is
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > that bad? (maybe stupid question, but I really
> > > don’t
> > > > > see
> > > > > > a
> > > > > > > > lot
> > > > > > > > > > of
> > > > > > > > > > > > > > drawback)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Adding the pcap search to the alert UI is,
> I
> > > > think,
> > > > > > the
> > > > > > > > > > easiest
> > > > > > > > > > > > way
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > move forward. But indeed, it will then be the
> > > “Alert
> > > > UI
> > > > > > and
> > > > > > > > > > > > pcapquery”.
> > > > > > > > > > > > > > > Maybe the name of the UI should just change to
> > > > > something
> > > > > > > like
> > > > > > > > > > > > > > “Monitoring &
> > > > > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is there any roadmap or plan for the different
> > UI?
> > > I
> > > > > mean
> > > > > > > did
> > > > > > > > > > you
> > > > > > > > > > > > > > already
> > > > > > > > > > > > > > > had discussion on how you see the ui evolving
> > with
> > > > the
> > > > > > new
> > > > > > > > > > feature
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > will come in the future?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What do you mean exactly by microservices? Is
> it
> > to
> > > > > > > separate
> > > > > > > > > all
> > > > > > > > > > > the
> > > > > > > > > > > > > > > features in different projects? Or something
> like
> > > > > having
> > > > > > > the
> > > > > > > > > > > > different
> > > > > > > > > > > > > > > > components in containers like Kubernetes? (again
> > maybe
> > > > > > stupid
> > > > > > > > > > > question,
> > > > > > > > > > > > > but
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > don’t clearly understand what you mean :) )
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > --
> > > > > > > > simon elliston ball
> > > > > > > > @sireb
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> --
>
> Jon
>

Re: [DISCUSS] Pcap panel architecture

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
At the very least there needs to be the ability to share downloaded pcaps
with other users and/or have roles that can see all pcaps.  A platform
engineer may want to clean up old pcaps after some retention period, or a
manager may ask an analyst to find all of the traffic that exhibits xyz
behavior, dump a pcap, and then point the manager to it for review.  Since
the pcap may be huge, we wouldn't want to push people toward sending it via
email, uploading it to a file server, finding an external hard drive, etc.
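
To make that concrete, here is a minimal sketch of the visibility rule I have
in mind.  All names (PcapJob, can_view, share, the "pcap_admin" role) are
hypothetical and not part of Metron; it only illustrates "owner, explicitly
shared users, or a see-all role".

```python
from dataclasses import dataclass, field

# Hypothetical role name used for illustration; Metron does not define it.
SEE_ALL_ROLES = {"pcap_admin"}


@dataclass
class PcapJob:
    job_id: str
    owner: str
    shared_with: set = field(default_factory=set)


def can_view(job, user, roles):
    """Return True if `user` may see the results of `job`."""
    if user == job.owner or user in job.shared_with:
        return True
    # A see-all role (e.g. a platform engineer doing cleanup) sees every job.
    return bool(SEE_ALL_ROLES & set(roles))


def share(job, requester, target_user):
    """Let the owner point another user (e.g. a manager) at an existing result."""
    if requester != job.owner:
        raise PermissionError("only the owner can share a pcap result")
    job.shared_with.add(target_user)
```

Sharing a reference to the stored result, rather than the file itself, is
what avoids moving a huge pcap around by email or file server.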

Jon

On Thu, May 10, 2018 at 10:16 AM Ryan Merriman <me...@gmail.com> wrote:

> Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes the
> fixed query option which we have covered.  I agree with you that
> deprecating the metron-api module should be a goal of this feature.
>
> On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > This looks like a pretty good start Ryan. Does the metadata endpoint cover
> > this
> > https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> > from the original metron-api? If so, then we would be able to deprecate the
> > existing metron-api project. If we later go to micro-services, a pcap
> > module would spin back into the fold, but it would probably look different
> > from metron-api.
> >
> > I commented on the UI thread, but to reiterate for the purpose of backend
> > functionality here I don't believe there is a way to "PAUSE" or "SUSPEND"
> > jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is sufficient for
> > the job management operations.
> >
> > On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com>
> > wrote:
> >
> > > Now that we are confident we can submit an MR job from our current REST
> > > application, is this the desired approach?  Just want to confirm.
> > >
> > > Next I think we should map out what the REST interface will look like.
> > > Here are the endpoints I'm thinking about:
> > >
> > > GET /api/v1/pcap/metadata?basePath
> > >
> > > This endpoint will return metadata of pcap data stored in HDFS.  This would
> > > include pcap size, date ranges (how far back can I go), etc.  It would
> > > accept an optional HDFS basePath parameter for cases where pcap data is
> > > stored in multiple places and/or different from the default location.
> > >
> > > POST /api/v1/pcap/query
> > >
> > > This endpoint would accept a pcap request, submit a pcap query job, and
> > > return a job id.  The request would be an object containing the parameters
> > > documented here:
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> > > A query/job would be associated with a user that submits it.  An exception
> > > will be returned for violating constraints like too many queries submitted,
> > > query parameters out of limits, etc.
> > >
> > > GET /api/v1/pcap/status/<jobId>
> > >
> > > This endpoint will return the status of a running job.  I imagine this is
> > > just a proxy to the YARN REST api.  We can discuss the implementation
> > > behind these endpoints later.
> > >
> > > GET /api/v1/pcap/stop/<jobId>
> > >
> > > This endpoint would kill a running pcap job.  If the job has already
> > > completed this is a noop.
> > >
> > > GET /api/v1/pcap/list
> > >
> > > This endpoint will list a user's submitted pcap queries.  Items in the list
> > > would contain job id, status (is it finished?), start/end time, and number
> > > of pages.  Maybe there is some overlap with the status endpoint above and
> > > the status endpoint is not needed?
> > >
> > > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> > >
> > > This endpoint will return pcap results for the given page in pdml format
> > > (https://wiki.wireshark.org/PDML).  Are there other formats we want to
> > > support?
> > >
> > > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> > >
> > > This endpoint will allow a user to download raw pcap results for the given
> > > page.
> > >
> > > DELETE /api/v1/pcap/<jobId>
> > >
> > > This endpoint will delete pcap query results.  Not sure yet how this fits
> > > in with our broader cleanup strategy.
> > >
> > > This should get us started.  What did I miss and what would you change
> > > about these?  I did not include much detail related to security, cleanup
> > > strategy, or underlying implementation details but these are items we
> > > should discuss at some point.
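
The job lifecycle implied by the endpoints above can be sketched with an
in-memory stand-in.  This is only an illustration under assumptions: the
class and method names, the status strings, and the per-user limit value are
made up here and are not the REST implementation.

```python
import itertools


class TooManyQueries(Exception):
    """Raised when a user exceeds the (assumed) running-query limit."""


class PcapJobStore:
    """In-memory stand-in for the state behind the proposed endpoints."""

    def __init__(self, max_running_per_user=2):  # limit value is an assumption
        self._ids = itertools.count(1)
        self._jobs = {}
        self._max_running = max_running_per_user

    def query(self, user, params):
        """POST /api/v1/pcap/query: record the job and return its id."""
        running = [j for j in self._jobs.values()
                   if j["user"] == user and j["status"] == "RUNNING"]
        if len(running) >= self._max_running:
            raise TooManyQueries(user)
        job_id = "job_%d" % next(self._ids)
        self._jobs[job_id] = {"user": user, "params": params, "status": "RUNNING"}
        return job_id

    def status(self, job_id):
        """GET /api/v1/pcap/status/<jobId>."""
        return self._jobs[job_id]["status"]

    def finish(self, job_id):
        """Simulates the YARN job completing; this is not an endpoint."""
        if self._jobs[job_id]["status"] == "RUNNING":
            self._jobs[job_id]["status"] = "SUCCEEDED"

    def stop(self, job_id):
        """GET /api/v1/pcap/stop/<jobId>: kill if running, otherwise a no-op."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"
        return job["status"]

    def list_jobs(self, user):
        """GET /api/v1/pcap/list: only the caller's own jobs."""
        return sorted(jid for jid, j in self._jobs.items() if j["user"] == user)
```

Keeping the user-to-job mapping in the service itself matters if every job is
submitted to YARN as the same service user, since YARN then cannot tell the
submitters apart.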
> > >
> > > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > > expected. Very nice.
> > > >
> > > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com>
> > > wrote:
> > > >
> > > > > Finally figured it out.  Commit is here:
> > > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > > >
> > > > > It came down to figuring out the right combination of maven dependencies
> > > > > and passing in the HDP version to REST as a Java system property.  I also
> > > > > included some HDFS setup tasks.  I tested this in full dev and can now
> > > > > successfully run a pcap query and get results.  All you should have to do
> > > > > is generate some pcap data first.
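
Why the HDP version matters, as far as I understand it: Hadoop configuration
values can contain ${name} placeholders that are expanded from other
properties or Java system properties, and an unexpanded placeholder is what
produced the "Unable to parse ... as a URI" error quoted further down.  The
sketch below imitates that expansion in Python; the helper names are made up
and this is not Hadoop's code.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, properties):
    """Replace ${name} with properties[name]; leave unknown names untouched."""
    def _sub(match):
        name = match.group(1)
        return properties.get(name, match.group(0))
    return _PLACEHOLDER.sub(_sub, value)


def is_resolved(value):
    """True when no ${...} placeholder is left in the value."""
    return _PLACEHOLDER.search(value) is None


FRAMEWORK_PATH = "/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework"

# Without the property the placeholder survives, as in the quoted error.
unresolved = expand(FRAMEWORK_PATH, {})
# With it; the value is the HDP build that appears in the log output in this thread.
resolved = expand(FRAMEWORK_PATH, {"hdp.version": "2.6.4.0-91"})
```

On the JVM side the property would normally be supplied with a -D option at
startup; the linked commit has the actual change.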
> > > > >
> > > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > @Ryan - pulled your branch and experimented with a few things. In doing so,
> > > > > > it dawned on me that by adding the yarn and hadoop classpath, you probably
> > > > > > didn't introduce a new classpath issue, rather you probably just moved onto
> > > > > > the next classpath issue, ie hbase per your exception about hbase jaxb.
> > > > > > Anyhow, I put up a branch with some pom changes worth trying in conjunction
> > > > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > > > >
> > > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > > >
> > > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > > >
> > > > > > Mike
> > > > > >
> > > > > >
> > > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > > simon@simonellistonball.com> wrote:
> > > > > >
> > > > > > > That would be a step closer to something more like a micro-service
> > > > > > > architecture. However, I would want to make sure we think about the
> > > > > > > operational complexity, and mpack implications of having another server
> > > > > > > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > > > > > > etc requirements for that service).
> > > > > > >
> > > > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > > > pattern.
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> > > > ottobackwards@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > > pattern in rest?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (
> > merrimanr@gmail.com
> > > )
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > > > > > > error:
> > > > > > > > >
> > > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > > >
> > > > > > > > > I will experiment with other combinations; I suspect we will need
> > > > > > > > > finer-grained control over the order.
> > > > > > > > >
> > > > > > > > > The grep matches class names inside jar files. I use this all the time and
> > > > > > > > > it's really useful.
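
For anyone who would rather not grep binary files, a rough Python equivalent
that reads each jar's entry list instead.  The function name and arguments
are made up for illustration; only the standard library is used.

```python
import os
import zipfile


def jars_containing(root, class_path):
    """Yield paths of .jar files under `root` that have an entry for `class_path`.

    `class_path` uses slashes, e.g. "org/example/Foo" (without ".class").
    """
    entry = class_path + ".class"
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".jar"):
                continue
            jar = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        yield jar
            except (zipfile.BadZipFile, OSError):
                # Unreadable or corrupt jar: skip it, much like 2>/dev/null.
                continue
```

Calling it with a root directory and the slash-separated class name from the
exception lists every jar that could supply that class.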
> > > > > > > > >
> > > > > > > > > The metron-rest jar is already shaded.
> > > > > > > > >
> > > > > > > > > Reverse engineering the yarn jar command was the next thing I was going to
> > > > > > > > > try. Will let you know how it goes.
> > > > > > > > >
> > > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded" package
> > > > > > > > > > stands out to me in this name
> > > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > > >
> > > > > > > > > > I think that find command needs a "jar tvf", otherwise you're looking for a
> > > > > > > > > > class name in jar file names.
> > > > > > > > > >
> > > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > > >
> > > > > > > > > > I'd also look at the classpath you get when running "yarn jar" to start the
> > > > > > > > > > existing pcap service, per the instructions in metron-api/README.md.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > > > > merrimanr@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test
> > > > > > > > > > > A summary of what's included:
> > > > > > > > > > >
> > > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > > > > > >
> > > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > > > > > > > > > > the pcap topology
> > > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > > > > already running full dev by building and deploying the metron-rest jar. I
> > > > > > > > > > > did not rebuild full dev with this change but I would still expect it to
> > > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > > > >
> > > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > > >
> > > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > > >
> > > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > > >
> > > > > > > > > > > - Run the REST application with the YARN jar command since this is how
> > > > > > > > > > > all our other YARN/MR-related applications are started (metron-api, MAAS,
> > > > > > > > > > > pcap query, etc). I wouldn't expect this to work since we have runtime
> > > > > > > > > > > dependencies on our shaded elasticsearch and parser jars and I'm not aware
> > > > > > > > > > > of a way to add additional jars to the classpath with the YARN jar command
> > > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > > >
> > > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > > > > > > > get this error:
> > > > > > > > > > >
> > > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - I searched for the class in the previous attempt but could not find it
> > > > > > > > > > > in full dev:
> > > > > > > > > > >
> > > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > > > > org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
> > > > > > > > > > > 2>/dev/null
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - Further up in the stack trace I see the error happens when initializing
> > > > > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I get
> > > > > > > > > > > this error:
> > > > > > > > > > >
> > > > > > > > > > > Unable to parse
> > > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> > > > mapreduce
> > > > > > > Maven
> > > > > > > > > > > dependencies without any success
> > > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > > - hbase-server
> > > > > > > > > > >
> > > > > > > > > > > I will keep exploring other possible solutions. Let me
> > know
> > > > if
> > > > > > > anyone
> > > > > > > > > > has
> > > > > > > > > > > any ideas.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler
> > > > > > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I can imagine a new generic service(s) capability whose
> > > > > > > > > > > > job (pun intended) is to abstract the submittal,
> > > > > > > > > > > > tracking, and storage of results to yarn.
> > > > > > > > > > > >
> > > > > > > > > > > > It would be extended with storage providers, queue
> > > > > > > > > > > > providers, possibly some set of policies or rather
> > > > > > > > > > > > strategies.
> > > > > > > > > > > >
> > > > > > > > > > > > The pcap ‘report’ would be a client to that service, one
> > > > > > > > > > > > that specializes the service operation for the way we
> > > > > > > > > > > > want pcap to work.
> > > > > > > > > > > >
> > > > > > > > > > > > We can then re-use the generic service for other long
> > > > > > > > > > > > running yarn things…..
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler
> > > > > > > > > > > > (ottobackwards@gmail.com) wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > > >
> > > > > > > > > > > > The submittal and tracking can associate the submitter
> > > > > > > > > > > > with the yarn job and track that, regardless of the yarn
> > > > > > > > > > > > credentials.
> > > > > > > > > > > >
> > > > > > > > > > > > I.e., if all submittals and monitoring are by the same
> > > > > > > > > > > > yarn user (Metron) from a single or co-operative set of
> > > > > > > > > > > > services, that service can maintain the mapping.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman
> > > > > > > > > > > > (merrimanr@gmail.com) wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Otto, your use case makes sense to me. We'll have to
> > > > > > > > > > > > think about how to manage the user to job relationships.
> > > > > > > > > > > > I'm assuming YARN jobs will be submitted as the metron
> > > > > > > > > > > > service user so YARN won't keep track of this for us. Is
> > > > > > > > > > > > that assumption correct? Do you have any ideas for doing
> > > > > > > > > > > > that?
> > > > > > > > > > > >
> > > > > > > > > > > > Mike, I can start a feature branch and experiment with
> > > > > > > > > > > > merging metron-api into metron-rest. That should allow us
> > > > > > > > > > > > to collaborate on any issues or challenges. Also, can you
> > > > > > > > > > > > expand on your idea to manage external dependencies as a
> > > > > > > > > > > > special module? That seems like a very attractive option
> > > > > > > > > > > > to me.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler
> > > > > > > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > From my response on the other thread, but applicable to
> > > > > > > > > > > > > the backend stuff:
> > > > > > > > > > > > >
> > > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You
> > > > > > > > > > > > > are generating a report based on parameters.
> > > > > > > > > > > > > That report is something that takes some time and
> > > > > > > > > > > > > external process to generate… ie you have to wait for
> > > > > > > > > > > > > it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > > > > > > alerts/meta-alert, possibly picking from one or more
> > > > > > > > > > > > > report ‘templates’ that have query options etc
> > > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to
> > > > > > > > > > > > > be executed/generated
> > > > > > > > > > > > > * You as a user have a ‘queue’ of your report results,
> > > > > > > > > > > > > and when the report is done it is queued there
> > > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the
> > > > > > > > > > > > > yarn rest (report info/meta has the yarn details)
> > > > > > > > > > > > > * You can select the report from your queue and view it
> > > > > > > > > > > > > either in a new UI or custom component
> > > > > > > > > > > > > * You can then apply a different ‘view’ to the report
> > > > > > > > > > > > > or work with the report data
> > > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > > * You can associate the report with the alerts (again
> > > > > > > > > > > > > in the report info) with…. a ‘case’ or ‘ticket’ or
> > > > > > > > > > > > > investigation something or other
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > We can introduce extensibility into the report
> > > > > > > > > > > > > templates, report views (things that work with the json
> > > > > > > > > > > > > data of the report)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Something like that.”
> > > > > > > > > > > > >
> > > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > > >
> > > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > > yarn info + query info + alert context + yarn status =>
> > > > > > > > > > > > > report info -> stored in a user’s ‘report queue’
> > > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > > metron-rest -> api to monitor the queue, read results
> > > > > > > > > > > > > (page), etc etc
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman
> > > > > > > > > > > > > (merrimanr@gmail.com) wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I started a separate thread on Pcap UI considerations
> > > > > > > > > > > > > and user requirements at Otto's request. This should
> > > > > > > > > > > > > help us keep these two related but separate discussions
> > > > > > > > > > > > > focused.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul
> > > > > > > > > > > > > <michelsumbul@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail
> > > > > > > > > > > > > > chain ^^)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If I may, I would like to share my view on the
> > > > > > > > > > > > > > following 3 points.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The current metron-api is totally separate; it
> > > > > > > > > > > > > > would be logical to me to have it in the same
> > > > > > > > > > > > > > place as the other REST APIs. Especially when
> > > > > > > > > > > > > > more security is added, we would not need to do
> > > > > > > > > > > > > > the job twice.
> > > > > > > > > > > > > > The current implementation sends back a pcap
> > > > > > > > > > > > > > object which still needs to be decoded. In
> > > > > > > > > > > > > > OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > > frontend. It would be good to have this decoding
> > > > > > > > > > > > > > happen directly on the backend so as not to
> > > > > > > > > > > > > > create load on the frontend. An option would be
> > > > > > > > > > > > > > to install tshark on the REST server and use it
> > > > > > > > > > > > > > to convert the pcap to XML and then to JSON that
> > > > > > > > > > > > > > will be sent to the frontend.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I tried to start the map/reduce job that searches
> > > > > > > > > > > > > > over all the pcap data directly from the REST
> > > > > > > > > > > > > > server and, as Ryan mentioned, we had trouble. I
> > > > > > > > > > > > > > will try to find the error again.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then in the POC, what we tried is to use the
> > > > > > > > > > > > > > pcap_query script, and this works fine. I just
> > > > > > > > > > > > > > modified it so that it sends back the YARN job_id
> > > > > > > > > > > > > > directly instead of waiting for the job to
> > > > > > > > > > > > > > finish. This allows the UI and the REST server
> > > > > > > > > > > > > > to know the status of the search by querying the
> > > > > > > > > > > > > > YARN REST API, so both can be async without any
> > > > > > > > > > > > > > blocking phase. What do you think about that?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Having the job submitted directly from the code
> > > > > > > > > > > > > > of the REST server would be perfect, but it will
> > > > > > > > > > > > > > need a lot of investigation I think (but I'm not
> > > > > > > > > > > > > > the expert so I might be completely wrong ^^).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We know that the pcap_query script works fine, so
> > > > > > > > > > > > > > why not call it? Is it that bad? (Maybe a stupid
> > > > > > > > > > > > > > question, but I really don’t see a lot of
> > > > > > > > > > > > > > drawbacks.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Adding the pcap search to the alert UI is, I
> > > > > > > > > > > > > > think, the easiest way to move forward. But
> > > > > > > > > > > > > > indeed, it will then be the “Alert UI and
> > > > > > > > > > > > > > pcapquery”. Maybe the name of the UI should just
> > > > > > > > > > > > > > change to something like “Monitoring &
> > > > > > > > > > > > > > Investigation UI”?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is there any roadmap or plan for the different
> > > > > > > > > > > > > > UIs? I mean, have you already had a discussion on
> > > > > > > > > > > > > > how you see the UI evolving with the new features
> > > > > > > > > > > > > > that will come in the future?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What do you mean exactly by microservices? Is it
> > > > > > > > > > > > > > to separate all the features into different
> > > > > > > > > > > > > > projects? Or something like having the different
> > > > > > > > > > > > > > components in containers, as with Kubernetes?
> > > > > > > > > > > > > > (Again, maybe a stupid question, but I don’t
> > > > > > > > > > > > > > clearly understand what you mean :) )
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Michel
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > --
> > > > > > > simon elliston ball
> > > > > > > @sireb
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
-- 

Jon

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Mike, I believe the /pcapGetter/getPcapsByIdentifiers endpoint exposes the
fixed query option which we have covered.  I agree with you that
deprecating the metron-api module should be a goal of this feature.

On Wed, May 9, 2018 at 1:36 PM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> This looks like a pretty good start, Ryan. Does the metadata endpoint cover
> the getPcapsByIdentifiers endpoint from the original metron-api?
> https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
> If so, then we would be able to deprecate the existing metron-api project.
> If we later go to micro-services, a pcap module would spin back into the
> fold, but it would probably look different from metron-api.
>
> I commented on the UI thread, but to reiterate for the purpose of backend
> functionality here: I don't believe there is a way to "PAUSE" or "SUSPEND"
> jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is sufficient for
> the job management operations.
>
> On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com>
> wrote:
>
> > Now that we are confident we can submit an MR job from our current REST
> > application, is this the desired approach?  Just want to confirm.
> >
> > Next I think we should map out what the REST interface will look like.
> > Here are the endpoints I'm thinking about:
> >
> > GET /api/v1/pcap/metadata?basePath
> >
> > This endpoint will return metadata of pcap data stored in HDFS.  This
> > would include pcap size, date ranges (how far back can I go), etc.  It
> > would accept an optional HDFS basePath parameter for cases where pcap data
> > is stored in multiple places and/or different from the default location.
> >
> > POST /api/v1/pcap/query
> >
> > This endpoint would accept a pcap request, submit a pcap query job, and
> > return a job id.  The request would be an object containing the parameters
> > documented here:
> > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> > A query/job would be associated with a user that submits it.  An exception
> > will be returned for violating constraints like too many queries
> > submitted, query parameters out of limits, etc.
> >
> > GET /api/v1/pcap/status/<jobId>
> >
> > This endpoint will return the status of a running job.  I imagine this is
> > just a proxy to the YARN REST api.  We can discuss the implementation
> > behind these endpoints later.
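
To make the "proxy to the YARN REST api" idea concrete, here is a minimal sketch of how a status endpoint might translate a YARN ResourceManager response into a pcap job status. The URL layout and the "app", "state", "finalStatus" and "progress" fields follow the ResourceManager REST API; the function names, the shape of the returned status object, and the assumption that the pcap job id is the YARN application id are all hypothetical and not part of the proposal above.

```python
import json

# YARN application states after which the job will not change again.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def yarn_app_url(rm_address, application_id):
    """Build the ResourceManager URL describing one application.

    rm_address is host:port of the ResourceManager web endpoint.
    """
    return "http://{}/ws/v1/cluster/apps/{}".format(rm_address, application_id)


def to_pcap_status(yarn_response_body):
    """Reduce a ResourceManager JSON response to a small status object.

    The returned keys are illustrative only (hypothetical response shape).
    """
    app = json.loads(yarn_response_body)["app"]
    state = app["state"]
    return {
        "jobId": app["id"],  # assumption: job id == YARN application id
        "state": state,
        "done": state in TERMINAL_STATES,
        "succeeded": state == "FINISHED"
        and app.get("finalStatus") == "SUCCEEDED",
        "percentComplete": app.get("progress", 0.0),
    }
```

The REST service would fetch the body from the URL built by yarn_app_url and pass it to to_pcap_status; keeping the parsing separate from the HTTP call makes it easy to unit test without a cluster.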
> >
> > GET /api/v1/pcap/stop/<jobId>
> >
> > This endpoint would kill a running pcap job.  If the job has already
> > completed this is a noop.
> >
> > GET /api/v1/pcap/list
> >
> > This endpoint will list a user's submitted pcap queries.  Items in the
> > list would contain job id, status (is it finished?), start/end time, and
> > number of pages.  Maybe there is some overlap with the status endpoint
> > above and the status endpoint is not needed?
> >
> > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
> >
> > This endpoint will return pcap results for the given page in pdml format
> > (https://wiki.wireshark.org/PDML).  Are there other formats we want to
> > support?
> >
> > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
> >
> > This endpoint will allow a user to download raw pcap results for the
> > given page.
> >
> > DELETE /api/v1/pcap/<jobId>
> >
> > This endpoint will delete pcap query results.  Not sure yet how this fits
> > in with our broader cleanup strategy.
> >
> > This should get us started.  What did I miss and what would you change
> > about these?  I did not include much detail related to security, cleanup
> > strategy, or underlying implementation details but these are items we
> > should discuss at some point.
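
The bookkeeping behind the endpoints listed above can be sketched as a small in-memory registry: it associates each job with the submitting user, rejects a submission when the user has too many running queries, treats stop on a completed job as a no-op, and supports listing and deleting. This is only an illustration under assumptions: the class name, method names, status strings, and the per-user limit are hypothetical, and a real implementation would submit to YARN and persist state rather than keep a dict.

```python
import itertools
import time


class PcapJobRegistry:
    """Hypothetical in-memory stand-in for the job tracking behind the API."""

    def __init__(self, max_active_per_user=2):
        self._jobs = {}
        self._ids = itertools.count(1)
        self._max_active = max_active_per_user

    def submit(self, user, query):
        """POST /query: record a job for this user and return its id."""
        active = [j for j in self._jobs.values()
                  if j["user"] == user and j["status"] == "RUNNING"]
        if len(active) >= self._max_active:
            raise ValueError("too many queries submitted by " + user)
        job_id = "job_{}".format(next(self._ids))
        self._jobs[job_id] = {
            "jobId": job_id, "user": user, "query": dict(query),
            "status": "RUNNING", "startTime": time.time(),
            "endTime": None, "pages": 0,
        }
        return job_id

    def complete(self, job_id, pages):
        """Called when the underlying job finishes (not an endpoint)."""
        job = self._jobs[job_id]
        job.update(status="FINISHED", endTime=time.time(), pages=pages)

    def status(self, job_id):
        """GET /status/<jobId>."""
        return self._jobs[job_id]["status"]

    def stop(self, job_id):
        """GET /stop/<jobId>: kill a running job; no-op if already done."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job.update(status="KILLED", endTime=time.time())
        return job["status"]

    def list_jobs(self, user):
        """GET /list: only the jobs submitted by this user."""
        return [j for j in self._jobs.values() if j["user"] == user]

    def delete(self, job_id):
        """DELETE /<jobId>: forget the job and (in real life) its results."""
        self._jobs.pop(job_id, None)
```

Since list_jobs already returns the status of every job, this sketch also suggests that the separate status endpoint is mostly a convenience for polling a single job.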
> >
> > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > Sweet! That's great news. The pom changes are a lot simpler than I
> > > expected. Very nice.
> > >
> > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com> wrote:
> > >
> > > > Finally figured it out.  Commit is here:
> > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > > >
> > > > It came down to figuring out the right combination of maven
> > > > dependencies and passing in the HDP version to REST as a Java system
> > > > property.  I also included some HDFS setup tasks.  I tested this in
> > > > full dev and can now successfully run a pcap query and get results.
> > > > All you should have to do is generate some pcap data first.
> > > >
> > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > > michael.miklavcic@gmail.com> wrote:
> > > >
> > > > > @Ryan - pulled your branch and experimented with a few things. In
> > > > > doing so, it dawned on me that by adding the yarn and hadoop
> > > > > classpath, you probably didn't introduce a new classpath issue,
> > > > > rather you probably just moved onto the next classpath issue, ie
> > > > > hbase per your exception about hbase jaxb. Anyhow, I put up a branch
> > > > > with some pom changes worth trying in conjunction with invoking the
> > > > > rest app startup via "/usr/bin/yarn jar"
> > > > >
> > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > > >
> > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > > >
> > > > > Mike
> > > > >
> > > > >
> > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > > simon@simonellistonball.com> wrote:
> > > > >
> > > > > > That would be a step closer to something more like a micro-service
> > > > > > architecture. However, I would want to make sure we think about
> > > > > > the operational complexity, and mpack implications of having
> > > > > > another server installed and running somewhere on the cluster
> > > > > > (also, ssl, kerberos, etc etc requirements for that service).
> > > > > >
> > > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > > > > >
> > > > > > > +1 to having metron-api as its own service and using a gateway
> > > > > > > type pattern.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler
> > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > >
> > > > > > > > Why not have metron-api as its own service and use a ‘gateway’
> > > > > > > > type pattern in rest?
> > > > > > > >
> > > > > > > >
> > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com)
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > Moving the yarn classpath command earlier in the classpath now
> > > > > > > > gives this error:
> > > > > > > >
> > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > >
> > > > > > > > I will experiment with other combinations, I suspect we will
> > > > > > > > need finer-grain control over the order.
> > > > > > > >
> > > > > > > > The grep matches class names inside jar files. I use this all
> > > > > > > > the time and it's really useful.
> > > > > > > >
> > > > > > > > The metron-rest jar is already shaded.
> > > > > > > >
> > > > > > > > Reverse engineering the yarn jar command was the next thing I
> > > > > > > > was going to try. Will let you know how it goes.
> > > > > > > >
> > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > What order did you add the hadoop or yarn classpath? The
> > > > > > > > > "shaded" package stands out to me in this name
> > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > >
> > > > > > > > > I think that find command needs a "jar tvf", otherwise you're
> > > > > > > > > looking for a class name in jar file names.
> > > > > > > > >
> > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > >
> > > > > > > > > I'd also look at the classpath you get when running "yarn jar"
> > > > > > > > > to start the existing pcap service, per the instructions in
> > > > > > > > > metron-api/README.md.
> > > > > > > > >
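
Whichever of find/grep or "jar tvf" is preferred, the same check can be done by listing jar entries directly, which sidesteps any doubt about what grep sees inside a jar. The following is a minimal sketch, not a tool that exists in Metron: the function name is hypothetical, it only inspects top-level entries (classes inside jars nested within another jar would be missed), and unreadable or corrupt jars are skipped.

```python
import os
import zipfile


def find_class_in_jars(root, class_name):
    """Return the sorted paths of jars under root that contain class_name.

    class_name may use dots or slashes, e.g.
    org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
    """
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    # A jar is a zip archive; class files are plain entries.
                    if entry in jar.namelist():
                        matches.append(jar_path)
            except (zipfile.BadZipFile, OSError):
                # Skip files that are not readable zip archives.
                continue
    return sorted(matches)
```

Running it against /usr/hdp and /usr/metron for the class named in the ClassNotFoundException would show whether any installed jar actually provides it, or whether it only exists under a relocated (shaded) package name.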
> > > > > > > > >
> > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman
> > > > > > > > > <merrimanr@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > To explore the idea of merging metron-api into metron-rest and
> > > > > > > > > > running pcap queries inside our REST application, I created a
> > > > > > > > > > simple test here:
> > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > > > > > A summary of what's included:
> > > > > > > > > >
> > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded
> > > > > > > > > > query
> > > > > > > > > >
> > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > > > and the pcap topology
> > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to
> > > > > > > > > > exercise the issue. Just hit the endpoint referenced above. I
> > > > > > > > > > tested this in an already running full dev by building and
> > > > > > > > > > deploying the metron-rest jar. I did not rebuild full dev with
> > > > > > > > > > this change but I would still expect it to work. Let me know
> > > > > > > > > > if it doesn't.
> > > > > > > > > >
> > > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > > >
> > > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > > >
> > > > > > > > > > Here are the things I've tried so far:
> > > > > > > > > >
> > > > > > > > > > - Run the REST application with the YARN jar command since
> > > > > > > > > > this is how all our other YARN/MR-related applications are
> > > > > > > > > > started (metron-api, MAAS, pcap query, etc). I wouldn't expect
> > > > > > > > > > this to work since we have runtime dependencies on our shaded
> > > > > > > > > > elasticsearch and parser jars and I'm not aware of a way to
> > > > > > > > > > add additional jars to the classpath with the YARN jar command
> > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > >
> > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > java.lang.NullPointerException
> > > > > > > > > >
> > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to
> > > > > > > > > > the classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST
> > > > > > > > > > start script). I get this error:
> > > > > > > > > >
> > > > > > > > > > java.lang.ClassNotFoundException: org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > > >
> > > > > > > > > > - I searched for the class in the previous attempt but could
> > > > > > > > > > not find it in full dev:
> > > > > > > > > >
> > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > > >
> > > > > > > > > > - Further up in the stack trace I see the error happens when
> > > > > > > > > > initiating the
> > > > > > > > > > org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I
> > > > > > > > > > tried setting "yarn.timeline-service.enabled" in Ambari to
> > > > > > > > > > false and then I get this error:
> > > > > > > > > >
> > > > > > > > > > Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
> > > > > > > > > >
> > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> > > > > > > > > > mapreduce Maven dependencies without any success:
> > > > > > > > > > - hadoop-yarn-client
> > > > > > > > > > - hadoop-yarn-common
> > > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > > - hadoop-yarn-api
> > > > > > > > > > - hbase-server
> > > > > > > > > >
> > > > > > > > > > I will keep exploring other possible solutions. Let me know if
> > > > > > > > > > anyone has any ideas.
> > > > > > > > > >
> > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler
> > > > > > > > > > <ottobackwards@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I can imagine a new generic service(s) capability whose job
> > > > > > > > > > > (pun intended) is to abstract the submittal, tracking, and
> > > > > > > > > > > storage of results to yarn.
> > > > > > > > > > >
> > > > > > > > > > > It would be extended with storage providers, queue
> > > > > > > > > > > providers, possibly some set of policies or rather
> > > > > > > > > > > strategies.
> > > > > > > > > > >
> > > > > > > > > > > The pcap ‘report’ would be a client to that service, one
> > > > > > > > > > > that specializes the service operation for the way we want
> > > > > > > > > > > pcap to work.
> > > > > > > > > > >
> > > > > > > > > > > We can then re-use the generic service for other long
> > > > > > > > > > > running yarn things…..
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler
> > > > > > > > > > > (ottobackwards@gmail.com) wrote:
> > > > > > > > > > >
> > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > >
> > > > > > > > > > > The submittal and tracking can associate the submitter with
> > > > > > > > > > > the yarn job and track that, regardless of the yarn
> > > > > > > > > > > credentials.
> > > > > > > > > > >
> > > > > > > > > > > I.e., if all submittals and monitoring are by the same yarn
> > > > > > > > > > > user (Metron) from a single or co-operative set of services,
> > > > > > > > > > > that service can maintain the mapping.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman
> > > > > > > > > > > (merrimanr@gmail.com) wrote:
> > > > > > > > > > >
> > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think
> > > > > > > > > > > about how to manage the user to job relationships. I'm
> > > > > > > > > > > assuming YARN jobs will be submitted as the metron service
> > > > > > > > > > > user so YARN won't keep track of this for us. Is that
> > > > > > > > > > > assumption correct? Do you have any ideas for doing that?
> > > > > > > > > > >
> > > > > > > > > > > Mike, I can start a feature branch and experiment with
> > > > > > > > > > > merging metron-api into metron-rest. That should allow us to
> > > > > > > > > > > collaborate on any issues or challenges. Also, can you
> > > > > > > > > > > expand on your idea to manage external dependencies as a
> > > > > > > > > > > special module? That seems like a very attractive option to
> > > > > > > > > > > me.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > > > ottobackwards@gmail.com>
> > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > From my response on the other thread, but applicable
> to
> > > the
> > > > > > > > backend
> > > > > > > > > > > stuff:
> > > > > > > > > > > >
> > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me.
> You
> > > are
> > > > > > > > > generating a
> > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > That report is something that takes some time and
> > > external
> > > > > > > process
> > > > > > > > to
> > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > >
> > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > >
> > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > * Ask to generate a PCAP report based on some
> selected
> > > > > > > > > > alerts/meta-alert,
> > > > > > > > > > > > possibly picking from on or more report ‘templates’
> > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > * The report request is ‘queued’, that is dispatched
> to
> > > be
> > > > be
> > > > > > > > > > > > executed/generated
> > > > > > > > > > > > * You as a user have a ‘queue’ of your report
> results,
> > > and
> > > > > when
> > > > > > > > the
> > > > > > > > > > > report
> > > > > > > > > > > > is done it is queued there
> > > > > > > > > > > > * We ‘monitor’ the report/queue press through the
> yarn
> > > > rest (
> > > > > > > > report
> > > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > > * You can select the report from your queue and view
> it
> > > > > either
> > > > > > in
> > > > > > > > a
> > > > > > > > > new
> > > > > > > > > > > UI
> > > > > > > > > > > > or custom component
> > > > > > > > > > > > * You can then apply a different ‘view’ to the report
> > or
> > > > work
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > report data
> > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > * You can associate the report with the alerts (
> again
> > in
> > > > the
> > > > > > > > report
> > > > > > > > > > info
> > > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation
> > something
> > > or
> > > > > > other
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > We can introduce extensibility into the report
> > templates,
> > > > > > report
> > > > > > > > > views
> > > > > > > > > > (
> > > > > > > > > > > > thinks that work with the json data of the report )
> > > > > > > > > > > >
> > > > > > > > > > > > Something like that.”
> > > > > > > > > > > >
> > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > >
> > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > yarn info + query info + alert context + yarn status
> =>
> > > > > report
> > > > > > > > info
> > > > > > > > > ->
> > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > metron-rest -> api to monitor the queue, read
> results (
> > > > page
> > > > > ),
> > > > > > > > etc
> > > > > > > > > etc
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > > > merrimanr@gmail.com
> > > > > > )
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I started a separate thread on Pcap UI considerations
> > and
> > > > > user
> > > > > > > > > > > > requirements
> > > > > > > > > > > > at Otto's request. This should help us keep these two
> > > > related
> > > > > > but
> > > > > > > > > > > separate
> > > > > > > > > > > > discussions focused.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail
> > chain^^)
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If I may, I would like to share my view on the
> > > following
> > > > 3
> > > > > > > > points.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Backend:
> > > > > > > > > > > > >
> > > > > > > > > > > > > The current metron-api is totally seperate, it will
> > be
> > > > > logic
> > > > > > > for
> > > > > > > > me
> > > > > > > > > > to
> > > > > > > > > > > > have
> > > > > > > > > > > > > it at the same place as the others rest api.
> > Especially
> > > > > when
> > > > > > > > more
> > > > > > > > > > > > security
> > > > > > > > > > > > > will be added, it will not be needed to do the job
> > > twice.
> > > > > > > > > > > > > The current implementation send back a pcap object
> > > which
> > > > > > still
> > > > > > > > need
> > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > decoded. In the opensoc, the decoding was done with
> > > > tshard
> > > > > on
> > > > > > > > the
> > > > > > > > > > > > frontend.
> > > > > > > > > > > > > It will be good to have this decoding happening
> > > directly
> > > > on
> > > > > > the
> > > > > > > > > > backend
> > > > > > > > > > > > to
> > > > > > > > > > > > > not create a load on frontend. An option will be to
> > > > install
> > > > > > > > tshark
> > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > > rest server and to use to convert the pcap to xml
> and
> > > > then
> > > > > > to a
> > > > > > > > > json
> > > > > > > > > > > > that
> > > > > > > > > > > > > will be send to the frontend.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I tried to start directly the map/reduce job to
> > search
> > > > over
> > > > > > all
> > > > > > > > the
> > > > > > > > > > > pcap
> > > > > > > > > > > > > data from the rest server and as Ryan mention it,
> we
> > > had
> > > > > > > > trouble. I
> > > > > > > > > > > will
> > > > > > > > > > > > > try to find back the error.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Then in the POC, what we tried is to use the
> > pcap_query
> > > > > > script
> > > > > > > > and
> > > > > > > > > > this
> > > > > > > > > > > > > work fine. I just modified it that he sends back
> > > directly
> > > > > the
> > > > > > > > > job_id
> > > > > > > > > > of
> > > > > > > > > > > > > yarn and not waiting that the job is finished. Then
> > it
> > > > will
> > > > > > > > allow
> > > > > > > > > the
> > > > > > > > > > > UI
> > > > > > > > > > > > > and the rest server to know what the status of the
> > > > research
> > > > > > by
> > > > > > > > > > querying
> > > > > > > > > > > > the
> > > > > > > > > > > > > yarn rest api. This will allow the UI and the rest
> > > server
> > > > > to
> > > > > > be
> > > > > > > > > async
> > > > > > > > > > > > > without any blocking phase. What do you think about
> > > that?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Having the job submitted directly from the code of
> > the
> > > > rest
> > > > > > > > server
> > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > > perfect, but it will need a lot of investigation I
> > > think
> > > > > (but
> > > > > > > > I'm
> > > > > > > > > not
> > > > > > > > > > > > the
> > > > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > > > >
> > > > > > > > > > > > > We know that the pcap_query scritp work fine so why
> > not
> > > > > > calling
> > > > > > > > it?
> > > > > > > > > > Is
> > > > > > > > > > > > it
> > > > > > > > > > > > > that bad? (maybe stupid question, but I really
> don’t
> > > see
> > > > a
> > > > > > lot
> > > > > > > > of
> > > > > > > > > > > > drawback)
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Front end:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding the the pcap search to the alert UI is, I
> > think,
> > > > the
> > > > > > > > easiest
> > > > > > > > > > way
> > > > > > > > > > > > to
> > > > > > > > > > > > > move forward. But indeed, it will then be the
> “Alert
> > UI
> > > > and
> > > > > > > > > > pcapquery”.
> > > > > > > > > > > > > Maybe the name of the UI should just change to
> > > something
> > > > > like
> > > > > > > > > > > > “Monitoring &
> > > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is there any roadmap or plan for the different UI?
> I
> > > mean
> > > > > did
> > > > > > > > you
> > > > > > > > > > > > already
> > > > > > > > > > > > > had discussion on how you see the ui evolving with
> > the
> > > > new
> > > > > > > > feature
> > > > > > > > > > that
> > > > > > > > > > > > > will come in the future?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Microservices:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > > > separate
> > > > > > > all
> > > > > > > > > the
> > > > > > > > > > > > > features in different projects? Or something like
> > > having
> > > > > the
> > > > > > > > > > different
> > > > > > > > > > > > > components in container like kubernet? (again maybe
> > > > stupid
> > > > > > > > > question,
> > > > > > > > > > > but
> > > > > > > > > > > > I
> > > > > > > > > > > > > don’t clearly understand what you mean J )
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Michel
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > --
> > > > > > simon elliston ball
> > > > > > @sireb
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
This looks like a pretty good start, Ryan. Does the metadata endpoint cover
this endpoint from the original metron-api?
https://github.com/apache/metron/tree/master/metron-platform/metron-api#the-pcapgettergetpcapsbyidentifiers-endpoint
If so, then we would be able to deprecate the
existing metron-api project. If we later go to micro-services, a pcap
module would spin back into the fold, but it would probably look different
from metron-api.

I commented on the UI thread, but to reiterate for the purpose of backend
functionality here: I don't believe there is a way to "PAUSE" or "SUSPEND"
jobs. That said, I think GET /api/v1/pcap/stop/<jobId> is sufficient for
the job management operations.
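
To make the job management semantics concrete, here is a minimal in-memory
sketch in Python of the lifecycle behind the proposed query, status, stop,
list and delete endpoints. It is only an illustration under assumptions:
PcapJobRegistry, its method names, the per-user limit, the complete() hook
and the status labels RUNNING/KILLED/SUCCEEDED are all hypothetical, and a
real implementation would submit to and poll YARN rather than keep state in
a dict.

```python
import itertools
from dataclasses import dataclass


@dataclass
class PcapJob:
    job_id: str
    user: str
    query: dict
    status: str = "RUNNING"  # placeholder label, not a real YARN state


class PcapJobRegistry:
    """Hypothetical stand-in for the service behind the pcap endpoints."""

    def __init__(self, max_running_per_user=2):
        self._jobs = {}
        self._ids = itertools.count(1)
        self._max_running = max_running_per_user

    def submit(self, user, query):
        # POST /api/v1/pcap/query: reject if the user has too many running jobs.
        running = sum(
            1 for j in self._jobs.values()
            if j.user == user and j.status == "RUNNING"
        )
        if running >= self._max_running:
            raise RuntimeError("too many queries submitted")
        job = PcapJob(f"job_{next(self._ids)}", user, dict(query))
        self._jobs[job.job_id] = job
        return job.job_id

    def status(self, job_id):
        # GET /api/v1/pcap/status/<jobId>
        return self._jobs[job_id].status

    def complete(self, job_id):
        # Hypothetical hook standing in for YARN reporting a finished job.
        self._jobs[job_id].status = "SUCCEEDED"

    def stop(self, job_id):
        # GET /api/v1/pcap/stop/<jobId>: a no-op if the job is not running.
        job = self._jobs[job_id]
        if job.status == "RUNNING":
            job.status = "KILLED"
        return job.status

    def list_jobs(self, user):
        # GET /api/v1/pcap/list: only the calling user's jobs.
        return [(j.job_id, j.status) for j in self._jobs.values() if j.user == user]

    def delete(self, job_id):
        # DELETE /api/v1/pcap/<jobId>
        self._jobs.pop(job_id, None)
```

Keeping the user-to-job mapping in the service rather than in YARN matches
the assumption in the quoted thread that all jobs run as the metron service
user.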

On Wed, May 9, 2018 at 11:00 AM, Ryan Merriman <me...@gmail.com> wrote:

> Now that we are confident we can submit an MR job from our current REST
> application, is this the desired approach?  Just want to confirm.
>
> Next I think we should map out what the REST interface will look like.
> Here are the endpoints I'm thinking about:
>
> GET /api/v1/pcap/metadata?basePath
>
> This endpoint will return metadata of pcap data stored in HDFS.  This would
> include pcap size, date ranges (how far back can I go), etc.  It would
> accept an optional HDFS basePath parameter for cases where pcap data is
> stored in multiple places and/or different from the default location.
>
> POST /api/v1/pcap/query
>
> This endpoint would accept a pcap request, submit a pcap query job, and
> return a job id.  The request would be an object containing the parameters
> documented here:
> https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> A query/job
> would be associated with a user that submits it.  An exception will be
> returned for violating constraints like too many queries submitted, query
> parameters out of limits, etc.
>
> GET /api/v1/pcap/status/<jobId>
>
> This endpoint will return the status of a running job.  I imagine this is
> just a proxy to the YARN REST api.  We can discuss the implementation
> behind these endpoints later.
>
> GET /api/v1/pcap/stop/<jobId>
>
> This endpoint would kill a running pcap job.  If the job has already
> completed, this is a no-op.
>
> GET /api/v1/pcap/list
>
> This endpoint will list a user's submitted pcap queries.  Items in the list
> would contain job id, status (is it finished?), start/end time, and number
> of pages.  Maybe there is some overlap with the status endpoint above and
> the status endpoint is not needed?
>
> GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
>
> This endpoint will return pcap results for the given page in pdml format (
> https://wiki.wireshark.org/PDML).  Are there other formats we want to
> support?
>
> GET /api/v1/pcap/raw/<jobId>/<pageNumber>
>
> This endpoint will allow a user to download raw pcap results for the given
> page.
>
> DELETE /api/v1/pcap/<jobId>
>
> This endpoint will delete pcap query results.  Not sure yet how this fits
> in with our broader cleanup strategy.
>
> This should get us started.  What did I miss and what would you change
> about these?  I did not include much detail related to security, cleanup
> strategy, or underlying implementation details but these are items we
> should discuss at some point.
>
> On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > Sweet! That's great news. The pom changes are a lot simpler than I
> > expected. Very nice.
> >
> > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com> wrote:
> >
> > > Finally figured it out.  Commit is here:
> > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > >
> > > It came down to figuring out the right combination of Maven dependencies
> > > and passing in the HDP version to REST as a Java system property.  I also
> > > included some HDFS setup tasks.  I tested this in full dev and can now
> > > successfully run a pcap query and get results.  All you should have to do
> > > is generate some pcap data first.
> > >
> > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > @Ryan - pulled your branch and experimented with a few things. In doing
> > > > so, it dawned on me that by adding the yarn and hadoop classpath, you
> > > > probably didn't introduce a new classpath issue, rather you probably just
> > > > moved on to the next classpath issue, i.e. hbase per your exception about
> > > > hbase jaxb. Anyhow, I put up a branch with some pom changes worth trying
> > > > in conjunction with invoking the rest app startup via "/usr/bin/yarn jar"
> > > >
> > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > >
> > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > >
> > > > Mike
> > > >
> > > >
> > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > simon@simonellistonball.com> wrote:
> > > >
> > > > > That would be a step closer to something more like a micro-service
> > > > > architecture. However, I would want to make sure we think about the
> > > > > operational complexity, and mpack implications of having another server
> > > > > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > > > > etc requirements for that service).
> > > > >
> > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > > > >
> > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > pattern.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > pattern in rest?
> > > > > > >
> > > > > > >
> > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > >
> > > > > > > > Moving the yarn classpath command earlier in the classpath now gives this
> > > > > > > > error:
> > > > > > > >
> > > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > > >
> > > > > > > > I will experiment with other combinations; I suspect we will need
> > > > > > > > finer-grained control over the order.
> > > > > > > >
> > > > > > > > The grep matches class names inside jar files. I use this all the time and
> > > > > > > > it's really useful.
> > > > > > > >
> > > > > > > > The metron-rest jar is already shaded.
> > > > > > > >
> > > > > > > > Reverse engineering the yarn jar command was the next thing I was going to
> > > > > > > > try. Will let you know how it goes.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded" package
> > > > > > > > > stands out to me in this name
> > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > > >
> > > > > > > > > I think that find command needs a "jar tvf", otherwise you're looking for a
> > > > > > > > > class name in jar file names.
> > > > > > > > >
> > > > > > > > > Have you tried shading the rest jar?
> > > > > > > > >
> > > > > > > > > I'd also look at the classpath you get when running "yarn jar" to start the
> > > > > > > > > existing pcap service, per the instructions in metron-api/README.md.
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > > > > > > queries inside our REST application, I created a simple test here:
> > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > > > > > > summary of what's included:
> > > > > > > > > >
> > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > query
> > > > > > > > >
> > > > > > > > > > Generate some pcap data using pycapa (
> > > > > > > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the
> > > > > > > > > > pcap topology (
> > > > > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > > > already running full dev by building and deploying the metron-rest jar. I
> > > > > > > > > > did not rebuild full dev with this change but I would still expect it to
> > > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > >
> > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > >
> > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > >
> > > > > > > > > Here are the things I've tried so far:
> > > > > > > > >
> > > > > > > > > > - Run the REST application with the YARN jar command since this is how
> > > > > > > > > > all our other YARN/MR-related applications are started (metron-api, MAAS,
> > > > > > > > > > pcap query, etc). I wouldn't expect this to work since we have runtime
> > > > > > > > > > dependencies on our shaded elasticsearch and parser jars and I'm not aware
> > > > > > > > > > of a way to add additional jars to the classpath with the YARN jar command
> > > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > > >
> > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > java.lang.NullPointerException
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > > > > > > get this error:
> > > > > > > > > >
> > > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > - I searched for the class in the previous attempt but could not find it
> > > > > > > > > > in full dev:
> > > > > > > > > >
> > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > > > org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
> > > > > > > > > > 2>/dev/null
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > - Further up in the stack trace I see the error happens when initializing
> > > > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I get
> > > > > > > > > > this error:
> > > > > > > > > >
> > > > > > > > > > Unable to parse
> > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > > > > > > > > > dependencies without any success:
> > > > > > > > > >   - hadoop-yarn-client
> > > > > > > > > >   - hadoop-yarn-common
> > > > > > > > > >   - hadoop-mapreduce-client-core
> > > > > > > > > >   - hadoop-yarn-server-common
> > > > > > > > > >   - hadoop-yarn-api
> > > > > > > > > >   - hbase-server
> > > > > > > > > >
> > > > > > > > > > I will keep exploring other possible solutions. Let me know if anyone has
> > > > > > > > > > any ideas.
> > > > > > > > >
> > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > I can imagine a new generic service(s) capability whose job ( pun intended )
> > > > > > > > > > > is to abstract the submittal, tracking, and storage of results to yarn.
> > > > > > > > > > >
> > > > > > > > > > > It would be extended with storage providers, queue providers, possibly some
> > > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > > >
> > > > > > > > > > > The pcap ‘report’ would be a client to that service, that specializes the
> > > > > > > > > > > service operation for the way we want pcap to work.
> > > > > > > > > > >
> > > > > > > > > > > We can then re-use the generic service for other long running yarn
> > > > > > > > > > > things…..
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> > > > > > > > > > >
> > > > > > > > > > > RE: Tracking v. users
> > > > > > > > > > >
> > > > > > > > > > > The submittal and tracking can associate the submitter with the yarn job
> > > > > > > > > > > and track that, regardless of the yarn credentials.
> > > > > > > > > > >
> > > > > > > > > > > IE> if all submittals and monitoring are by the same yarn user ( Metron )
> > > > > > > > > > > from a single or co-operative set of services, that service can maintain
> > > > > > > > > > > the mapping.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > >
> > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think about how to
> > > > > > > > > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > > > > > > > > submitted as the metron service user so YARN won't keep track of this for
> > > > > > > > > > > us. Is that assumption correct? Do you have any ideas for doing that?
> > > > > > > > > > >
> > > > > > > > > > > Mike, I can start a feature branch and experiment with merging metron-api
> > > > > > > > > > > into metron-rest. That should allow us to collaborate on any issues or
> > > > > > > > > > > challenges. Also, can you expand on your idea to manage external
> > > > > > > > > > > dependencies as a special module? That seems like a very attractive option
> > > > > > > > > > > to me.
> > > > > > > > > >
> > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > > > > > stuff:
> > > > > > > > > > > >
> > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating a
> > > > > > > > > > > > report based on parameters.
> > > > > > > > > > > > That report is something that takes some time and external process to
> > > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > > >
> > > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > > >
> > > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > > > > > > > > > > > possibly picking from one or more report ‘templates’
> > > > > > > > > > > > that have query options etc
> > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > executed/generated
> > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the report
> > > > > > > > > > > > is done it is queued there
> > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > > * You can select the report from your queue and view it either in a new UI
> > > > > > > > > > > > or custom component
> > > > > > > > > > > > * You can then apply a different ‘view’ to the report or work with the
> > > > > > > > > > > > report data
> > > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > > * You can associate the report with the alerts ( again in the report info
> > > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > We can introduce extensibility into the report templates, report views (
> > > > > > > > > > > > things that work with the json data of the report )
> > > > > > > > > > > >
> > > > > > > > > > > > Something like that.”
> > > > > > > > > > > >
> > > > > > > > > > > > Maybe we can do :
> > > > > > > > > > > >
> > > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > > yarn info + query info + alert context + yarn status => report info ->
> > > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > > > > > related but separate discussions focused.
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsumbul@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > > > >
> > > > > > > > > > > > - Backend:
> > > > > > > > > > > >
> > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to me to
> > > > > > > > > > > > > have it in the same place as the other REST APIs. Especially when more
> > > > > > > > > > > > > security is added, it will not be necessary to do the job twice.
> > > > > > > > > > > > > The current implementation sends back a pcap object which still needs to
> > > > > > > > > > > > > be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > > > > frontend. It would be good to have this decoding happen directly on the
> > > > > > > > > > > > > backend so as not to create load on the frontend. An option would be to
> > > > > > > > > > > > > install tshark on the rest server and use it to convert the pcap to xml
> > > > > > > > > > > > > and then to json that will be sent to the frontend.
> > > > > > > > > > > >
> > > > > > > > > > > > I tried to start directly the map/reduce job to
> search
> > > over
> > > > > all
> > > > > > > the
> > > > > > > > > > pcap
> > > > > > > > > > > > data from the rest server and as Ryan mention it, we
> > had
> > > > > > > trouble. I
> > > > > > > > > > will
> > > > > > > > > > > > try to find back the error.
> > > > > > > > > > > >
> > > > > > > > > > > > Then in the POC, what we tried is to use the
> pcap_query
> > > > > script
> > > > > > > and
> > > > > > > > > this
> > > > > > > > > > > > work fine. I just modified it that he sends back
> > directly
> > > > the
> > > > > > > > job_id
> > > > > > > > > of
> > > > > > > > > > > > yarn and not waiting that the job is finished. Then
> it
> > > will
> > > > > > > allow
> > > > > > > > the
> > > > > > > > > > UI
> > > > > > > > > > > > and the rest server to know what the status of the
> > > research
> > > > > by
> > > > > > > > > querying
> > > > > > > > > > > the
> > > > > > > > > > > > yarn rest api. This will allow the UI and the rest
> > server
> > > > to
> > > > > be
> > > > > > > > async
> > > > > > > > > > > > without any blocking phase. What do you think about
> > that?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Having the job submitted directly from the code of
> the
> > > rest
> > > > > > > server
> > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > perfect, but it will need a lot of investigation I
> > think
> > > > (but
> > > > > > > I'm
> > > > > > > > not
> > > > > > > > > > > the
> > > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > > >
> > > > > > > > > > > > We know that the pcap_query scritp work fine so why
> not
> > > > > calling
> > > > > > > it?
> > > > > > > > > Is
> > > > > > > > > > > it
> > > > > > > > > > > > that bad? (maybe stupid question, but I really don’t
> > see
> > > a
> > > > > lot
> > > > > > > of
> > > > > > > > > > > drawback)
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Front end:
> > > > > > > > > > > >
> > > > > > > > > > > > Adding the the pcap search to the alert UI is, I
> think,
> > > the
> > > > > > > easiest
> > > > > > > > > way
> > > > > > > > > > > to
> > > > > > > > > > > > move forward. But indeed, it will then be the “Alert
> UI
> > > and
> > > > > > > > > pcapquery”.
> > > > > > > > > > > > Maybe the name of the UI should just change to
> > something
> > > > like
> > > > > > > > > > > “Monitoring &
> > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Is there any roadmap or plan for the different UI? I
> > mean
> > > > did
> > > > > > > you
> > > > > > > > > > > already
> > > > > > > > > > > > had discussion on how you see the ui evolving with
> the
> > > new
> > > > > > > feature
> > > > > > > > > that
> > > > > > > > > > > > will come in the future?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Microservices:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > > separate
> > > > > > all
> > > > > > > > the
> > > > > > > > > > > > features in different projects? Or something like
> > having
> > > > the
> > > > > > > > > different
> > > > > > > > > > > > components in container like kubernet? (again maybe
> > > stupid
> > > > > > > > question,
> > > > > > > > > > but
> > > > > > > > > > > I
> > > > > > > > > > > > don’t clearly understand what you mean J )
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Michel
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --
> > > > > simon elliston ball
> > > > > @sireb
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Security is another important topic related to our pcap architecture.  This
may spill over into a more general, system-wide discussion and we can start
a separate thread for that if necessary.

I'm assuming we want to manage pcap queries by user.  One important
question is which user we use to submit MR jobs.  Right now they are
submitted as the "metron" service user.  If we continue with this
approach (all jobs run as the metron service user) it will require less
Kerberos work and configuration, but we will need to manage user-to-job
relationships in the REST layer.  It would also give us less flexibility
since all assets are permissioned for a single user.  If we decided to have
REST impersonate users and submit jobs that way, there would be substantial
work (maybe?) to get to that point since we don't do it now.  We would need
to add LDAP authentication to REST, sync OS users with LDAP, and handle
everything else that goes along with setting that up.  Maybe we want to do
this anyway in the future.  Has anyone done this before and has a clear
understanding of what's involved?  If this is the ideal approach, we need
to think about how we get to that point.  Managing user-to-job
relationships in REST would be throwaway work in that case, since that
information would then be stored in YARN.

For authorization I'm assuming that each user should only have access to
information about their queries and query results.  Any actions like
downloading results or cleaning up queries (deleting results) would also be
limited to that user.  Does this sound reasonable?  Do we want to add an
admin role that can do everything?  Is there anything anyone else wants to
discuss with regard to authorization or security in general?
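
To make the authorization rule concrete, here is a minimal sketch in Python,
assuming the first option above (every job runs as the metron service user and
the REST layer keeps its own user-to-job map).  The JobRegistry class, its
method names, and the admin set are hypothetical and not part of metron-rest;
a real version would persist the map somewhere durable.

```python
from dataclasses import dataclass, field


@dataclass
class JobRegistry:
    """Hypothetical in-memory map of pcap job ids to the submitting user."""

    admins: set = field(default_factory=set)
    _owner_by_job: dict = field(default_factory=dict)

    def register(self, job_id: str, username: str) -> None:
        # Called when the REST layer submits a job on behalf of a user.
        self._owner_by_job[job_id] = username

    def can_access(self, job_id: str, username: str) -> bool:
        # Admins (an assumed role) may touch any known job;
        # everyone else only the jobs they submitted.
        if username in self.admins:
            return job_id in self._owner_by_job
        return self._owner_by_job.get(job_id) == username

    def jobs_for(self, username: str) -> list:
        # Listing is filtered the same way as access checks.
        if username in self.admins:
            return sorted(self._owner_by_job)
        return sorted(j for j, u in self._owner_by_job.items() if u == username)
```

Downloading or deleting results would go through the same can_access check, so
the rule lives in one place.  If we later move to impersonation, this map is
the part that becomes throwaway.
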

On Wed, May 9, 2018 at 1:22 PM, Ryan Merriman <me...@gmail.com> wrote:

> Thanks for the feedback, Jon.  I am not as familiar with BPF filtering as
> you probably are.  Do you have an idea of how much effort would be involved
> in implementing this?  I suspect this would be another PcapFilter (
> https://github.com/apache/metron/blob/master/metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/filter/PcapFilter.java
> ) similar to fixed and query, so regardless of when we decide to add it, our
> endpoint strategy should support 1 to n filters.  Maybe we have an endpoint
> for each type of filter:
>
> POST /api/v1/pcap/fixed
> POST /api/v1/pcap/query
> POST /api/v1/pcap/bpf
>
> This would allow us to accept requests that are structured differently and
> specific to the type of filter.
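
A rough sketch of what routing by filter type could look like, in Python for
brevity.  The request field names, the REQUIRED_FIELDS table, and the route
helper are made up for illustration; none of this is existing Metron code, and
the real request shapes would come from the filter implementations.

```python
# Required request fields per filter type (illustrative assumptions only).
REQUIRED_FIELDS = {
    "fixed": {"start_time", "end_time"},
    "query": {"start_time", "end_time", "query"},
    "bpf": {"start_time", "end_time", "expression"},
}

PREFIX = "/api/v1/pcap/"


def route(path: str, body: dict) -> str:
    """Return the filter type for a POST path after validating the body."""
    if not path.startswith(PREFIX):
        raise ValueError(f"unknown path: {path}")
    filter_type = path[len(PREFIX):]
    if filter_type not in REQUIRED_FIELDS:
        raise ValueError(f"unsupported filter type: {filter_type}")
    # Each filter type gets its own request shape, checked independently.
    missing = REQUIRED_FIELDS[filter_type] - set(body)
    if missing:
        raise ValueError(f"missing fields for {filter_type}: {sorted(missing)}")
    return filter_type
```

Adding a new filter then means adding one table entry (and one endpoint)
without touching the requests accepted by the existing ones.
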
>
> On Wed, May 9, 2018 at 12:32 PM, Zeolla@GMail.com <ze...@gmail.com>
> wrote:
>
>> This looks really great and gets me excited to maybe revisit some old
>> conversations about PCAP capture in Metron.  The only thing that I think
>> it's missing is the ability to filter using bpf.  I think the same thing
>> can technically be accomplished by using packet_filter and I wouldn't
>> throw a fit if that's considered a follow-on, but bpf is the standard
>> language that people who do packet munging for a living know.
>>
>> Jon
>>
>> On Wed, May 9, 2018 at 1:00 PM Ryan Merriman <me...@gmail.com> wrote:
>>
>> > Now that we are confident we can submit an MR job from our current REST
>> > application, is this the desired approach?  Just want to confirm.
>> >
>> > Next I think we should map out what the REST interface will look like.
>> > Here are the endpoints I'm thinking about:
>> >
>> > GET /api/v1/pcap/metadata?basePath
>> >
>> > This endpoint will return metadata of pcap data stored in HDFS.  This
>> > would include pcap size, date ranges (how far back can I go), etc.  It
>> > would accept an optional HDFS basePath parameter for cases where pcap
>> > data is stored in multiple places and/or different from the default
>> > location.
>> >
>> > POST /api/v1/pcap/query
>> >
>> > This endpoint would accept a pcap request, submit a pcap query job, and
>> > return a job id.  The request would be an object containing the
>> > parameters documented here:
>> > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
>> > A query/job would be associated with the user who submits it.  An
>> > exception will be returned for violating constraints like too many
>> > queries submitted, query parameters out of limits, etc.
>> >
>> > GET /api/v1/pcap/status/<jobId>
>> >
>> > This endpoint will return the status of a running job.  I imagine this
>> > is just a proxy to the YARN REST api.  We can discuss the implementation
>> > behind these endpoints later.
>> >
>> > GET /api/v1/pcap/stop/<jobId>
>> >
>> > This endpoint would kill a running pcap job.  If the job has already
>> > completed this is a noop.
>> >
>> > GET /api/v1/pcap/list
>> >
>> > This endpoint will list a user's submitted pcap queries.  Items in the
>> > list would contain job id, status (is it finished?), start/end time, and
>> > number of pages.  Maybe there is some overlap with the status endpoint
>> > above and the status endpoint is not needed?
>> >
>> > GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
>> >
>> > This endpoint will return pcap results for the given page in pdml format
>> > (https://wiki.wireshark.org/PDML).  Are there other formats we want to
>> > support?
>> >
>> > GET /api/v1/pcap/raw/<jobId>/<pageNumber>
>> >
>> > This endpoint will allow a user to download raw pcap results for the
>> > given page.
>> >
>> > DELETE /api/v1/pcap/<jobId>
>> >
>> > This endpoint will delete pcap query results.  Not sure yet how this
>> > fits in with our broader cleanup strategy.
>> >
>> > This should get us started.  What did I miss and what would you change
>> > about these?  I did not include much detail related to security, cleanup
>> > strategy, or underlying implementation details but these are items we
>> > should discuss at some point.
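
The job lifecycle behind those endpoints can be sketched as a small in-memory
model, again in Python.  The PcapJobStore class, the job id format, and the
status strings are placeholders chosen for the example; the real service would
delegate status and kill to YARN rather than flip a field.

```python
import itertools


class PcapJobStore:
    """Hypothetical in-memory stand-in for the pcap job endpoints."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}  # job id -> {"user", "request", "status"}

    def submit(self, user, request):
        """POST /api/v1/pcap/query: record the job, return its id right away."""
        job_id = f"job_{next(self._ids)}"
        self._jobs[job_id] = {"user": user, "request": request, "status": "RUNNING"}
        return job_id

    def status(self, job_id):
        """GET /api/v1/pcap/status/<jobId>."""
        return self._jobs[job_id]["status"]

    def mark_finished(self, job_id):
        """Simulates the cluster reporting that a running job completed."""
        if self._jobs[job_id]["status"] == "RUNNING":
            self._jobs[job_id]["status"] = "SUCCEEDED"

    def stop(self, job_id):
        """GET /api/v1/pcap/stop/<jobId>: kill if running, otherwise a no-op."""
        job = self._jobs[job_id]
        if job["status"] == "RUNNING":
            job["status"] = "KILLED"
        return job["status"]

    def list_jobs(self, user):
        """GET /api/v1/pcap/list: only the caller's own jobs."""
        return {jid: j["status"] for jid, j in self._jobs.items() if j["user"] == user}

    def delete(self, job_id):
        """DELETE /api/v1/pcap/<jobId>: drop the job and its results."""
        self._jobs.pop(job_id, None)
```

Because submit returns immediately, the UI never blocks on the query itself; it
polls status (or list) until the job leaves the running state and only then
asks for result pages.
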
>> >
>> > On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
>> > michael.miklavcic@gmail.com> wrote:
>> >
>> > > Sweet! That's great news. The pom changes are a lot simpler than I
>> > > expected. Very nice.
>> > >
>> > > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com>
>> > wrote:
>> > >
>> > > > Finally figured it out.  Commit is here:
>> > > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
>> > > >
>> > > > It came down to figuring out the right combination of maven
>> > > > dependencies and passing in the HDP version to REST as a Java system
>> > > > property.  I also included some HDFS setup tasks.  I tested this in
>> > > > full dev and can now successfully run a pcap query and get results.
>> > > > All you should have to do is generate some pcap data first.
>> > > >
>> > > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
>> > > > michael.miklavcic@gmail.com> wrote:
>> > > >
>> > > > > @Ryan - pulled your branch and experimented with a few things. In
>> > > > > doing so, it dawned on me that by adding the yarn and hadoop
>> > > > > classpath, you probably didn't introduce a new classpath issue,
>> > > > > rather you probably just moved onto the next classpath issue, ie
>> > > > > hbase per your exception about hbase jaxb. Anyhow, I put up a branch
>> > > > > with some pom changes worth trying in conjunction with invoking the
>> > > > > rest app startup via "/usr/bin/yarn jar"
>> > > > >
>> > > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
>> > > > >
>> > > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
>> > > > >
>> > > > > Mike
>> > > > >
>> > > > >
>> > > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
>> > > > > simon@simonellistonball.com> wrote:
>> > > > >
>> > > > > > That would be a step closer to something more like a micro-service
>> > > > > > architecture. However, I would want to make sure we think about the
>> > > > > > operational complexity, and mpack implications of having another
>> > > > > > server installed and running somewhere on the cluster (also, ssl,
>> > > > > > kerberos, etc etc requirements for that service).
>> > > > > >
>> > > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com>
>> wrote:
>> > > > > >
>> > > > > > > +1 to having metron-api as its own service and using a gateway
>> > > > > > > type pattern.
>> > > > > > >
>> > > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
>> > > ottobackwards@gmail.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Why not have metron-api as its own service and use a ‘gateway’
>> > > > > > > > type pattern in rest?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (
>> merrimanr@gmail.com
>> > )
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > Moving the yarn classpath command earlier in the classpath now
>> > > > > > > > gives this error:
>> > > > > > > >
>> > > > > > > > Caused by: java.lang.NoSuchMethodError:
>> > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
>> > > > > > > >
>> > > > > > > > I will experiment with other combinations, I suspect we will
>> > > > > > > > need finer-grain control over the order.
>> > > > > > > >
>> > > > > > > > The grep matches class names inside jar files. I use this all
>> > > > > > > > the time and it's really useful.
>> > > > > > > >
>> > > > > > > > The metron-rest jar is already shaded.
>> > > > > > > >
>> > > > > > > > Reverse engineering the yarn jar command was the next thing I
>> > > > > > > > was going to try. Will let you know how it goes.
>> > > > > > > >
>> > > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
>> > > > > > > > michael.miklavcic@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > What order did you add the hadoop or yarn classpath? The
>> > > > > > > > > "shaded" package stands out to me in this name
>> > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
>> > > > > > > > > Maybe try adding those packages earlier on the classpath.
>> > > > > > > > >
>> > > > > > > > > I think that find command needs a "jar tvf", otherwise you're
>> > > > > > > > > looking for a class name in jar file names.
>> > > > > > > > >
>> > > > > > > > > Have you tried shading the rest jar?
>> > > > > > > > >
>> > > > > > > > > I'd also look at the classpath you get when running "yarn jar"
>> > > > > > > > > to start the existing pcap service, per the instructions in
>> > > > > > > > > metron-api/README.md.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
>> > > > merrimanr@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > To explore the idea of merging metron-api into metron-rest and
>> > > > > > > > > > running pcap queries inside our REST application, I created a
>> > > > > > > > > > simple test here:
>> > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
>> > > > > > > > > > A summary of what's included:
>> > > > > > > > > >
>> > > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
>> > > > > > > > > > - Added a pcap query controller endpoint at
>> > > > > > > > > >   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
>> > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
>> > > > > > > > > >
>> > > > > > > > > > Generate some pcap data using pycapa (
>> > > > > > > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa
>> > > > > > > > > > ) and the pcap topology (
>> > > > > > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology
>> > > > > > > > > > ). After this initial setup there should be data in HDFS at
>> > > > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise
>> > > > > > > > > > the issue. Just hit the endpoint referenced above. I tested this
>> > > > > > > > > > in an already running full dev by building and deploying the
>> > > > > > > > > > metron-rest jar. I did not rebuild full dev with this change but
>> > > > > > > > > > I would still expect it to work. Let me know if it doesn't.
>> > > > > > > > > >
>> > > > > > > > > > The first error I see when I hit this endpoint is:
>> > > > > > > > > >
>> > > > > > > > > > java.lang.NoClassDefFoundError:
>> > > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
>> > > > > > > > > >
>> > > > > > > > > > Here are the things I've tried so far:
>> > > > > > > > > >
>> > > > > > > > > > - Run the REST application with the YARN jar command since this
>> > > > > > > > > >   is how all our other YARN/MR-related applications are started
>> > > > > > > > > >   (metron-api, MAAS, pcap query, etc). I wouldn't expect this to
>> > > > > > > > > >   work since we have runtime dependencies on our shaded
>> > > > > > > > > >   elasticsearch and parser jars and I'm not aware of a way to add
>> > > > > > > > > >   additional jars to the classpath with the YARN jar command (is
>> > > > > > > > > >   there a way?). Either way I get this error:
>> > > > > > > > > >
>> > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create
>> > > > > > > > > > Dir using jarFile from url
>> > > > > > > > > > file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
>> > > > > > > > > > java.lang.NullPointerException
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
>> > > > > > > > > >   classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
>> > > > > > > > > >   script). I get this error:
>> > > > > > > > > >
>> > > > > > > > > > java.lang.ClassNotFoundException:
>> > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > - I searched for the class in the previous attempt but could not
>> > > > > > > > > >   find it in full dev:
>> > > > > > > > > >
>> > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > - Further up in the stack trace I see the error happens when
>> > > > > > > > > >   initiating the
>> > > > > > > > > >   org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I
>> > > > > > > > > >   tried setting "yarn.timeline-service.enabled" in Ambari to
>> > > > > > > > > >   false and then I get this error:
>> > > > > > > > > >
>> > > > > > > > > > Unable to parse
>> > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework'
>> > > > > > > > > > as a URI, check the setting for
>> > > > > > > > > > mapreduce.application.framework.path
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce
>> > > > > > > > > >   Maven dependencies without any success
>> > > > > > > > > >   - hadoop-yarn-client
>> > > > > > > > > >   - hadoop-yarn-common
>> > > > > > > > > >   - hadoop-mapreduce-client-core
>> > > > > > > > > >   - hadoop-yarn-server-common
>> > > > > > > > > >   - hadoop-yarn-api
>> > > > > > > > > >   - hbase-server
>> > > > > > > > > >
>> > > > > > > > > > I will keep exploring other possible solutions. Let me know if
>> > > > > > > > > > anyone has any ideas.
>> > > > > > > > > >
>> > > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
>> > > > > > ottobackwards@gmail.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > I can imagine a new generic service(s) capability whose job (
>> > > > > > > > > > > pun intended ) is to abstract the submittal, tracking, and
>> > > > > > > > > > > storage of results to yarn.
>> > > > > > > > > > >
>> > > > > > > > > > > It would be extended with storage providers, queue provider,
>> > > > > > > > > > > possibly some set of policies or rather strategies.
>> > > > > > > > > > >
>> > > > > > > > > > > The pcap ‘report’ would be a client to that service, one that
>> > > > > > > > > > > specializes the service operation for the way we want pcap to
>> > > > > > > > > > > work.
>> > > > > > > > > > >
>> > > > > > > > > > > We can then re-use the generic service for other long running
>> > > > > > > > > > > yarn things…..
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
>> > > > > ottobackwards@gmail.com
>> > > > > > )
>> > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > RE: Tracking v. users
>> > > > > > > > > > >
>> > > > > > > > > > > The submittal and tracking can associate the submitter with the
>> > > > > > > > > > > yarn job and track that, regardless of the yarn credentials.
>> > > > > > > > > > >
>> > > > > > > > > > > IE> if all submittals and monitoring are by the same yarn user (
>> > > > > > > > > > > Metron ) from a single or co-operative set of services, that
>> > > > > > > > > > > service can maintain the mapping.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
>> > > > merrimanr@gmail.com
>> > > > > )
>> > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Otto, your use case makes sense to me. We'll have to think about
>> > > > > > > > > > > how to manage the user to job relationships. I'm assuming YARN
>> > > > > > > > > > > jobs will be submitted as the metron service user so YARN won't
>> > > > > > > > > > > keep track of this for us. Is that assumption correct? Do you
>> > > > > > > > > > > have any ideas for doing that?
>> > > > > > > > > > >
>> > > > > > > > > > > Mike, I can start a feature branch and experiment with merging
>> > > > > > > > > > > metron-api into metron-rest. That should allow us to collaborate
>> > > > > > > > > > > on any issues or challenges. Also, can you expand on your idea
>> > > > > > > > > > > to manage external dependencies as a special module? That seems
>> > > > > > > > > > > like a very attractive option to me.
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
>> > > > > > > ottobackwards@gmail.com>
>> > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > From my response on the other thread, but applicable to the
>> > > > > > > > > > > > backend stuff:
>> > > > > > > > > > > >
>> > > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are
>> > > > > > > > > > > > generating a report based on parameters.
>> > > > > > > > > > > > That report is something that takes some time and external
>> > > > > > > > > > > > process to generate… ie you have to wait for it.
>> > > > > > > > > > > >
>> > > > > > > > > > > > I can almost imagine a flow where you:
>> > > > > > > > > > > >
>> > > > > > > > > > > > * Are in the AlertUI
>> > > > > > > > > > > > * Ask to generate a PCAP report based on some selected
>> > > > > > > > > > > >   alerts/meta-alert, possibly picking from one or more report
>> > > > > > > > > > > >   ‘templates’ that have query options etc
>> > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
>> > > > > > > > > > > >   executed/generated
>> > > > > > > > > > > > * You as a user have a ‘queue’ of your report results, and when
>> > > > > > > > > > > >   the report is done it is queued there
>> > > > > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest
>> > > > > > > > > > > >   ( report info/meta has the yarn details )
>> > > > > > > > > > > > * You can select the report from your queue and view it either
>> > > > > > > > > > > >   in a new UI or custom component
>> > > > > > > > > > > > * You can then apply a different ‘view’ to the report or work
>> > > > > > > > > > > >   with the report data
>> > > > > > > > > > > > * You can print / save etc
>> > > > > > > > > > > > * You can associate the report with the alerts ( again in the
>> > > > > > > > > > > >   report info ) with…. a ‘case’ or ‘ticket’ or investigation
>> > > > > > > > > > > >   something or other
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > We can introduce extensibility into the report templates,
>> > > > > > > > > > > > report views ( things that work with the json data of the
>> > > > > > > > > > > > report )
>> > > > > > > > > > > >
>> > > > > > > > > > > > Something like that.”
>> > > > > > > > > > > >
>> > > > > > > > > > > > Maybe we can do :
>> > > > > > > > > > > >
>> > > > > > > > > > > > template -> query parameters -> script => yarn info
>> > > > > > > > > > > > yarn info + query info + alert context + yarn status => report
>> > > > > > > > > > > > info -> stored in a user’s ‘report queue’
>> > > > > > > > > > > > report persistence added to report info
>> > > > > > > > > > > > metron-rest -> api to monitor the queue, read results ( page ),
>> > > > > > > > > > > > etc etc
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
>> > > > > merrimanr@gmail.com
>> > > > > > )
>> > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > I started a separate thread on Pcap UI considerations and user
>> > > > > > > > > > > > requirements at Otto's request. This should help us keep these
>> > > > > > > > > > > > two related but separate discussions focused.
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
>> > > > > > > > > michelsumbul@gmail.com>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hello,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > If I may, I would like to share my view on the following 3 points.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > - Backend:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > The current metron-api is totally separate; it would be logical to
>> > > > > > > > > > > > > me to have it in the same place as the other REST APIs. Especially
>> > > > > > > > > > > > > when more security is added, it will not be necessary to do the job
>> > > > > > > > > > > > > twice.
>> > > > > > > > > > > > > The current implementation sends back a pcap object which still
>> > > > > > > > > > > > > needs to be decoded. In OpenSOC, the decoding was done with tshark
>> > > > > > > > > > > > > on the frontend. It would be good to have this decoding happen
>> > > > > > > > > > > > > directly on the backend so as not to create load on the frontend.
>> > > > > > > > > > > > > An option would be to install tshark on the REST server and use it
>> > > > > > > > > > > > > to convert the pcap to XML and then to JSON that will be sent to
>> > > > > > > > > > > > > the frontend.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I tried to start the map/reduce job that searches over all the pcap
>> > > > > > > > > > > > > data directly from the REST server and, as Ryan mentioned, we had
>> > > > > > > > > > > > > trouble. I will try to find the error again.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Then in the POC, what we tried was to use the pcap_query script,
>> > > > > > > > > > > > > and this works fine. I just modified it so that it sends back the
>> > > > > > > > > > > > > YARN job_id directly instead of waiting for the job to finish. This
>> > > > > > > > > > > > > lets the UI and the REST server learn the status of the search by
>> > > > > > > > > > > > > querying the YARN REST API, so both can be async without any
>> > > > > > > > > > > > > blocking phase. What do you think about that?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Having the job submitted directly from the code of the REST server
>> > > > > > > > > > > > > would be perfect, but it will need a lot of investigation I think
>> > > > > > > > > > > > > (but I'm not the expert so I might be completely wrong ^^).
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > We know that the pcap_query script works fine, so why not call it?
>> > > > > > > > > > > > > Is it that bad? (maybe a stupid question, but I really don’t see a
>> > > > > > > > > > > > > lot of drawbacks)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > - Front end:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Adding the pcap search to the alert UI is, I think, the easiest way
>> > > > > > > > > > > > > to move forward. But indeed, it will then be the “Alert UI and
>> > > > > > > > > > > > > pcapquery”. Maybe the name of the UI should just change to
>> > > > > > > > > > > > > something like “Monitoring & Investigation UI”?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have
>> > > > > > > > > > > > > you already had a discussion on how you see the UI evolving with
>> > > > > > > > > > > > > the new features that will come in the future?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > - Microservices:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > What do you mean exactly by microservices? Is it
>> to
>> > > > > separate
>> > > > > > > all
>> > > > > > > > > the
>> > > > > > > > > > > > > features in different projects? Or something like
>> > > having
>> > > > > the
>> > > > > > > > > > different
>> > > > > > > > > > > > > components in container like kubernet? (again
>> maybe
>> > > > stupid
>> > > > > > > > > question,
>> > > > > > > > > > > but
>> > > > > > > > > > > > I
>> > > > > > > > > > > > > don’t clearly understand what you mean J )
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Michel
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > --
>> > > > > > simon elliston ball
>> > > > > > @sireb
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> --
>>
>> Jon
>>
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Thanks for the feedback Jon.  I am not as familiar with BPF filtering as
you probably are.  Do you have an idea of how much effort would be involved in
implementing this?  I suspect this would be another PcapFilter
(https://github.com/apache/metron/blob/master/metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/filter/PcapFilter.java)
similar to fixed and query, so regardless of when we decide to add it, our
endpoint strategy should support 1 to n filters.  Maybe we have an endpoint
for each type of filter:

POST /api/v1/pcap/fixed
POST /api/v1/pcap/query
POST /api/v1/pcap/bpf

This would allow us to accept requests that are structured differently and
specific to the type of filter.
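
As a rough, language-neutral illustration of the idea above (written in
Python for brevity, not taken from the Metron code base), each endpoint could
map to its own request shape.  The class names, field names, and the
parse_request helper are all hypothetical; only the three endpoint paths come
from this message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FixedRequest:
    # Hypothetical fields for a fixed (5-tuple style) filter.
    ip_src_addr: Optional[str] = None
    ip_dst_addr: Optional[str] = None
    ip_src_port: Optional[int] = None
    ip_dst_port: Optional[int] = None
    protocol: Optional[str] = None


@dataclass
class QueryRequest:
    # Hypothetical field: a free-form query filter expression.
    query: str


@dataclass
class BpfRequest:
    # Hypothetical field: a BPF expression such as "tcp and dst port 80".
    expression: str


# One parser per endpoint; each accepts only the body shape of its filter.
PARSERS: Dict[str, Callable[[dict], object]] = {
    "/api/v1/pcap/fixed": lambda body: FixedRequest(**body),
    "/api/v1/pcap/query": lambda body: QueryRequest(**body),
    "/api/v1/pcap/bpf": lambda body: BpfRequest(**body),
}


def parse_request(path: str, body: dict) -> object:
    """Return the typed request for a POST path, or raise ValueError."""
    try:
        parser = PARSERS[path]
    except KeyError:
        raise ValueError(f"unknown pcap endpoint: {path}")
    try:
        return parser(body)
    except TypeError as err:
        # Raised by the dataclass constructor on missing or unexpected fields.
        raise ValueError(f"malformed body for {path}: {err}")
```

With this layout, adding a new filter type would only mean adding one request
class and one entry in the table, which is the "1 to n filters" property
described above.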

On Wed, May 9, 2018 at 12:32 PM, Zeolla@GMail.com <ze...@gmail.com> wrote:

> This looks really great and gets me excited to maybe revisit some old
> conversations about PCAP capture in Metron.  The only thing that I think
> it's missing is the ability to filter using bpf.  I think the same thing
> can technically be accomplished by using packet_filter and I wouldn't throw
> a fit if that's considered a follow-on, but bpf is the standard language
> that people who do packet munging for a living know.
>
> Jon

Re: [DISCUSS] Pcap panel architecture

Posted by "Zeolla@GMail.com" <ze...@gmail.com>.
This looks really great and gets me excited to maybe revisit some old
conversations about PCAP capture in Metron.  The only thing that I think
it's missing is the ability to filter using bpf.  I think the same thing
can technically be accomplished by using packet_filter and I wouldn't throw
a fit if that's considered a follow-on, but bpf is the standard language
that people who do packet munging for a living know.
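
For readers less familiar with BPF, here is a small sketch (in Python, purely
illustrative) of how a handful of fixed-filter style fields would be written
as a BPF expression.  The function and its parameter names are hypothetical;
only the primitives it emits (tcp/udp, "src host", "dst host", "src port",
"dst port", joined with "and") are standard pcap-filter syntax.

```python
from typing import List, Optional


def fixed_to_bpf(
    src_addr: Optional[str] = None,
    dst_addr: Optional[str] = None,
    src_port: Optional[int] = None,
    dst_port: Optional[int] = None,
    protocol: Optional[str] = None,
) -> str:
    """Build a BPF expression from optional 5-tuple style fields."""
    terms: List[str] = []
    if protocol is not None:
        terms.append(protocol.lower())  # e.g. "tcp" or "udp"
    if src_addr is not None:
        terms.append(f"src host {src_addr}")
    if dst_addr is not None:
        terms.append(f"dst host {dst_addr}")
    if src_port is not None:
        terms.append(f"src port {src_port}")
    if dst_port is not None:
        terms.append(f"dst port {dst_port}")
    # Every supplied field must match, so the terms are AND-ed together.
    return " and ".join(terms)


print(fixed_to_bpf(src_addr="192.168.1.10", dst_port=80, protocol="TCP"))
```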

Jon

On Wed, May 9, 2018 at 1:00 PM Ryan Merriman <me...@gmail.com> wrote:

> Now that we are confident we can submit an MR job from our current REST
> application, is this the desired approach?  Just want to confirm.
>
> Next I think we should map out what the REST interface will look like.
> Here are the endpoints I'm thinking about:
>
> GET /api/v1/pcap/metadata?basePath
>
> This endpoint will return metadata of pcap data stored in HDFS.  This would
> include pcap size, date ranges (how far back can I go), etc.  It would
> accept an optional HDFS basePath parameter for cases where pcap data is
> stored in multiple places and/or different from the default location.
>
> POST /api/v1/pcap/query
>
> This endpoint would accept a pcap request, submit a pcap query job, and
> return a job id.  The request would be an object containing the parameters
> documented here:
> https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
> A query/job would be associated with the user that submits it.  An exception will be
> returned for violating constraints like too many queries submitted, query
> parameters out of limits, etc.
>
> GET /api/v1/pcap/status/<jobId>
>
> This endpoint will return the status of a running job.  I imagine this is
> just a proxy to the YARN REST api.  We can discuss the implementation
> behind these endpoints later.
>
> GET /api/v1/pcap/stop/<jobId>
>
> This endpoint would kill a running pcap job.  If the job has already
> completed, this is a no-op.
>
> GET /api/v1/pcap/list
>
> This endpoint will list a user's submitted pcap queries.  Items in the list
> would contain job id, status (is it finished?), start/end time, and number
> of pages.  Maybe there is some overlap with the status endpoint above and
> the status endpoint is not needed?
>
> GET /api/v1/pcap/pdml/<jobId>/<pageNumber>
>
> This endpoint will return pcap results for the given page in PDML format
> (https://wiki.wireshark.org/PDML).  Are there other formats we want to
> support?
>
> GET /api/v1/pcap/raw/<jobId>/<pageNumber>
>
> This endpoint will allow a user to download raw pcap results for the given
> page.
>
> DELETE /api/v1/pcap/<jobId>
>
> This endpoint will delete pcap query results.  Not sure yet how this fits
> in with our broader cleanup strategy.
>
> This should get us started.  What did I miss and what would you change
> about these?  I did not include much detail related to security, cleanup
> strategy, or underlying implementation details but these are items we
> should discuss at some point.
>
> On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > Sweet! That's great news. The pom changes are a lot simpler than I
> > expected. Very nice.
> >
> > On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com>
> wrote:
> >
> > > Finally figured it out.  Commit is here:
> > > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> > >
> > > It came down to figuring out the right combination of maven
> dependencies
> > > and passing in the HDP version to REST as a Java system property.  I
> also
> > > included some HDFS setup tasks.  I tested this in full dev and can now
> > > successfully run a pcap query and get results.  All you should have to
> do
> > > is generate some pcap data first.
> > >
> > > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > @Ryan - pulled your branch and experimented with a few things. In
> doing
> > > so,
> > > > it dawned on me that by adding the yarn and hadoop classpath, you
> > > probably
> > > > didn't introduce a new classpath issue, rather you probably just
> moved
> > > onto
> > > > the next classpath issue, ie hbase per your exception about hbase
> jaxb.
> > > > Anyhow, I put up a branch with some pom changes worth trying in
> > > conjunction
> > > > with invoking the rest app startup via "/usr/bin/yarn jar"
> > > >
> > > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > > >
> > > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > > >
> > > > Mike
> > > >
> > > >
> > > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > > simon@simonellistonball.com> wrote:
> > > >
> > > > > That would be a step closer to something more like a micro-service
> > > > > architecture. However, I would want to make sure we think about the
> > > > > operational complexity, and mpack implications of having another
> > server
> > > > > installed and running somewhere on the cluster (also, ssl,
> kerberos,
> > > etc
> > > > > etc requirements for that service).
> > > > >
> > > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > > > >
> > > > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > > > pattern.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > > > pattern in rest?
> > > > > > >
> > > > > > >
> > > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com
> )
> > > > wrote:
> > > > > > >
> > > > > > > Moving the yarn classpath command earlier in the classpath now
> > > gives
> > > > > this
> > > > > > > error:
> > > > > > >
> > > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > > >
> > > > > > > I will experiment with other combinations, I suspect we will
> need
> > > > > > > finer-grain control over the order.
> > > > > > >
> > > > > > > The grep matches class names inside jar files. I use this all
> the
> > > > time
> > > > > > and
> > > > > > > it's really useful.
> > > > > > >
> > > > > > > The metron-rest jar is already shaded.
> > > > > > >
> > > > > > > Reverse engineering the yarn jar command was the next thing I
> was
> > > > going
> > > > > > to
> > > > > > > try. Will let you know how it goes.
> > > > > > >
> > > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > > >
> > > > > > > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > > > > > > > package stands out to me in this name:
> > > > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider".
> > > > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > > >
> > > > > > > > I think that find command needs a "jar tvf", otherwise you're
> > > > looking
> > > > > > > for a
> > > > > > > > class name in jar file names.
> > > > > > > >
> > > > > > > > Have you tried shading the rest jar?
> > > > > > > >
> > > > > > > > I'd also look at the classpath you get when running "yarn
> jar"
> > to
> > > > > start
> > > > > > > the
> > > > > > > > existing pcap service, per the instructions in
> > > > metron-api/README.md.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > > merrimanr@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > To explore the idea of merging metron-api into metron-rest and running
> > > > > > > > > > pcap queries inside our REST application, I created a simple test here:
> > > > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test
> > > > > > > > > > A summary of what's included:
> > > > > > > > >
> > > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > > > >   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > > > >
> > > > > > > > > > Generate some pcap data using pycapa
> > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > > > and the pcap topology
> > > > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > > "/apps/metron/pcap". I believe this should be enough to
> > > exercise
> > > > > the
> > > > > > > > > issue. Just hit the endpoint referenced above. I tested
> this
> > in
> > > > an
> > > > > > > > > already running full dev by building and deploying the
> > > > metron-rest
> > > > > > > jar.
> > > > > > > > I
> > > > > > > > > did not rebuild full dev with this change but I would still
> > > > expect
> > > > > it
> > > > > > > to
> > > > > > > > > work. Let me know if it doesn't.
> > > > > > > > >
> > > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > > >
> > > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > > >
> > > > > > > > > Here are the things I've tried so far:
> > > > > > > > >
> > > > > > > > > - Run the REST application with the YARN jar command since
> > this
> > > > is
> > > > > > how
> > > > > > > > > all our other YARN/MR-related applications are started
> > > > (metron-api,
> > > > > > > > > MAAS,
> > > > > > > > > pcap query, etc). I wouldn't expect this to work since we
> > have
> > > > > > > > runtime
> > > > > > > > > dependencies on our shaded elasticsearch and parser jars
> and
> > > I'm
> > > > > not
> > > > > > > > > aware
> > > > > > > > > of a way to add additional jars to the classpath with the
> > YARN
> > > > jar
> > > > > > > > > command
> > > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > > >
> > > > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > > > > java.lang.NullPointerException
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to
> > the
> > > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST
> > start
> > > > > > > > > script). I
> > > > > > > > > get this error:
> > > > > > > > >
> > > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - I searched for the class in the previous attempt but
> could
> > > not
> > > > > find
> > > > > > > > it
> > > > > > > > > in full dev:
> > > > > > > > >
> > > > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - Further up in the stack trace I see the error happens
> when
> > > > > > > > initiating
> > > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils
> > class.
> > > I
> > > > > > > > tried
> > > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false
> > and
> > > > > then
> > > > > > I
> > > > > > > > > get
> > > > > > > > > this error:
> > > > > > > > >
> > > > > > > > > Unable to parse
> > > > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> > mapreduce
> > > > > Maven
> > > > > > > > > dependencies without any success
> > > > > > > > > - hadoop-yarn-client
> > > > > > > > > - hadoop-yarn-common
> > > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > > - hadoop-yarn-server-common
> > > > > > > > > - hadoop-yarn-api
> > > > > > > > > - hbase-server
> > > > > > > > >
> > > > > > > > > I will keep exploring other possible solutions. Let me know
> > if
> > > > > anyone
> > > > > > > > has
> > > > > > > > > any ideas.
> > > > > > > > >
> > > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > > ottobackwards@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I can imagine a new generic service(s) capability whose
> > job (
> > > > pun
> > > > > > > > > intended
> > > > > > > > > > ) is to
> > > > > > > > > > abstract the submittal, tracking, and storage of results
> to
> > > > yarn.
> > > > > > > > > >
> > > > > > > > > > It would be extended with storage providers, queue
> > provider,
> > > > > > > possibly
> > > > > > > > > some
> > > > > > > > > > set of policies or rather strategies.
> > > > > > > > > >
> > > > > > > > > > > The pcap ‘report’ would be a client to that service, that specializes the
> > > > > > > > > > > service operation for the way we want pcap to work.
> > > > > > > > > >
> > > > > > > > > > We can then re-use the generic service for other long
> > running
> > > > > yarn
> > > > > > > > > > things…..
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > > ottobackwards@gmail.com
> > > > > )
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > RE: Tracking v. users
> > > > > > > > > >
> > > > > > > > > > The submittal and tracking can associate the submitter
> with
> > > the
> > > > > > yarn
> > > > > > > > job
> > > > > > > > > > and track that,
> > > > > > > > > > regardless of the yarn credentials.
> > > > > > > > > >
> > > > > > > > > > IE> if all submittals and monitoring are by the same yarn
> > > user
> > > > (
> > > > > > > > Metron )
> > > > > > > > > > from a single or
> > > > > > > > > > co-operative set of services, that service can maintain
> the
> > > > > > mapping.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
> > > merrimanr@gmail.com
> > > > )
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Otto, your use case makes sense to me. We'll have to
> think
> > > > about
> > > > > > how
> > > > > > > to
> > > > > > > > > > manage the user to job relationships. I'm assuming YARN
> > jobs
> > > > will
> > > > > > be
> > > > > > > > > > submitted as the metron service user so YARN won't keep
> > track
> > > > of
> > > > > > > this
> > > > > > > > for
> > > > > > > > > > us. Is that assumption correct? Do you have any ideas for
> > > doing
> > > > > > > that?
> > > > > > > > > >
> > > > > > > > > > Mike, I can start a feature branch and experiment with
> > > merging
> > > > > > > > metron-api
> > > > > > > > > > into metron-rest. That should allow us to collaborate on
> > any
> > > > > issues
> > > > > > > or
> > > > > > > > > > challenges. Also, can you expand on your idea to manage
> > > > external
> > > > > > > > > > dependencies as a special module? That seems like a very
> > > > > attractive
> > > > > > > > > option
> > > > > > > > > > to me.
> > > > > > > > > >
> > > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > > ottobackwards@gmail.com>
> > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > From my response on the other thread, but applicable to
> > the
> > > > > > > backend
> > > > > > > > > > stuff:
> > > > > > > > > > >
> > > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You
> > are
> > > > > > > > generating a
> > > > > > > > > > > report based on parameters.
> > > > > > > > > > > That report is something that takes some time and
> > external
> > > > > > process
> > > > > > > to
> > > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > > >
> > > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > > >
> > > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > > alerts/meta-alert,
> > > > > > > > > > > > possibly picking from one or more report ‘templates’
> > > > > > > > > > > that have query options etc
> > > > > > > > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > > > > > > > > executed/generated
> > > > > > > > > > > * You as a user have a ‘queue’ of your report results,
> > and
> > > > when
> > > > > > > the
> > > > > > > > > > report
> > > > > > > > > > > is done it is queued there
> > > > > > > > > > > * We ‘monitor’ the report/queue press through the yarn
> > > rest (
> > > > > > > report
> > > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > > * You can select the report from your queue and view it
> > > > either
> > > > > in
> > > > > > > a
> > > > > > > > new
> > > > > > > > > > UI
> > > > > > > > > > > or custom component
> > > > > > > > > > > * You can then apply a different ‘view’ to the report
> or
> > > work
> > > > > > with
> > > > > > > > the
> > > > > > > > > > > report data
> > > > > > > > > > > * You can print / save etc
> > > > > > > > > > > * You can associate the report with the alerts ( again
> in
> > > the
> > > > > > > report
> > > > > > > > > info
> > > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation
> something
> > or
> > > > > other
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > We can introduce extensibility into the report
> templates,
> > > > > report
> > > > > > > > views
> > > > > > > > > (
> > > > > > > > > > > > things that work with the json data of the report )
> > > > > > > > > > >
> > > > > > > > > > > Something like that.”
> > > > > > > > > > >
> > > > > > > > > > > Maybe we can do :
> > > > > > > > > > >
> > > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > > yarn info + query info + alert context + yarn status =>
> > > > report
> > > > > > > info
> > > > > > > > ->
> > > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > > report persistence added to report info
> > > > > > > > > > > metron-rest -> api to monitor the queue, read results (
> > > page
> > > > ),
> > > > > > > etc
> > > > > > > > etc
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > > merrimanr@gmail.com
> > > > > )
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > I started a separate thread on Pcap UI considerations
> and
> > > > user
> > > > > > > > > > > requirements
> > > > > > > > > > > at Otto's request. This should help us keep these two
> > > related
> > > > > but
> > > > > > > > > > separate
> > > > > > > > > > > discussions focused.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > (Youhouuu my first reply on this kind of mail
> chain^^)
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If I may, I would like to share my view on the
> > following
> > > 3
> > > > > > > points.
> > > > > > > > > > > >
> > > > > > > > > > > > - Backend:
> > > > > > > > > > > >
> > > > > > > > > > > > > The current metron-api is totally separate, it will
> be
> > > > logic
> > > > > > for
> > > > > > > me
> > > > > > > > > to
> > > > > > > > > > > have
> > > > > > > > > > > > it at the same place as the others rest api.
> Especially
> > > > when
> > > > > > > more
> > > > > > > > > > > security
> > > > > > > > > > > > will be added, it will not be needed to do the job
> > twice.
> > > > > > > > > > > > The current implementation send back a pcap object
> > which
> > > > > still
> > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > > decoded. In the opensoc, the decoding was done with
> > > tshard
> > > > on
> > > > > > > the
> > > > > > > > > > > frontend.
> > > > > > > > > > > > It will be good to have this decoding happening
> > directly
> > > on
> > > > > the
> > > > > > > > > backend
> > > > > > > > > > > to
> > > > > > > > > > > > not create a load on frontend. An option will be to
> > > install
> > > > > > > tshark
> > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > > rest server and to use to convert the pcap to xml and
> > > then
> > > > > to a
> > > > > > > > json
> > > > > > > > > > > that
> > > > > > > > > > > > will be send to the frontend.
> > > > > > > > > > > >
> > > > > > > > > > > > I tried to start directly the map/reduce job to
> search
> > > over
> > > > > all
> > > > > > > the
> > > > > > > > > > pcap
> > > > > > > > > > > > data from the rest server and as Ryan mention it, we
> > had
> > > > > > > trouble. I
> > > > > > > > > > will
> > > > > > > > > > > > try to find back the error.
> > > > > > > > > > > >
> > > > > > > > > > > > Then in the POC, what we tried is to use the
> pcap_query
> > > > > script
> > > > > > > and
> > > > > > > > > this
> > > > > > > > > > > > work fine. I just modified it that he sends back
> > directly
> > > > the
> > > > > > > > job_id
> > > > > > > > > of
> > > > > > > > > > > > yarn and not waiting that the job is finished. Then
> it
> > > will
> > > > > > > allow
> > > > > > > > the
> > > > > > > > > > UI
> > > > > > > > > > > > and the rest server to know what the status of the
> > > research
> > > > > by
> > > > > > > > > querying
> > > > > > > > > > > the
> > > > > > > > > > > > yarn rest api. This will allow the UI and the rest
> > server
> > > > to
> > > > > be
> > > > > > > > async
> > > > > > > > > > > > without any blocking phase. What do you think about
> > that?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Having the job submitted directly from the code of
> the
> > > rest
> > > > > > > server
> > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > perfect, but it will need a lot of investigation I
> > think
> > > > (but
> > > > > > > I'm
> > > > > > > > not
> > > > > > > > > > > the
> > > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > > >
> > > > > > > > > > > > > We know that the pcap_query script works fine so why
> not
> > > > > calling
> > > > > > > it?
> > > > > > > > > Is
> > > > > > > > > > > it
> > > > > > > > > > > > that bad? (maybe stupid question, but I really don’t
> > see
> > > a
> > > > > lot
> > > > > > > of
> > > > > > > > > > > drawback)
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Front end:
> > > > > > > > > > > >
> > > > > > > > > > > > Adding the the pcap search to the alert UI is, I
> think,
> > > the
> > > > > > > easiest
> > > > > > > > > way
> > > > > > > > > > > to
> > > > > > > > > > > > move forward. But indeed, it will then be the “Alert
> UI
> > > and
> > > > > > > > > pcapquery”.
> > > > > > > > > > > > Maybe the name of the UI should just change to
> > something
> > > > like
> > > > > > > > > > > “Monitoring &
> > > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Is there any roadmap or plan for the different UI? I
> > mean
> > > > did
> > > > > > > you
> > > > > > > > > > > already
> > > > > > > > > > > > had discussion on how you see the ui evolving with
> the
> > > new
> > > > > > > feature
> > > > > > > > > that
> > > > > > > > > > > > will come in the future?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > - Microservices:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > > separate
> > > > > > all
> > > > > > > > the
> > > > > > > > > > > > features in different projects? Or something like
> > having
> > > > the
> > > > > > > > > different
> > > > > > > > > > > > > components in container like Kubernetes? (again maybe
> > > stupid
> > > > > > > > question,
> > > > > > > > > > but
> > > > > > > > > > > I
> > > > > > > > > > > > > don’t clearly understand what you mean :) )
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Michel
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --
> > > > > simon elliston ball
> > > > > @sireb
> > > > >
> > > >
> > >
> >
>
-- 

Jon

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Now that we are confident we can submit an MR job from our current REST
application, is this the desired approach?  Just want to confirm.

Next I think we should map out what the REST interface will look like.
Here are the endpoints I'm thinking about:

GET /api/v1/pcap/metadata?basePath

This endpoint will return metadata of pcap data stored in HDFS.  This would
include pcap size, date ranges (how far back can I go), etc.  It would
accept an optional HDFS basePath parameter for cases where pcap data is
stored in multiple places and/or different from the default location.
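
To make the metadata idea a bit more concrete, here is a rough sketch of how a listing of pcap files could be reduced to a size and date range.  The input tuples and the response field names (fileCount, totalBytes, earliestTs, latestTs) are assumptions made up for illustration; nothing above fixes them, and reading the listing from HDFS is left out entirely.

```python
def summarize_pcap_files(files):
    """Summarize a pcap file listing.

    `files` is an iterable of (path, size_bytes, first_packet_ts_ms) tuples.
    This tuple layout is hypothetical; a real implementation would derive
    it from whatever is stored under the HDFS base path.
    """
    files = list(files)
    if not files:
        return {"fileCount": 0, "totalBytes": 0,
                "earliestTs": None, "latestTs": None}
    timestamps = [ts for _path, _size, ts in files]
    return {
        "fileCount": len(files),
        "totalBytes": sum(size for _path, size, _ts in files),
        "earliestTs": min(timestamps),  # how far back a query can go
        "latestTs": max(timestamps),
    }
```

The earliestTs value is what answers the "how far back can I go" question.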

POST /api/v1/pcap/query

This endpoint would accept a pcap request, submit a pcap query job, and
return a job id.  The request would be an object containing the parameters
documented here:
https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#query-filter-utility
A query/job would be associated with the user that submits it.  An exception will be
returned for violating constraints like too many queries submitted, query
parameters out of limits, etc.
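
As a minimal sketch of the submit behaviour described above (associate the job with a user, reject requests that violate constraints, return a job id), assuming an in-memory store instead of a real MR/YARN submission.  The limits, the QueryRejected exception, the request keys and the job id format are all hypothetical.

```python
import itertools

MAX_ACTIVE_JOBS_PER_USER = 2                 # hypothetical limit
MAX_QUERY_WINDOW_MS = 24 * 60 * 60 * 1000    # hypothetical limit: one day


class QueryRejected(Exception):
    """Raised when a request violates a submission constraint."""


class PcapJobStore:
    """In-memory stand-in for job submission and user-to-job tracking."""

    def __init__(self):
        self.jobs = {}                       # job id -> job record
        self._ids = itertools.count(1)

    def submit(self, user, request):
        start, end = request["start_time"], request["end_time"]
        if end <= start:
            raise QueryRejected("end_time must be after start_time")
        if end - start > MAX_QUERY_WINDOW_MS:
            raise QueryRejected("query parameters out of limits")
        active = [job for job in self.jobs.values()
                  if job["user"] == user and job["status"] == "RUNNING"]
        if len(active) >= MAX_ACTIVE_JOBS_PER_USER:
            raise QueryRejected("too many queries submitted")
        job_id = f"job_{next(self._ids)}"
        # A real service would submit the MR job here and keep the YARN id.
        self.jobs[job_id] = {"user": user, "status": "RUNNING",
                             "request": dict(request)}
        return job_id
```

Keeping the user on the job record is what would let a list endpoint filter by submitter even if every job runs as the same service user.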

GET /api/v1/pcap/status/<jobId>

This endpoint will return the status of a running job.  I imagine this is
just a proxy to the YARN REST api.  We can discuss the implementation
behind these endpoints later.
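
If the status endpoint does end up as a thin proxy, the translation step might look roughly like the sketch below.  It assumes the body comes from the YARN ResourceManager application resource (/ws/v1/cluster/apps/<appId>), which reports "state", "finalStatus" and "progress" under an "app" key.  The output field names (jobId, status, percentComplete) are made up for illustration.

```python
import json


def to_pcap_status(yarn_response_body):
    """Translate a YARN application JSON body into a pcap job status."""
    app = json.loads(yarn_response_body)["app"]
    state = app["state"]
    if state == "FINISHED":
        # FINISHED only says the application ended; finalStatus says how.
        status = app.get("finalStatus", "UNDEFINED")
    elif state in ("FAILED", "KILLED"):
        status = state
    elif state == "RUNNING":
        status = "RUNNING"
    else:
        # NEW, NEW_SAVING, SUBMITTED and ACCEPTED all mean "not started yet".
        status = "SUBMITTED"
    return {
        "jobId": app["id"],
        "status": status,
        "percentComplete": app.get("progress", 0.0),
    }
```

The HTTP call to the ResourceManager is deliberately left out so the mapping can be looked at on its own.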

GET /api/v1/pcap/stop/<jobId>

This endpoint would kill a running pcap job.  If the job has already
completed, this is a no-op.
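
A small sketch of the no-op behaviour, assuming job records are plain dicts with a "status" field and that the actual kill is delegated to a callback.  Both of those, and the set of terminal status names, are assumptions for illustration.

```python
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "KILLED"}  # hypothetical names


def stop_job(jobs, job_id, kill):
    """Kill a job unless it has already reached a terminal status.

    `jobs` maps job id -> record with a "status" key; `kill` is a callable
    that takes the job id and performs the real kill (for example through
    YARN).  Returns the status after the call.
    """
    job = jobs[job_id]
    if job["status"] in TERMINAL_STATUSES:
        return job["status"]      # already done: nothing to kill
    kill(job_id)
    job["status"] = "KILLED"
    return job["status"]
```

Calling it twice on the same job triggers the kill callback only once, so a client can retry safely.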

GET /api/v1/pcap/list

This endpoint will list a user's submitted pcap queries.  Items in the list
would contain job id, status (is it finished?), start/end time, and number
of pages.  Maybe there is some overlap with the status endpoint above and
the status endpoint is not needed?

GET /api/v1/pcap/pdml/<jobId>/<pageNumber>

This endpoint will return pcap results for the given page in PDML format
(https://wiki.wireshark.org/PDML).  Are there other formats we want to
support?

GET /api/v1/pcap/raw/<jobId>/<pageNumber>

This endpoint will allow a user to download raw pcap results for the given
page.
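
For the paged download, one possible shape is sketched below: split the matched packets into fixed-size pages and serialize a single page as a classic libpcap file (24-byte global header followed by a 16-byte record header per packet).  The packet tuple layout, the 1-based page numbering and the function names are assumptions; how results are actually stored and paged is still open.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def paginate(packets, page_size):
    """Split a list of packets into consecutive pages of at most page_size."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [packets[i:i + page_size] for i in range(0, len(packets), page_size)]


def get_page(pages, page_number):
    """Return a page by 1-based page number (numbering is an assumption)."""
    if not 1 <= page_number <= len(pages):
        raise IndexError("page number out of range")
    return pages[page_number - 1]


def to_pcap_bytes(page, snaplen=65535, linktype=LINKTYPE_ETHERNET):
    """Serialize (ts_sec, ts_usec, data) tuples as a little-endian pcap file."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype.
    out = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)]
    for ts_sec, ts_usec, data in page:
        # Record header: timestamp, captured length, original length.
        out.append(struct.pack("<IIII", ts_sec, ts_usec, len(data), len(data)))
        out.append(data)
    return b"".join(out)
```

Each page is a complete pcap file on its own, so it can be opened directly in Wireshark without fetching the other pages.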

DELETE /api/v1/pcap/<jobId>

This endpoint will delete pcap query results.  Not sure yet how this fits
in with our broader cleanup strategy.

This should get us started.  What did I miss and what would you change
about these?  I did not include much detail related to security, cleanup
strategy, or underlying implementation details but these are items we
should discuss at some point.

On Tue, May 8, 2018 at 5:38 PM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> Sweet! That's great news. The pom changes are a lot simpler than I
> expected. Very nice.
>
> On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com> wrote:
>
> > Finally figured it out.  Commit is here:
> > https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
> >
> > It came down to figuring out the right combination of maven dependencies
> > and passing in the HDP version to REST as a Java system property.  I also
> > included some HDFS setup tasks.  I tested this in full dev and can now
> > successfully run a pcap query and get results.  All you should have to do
> > is generate some pcap data first.
> >
> > On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > @Ryan - pulled your branch and experimented with a few things. In doing so,
> > > it dawned on me that by adding the yarn and hadoop classpath, you probably
> > > didn't introduce a new classpath issue; rather, you probably just moved on to
> > > the next classpath issue, i.e. hbase, per your exception about hbase jaxb.
> > > Anyhow, I put up a branch with some pom changes worth trying in conjunction
> > > with invoking the rest app startup via "/usr/bin/yarn jar".
> > >
> > > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> > >
> > > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> > >
> > > Mike
> > >
> > >
> > > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > > simon@simonellistonball.com> wrote:
> > >
> > > > That would be a step closer to something more like a micro-service
> > > > architecture. However, I would want to make sure we think about the
> > > > operational complexity and mpack implications of having another server
> > > > installed and running somewhere on the cluster (also the ssl, kerberos,
> > > > etc. requirements for that service).
> > > >
> > > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > > >
> > > > > +1 to having metron-api as its own service and using a gateway type
> > > > > pattern.
> > > > >
> > > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > > pattern in rest?
> > > > > >
> > > > > >
> > > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com)
> > > wrote:
> > > > > >
> > > > > > Moving the yarn classpath command earlier in the classpath now gives
> > > > > > this error:
> > > > > >
> > > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > > >
> > > > > > I will experiment with other combinations; I suspect we will need
> > > > > > finer-grained control over the order.
> > > > > >
> > > > > > The grep matches class names inside jar files. I use this all the time
> > > > > > and it's really useful.
> > > > > >
> > > > > > The metron-rest jar is already shaded.
> > > > > >
> > > > > > Reverse engineering the yarn jar command was the next thing I was going
> > > > > > to try. Will let you know how it goes.
> > > > > >
> > > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > > michael.miklavcic@gmail.com> wrote:
> > > > > >
> > > > > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > > > > > package stands out to me in this name:
> > > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider".
> > > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > > >
> > > > > > > I think that find command needs a "jar tvf", otherwise you're looking
> > > > > > > for a class name in jar file names.
> > > > > > >
> > > > > > > Have you tried shading the rest jar?
> > > > > > >
> > > > > > > I'd also look at the classpath you get when running "yarn jar" to start
> > > > > > > the existing pcap service, per the instructions in
> > > > > > > metron-api/README.md.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <
> > merrimanr@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > To explore the idea of merging metron-api into metron-rest and running
> > > > > > > > pcap queries inside our REST application, I created a simple test here:
> > > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test
> > > > > > > > A summary of what's included:
> > > > > > > >
> > > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > > - Added a pcap query controller endpoint at
> > > > > > > >   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > > >
> > > > > > > > Generate some pcap data using pycapa
> > > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > > and the pcap topology
> > > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > > already running full dev by building and deploying the metron-rest jar.
> > > > > > > > I did not rebuild full dev with this change but I would still expect it
> > > > > > > > to work. Let me know if it doesn't.
> > > > > > > >
> > > > > > > > The first error I see when I hit this endpoint is:
> > > > > > > >
> > > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > > >
> > > > > > > > Here are the things I've tried so far:
> > > > > > > >
> > > > > > > > - Run the REST application with the YARN jar command since
> this
> > > is
> > > > > how
> > > > > > > > all our other YARN/MR-related applications are started
> > > (metron-api,
> > > > > > > > MAAS,
> > > > > > > > pcap query, etc). I wouldn't expect this to work since we
> have
> > > > > > > runtime
> > > > > > > > dependencies on our shaded elasticsearch and parser jars and
> > I'm
> > > > not
> > > > > > > > aware
> > > > > > > > of a way to add additional jars to the classpath with the
> YARN
> > > jar
> > > > > > > > command
> > > > > > > > (is there a way?). Either way I get this error:
> > > > > > > >
> > > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not
> > create
> > > > Dir
> > > > > > > using
> > > > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/
> > hadoop/lib/ojdbc6.jar.
> > > > > > > skipping.
> > > > > > > > java.lang.NullPointerException
> > > > > > > >
> > > > > > > >
> > > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to
> the
> > > > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST
> start
> > > > > > > > script). I
> > > > > > > > get this error:
> > > > > > > >
> > > > > > > > java.lang.ClassNotFoundException:
> > > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.
> > > > > > > > jaxrs.JacksonJaxbJsonProvider
> > > > > > > >
> > > > > > > >
> > > > > > > > - I searched for the class in the previous attempt but could
> > not
> > > > find
> > > > > > > it
> > > > > > > > in full dev:
> > > > > > > >
> > > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep
> > > > > > > > org/apache/hadoop/hbase/shaded/org/codehaus/jackson/
> > > > > > > > jaxrs/JacksonJaxbJsonProvider
> > > > > > > > 2>/dev/null
> > > > > > > >
> > > > > > > >
> > > > > > > > - Further up in the stack trace I see the error happens when
> > > > > > > initiating
> > > > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils
> class.
> > I
> > > > > > > tried
> > > > > > > > setting "yarn.timeline-service.enabled" in Ambari to false
> and
> > > > then
> > > > > I
> > > > > > > > get
> > > > > > > > this error:
> > > > > > > >
> > > > > > > > Unable to parse
> > > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-
> > > framework'
> > > > > as
> > > > > > a
> > > > > > > > URI, check the setting for mapreduce.application.
> > framework.path
> > > > > > > >
> > > > > > > >
> > > > > > > > - I've tried adding different hadoop, hbase, yarn and
> mapreduce
> > > > Maven
> > > > > > > > dependencies without any success
> > > > > > > > - hadoop-yarn-client
> > > > > > > > - hadoop-yarn-common
> > > > > > > > - hadoop-mapreduce-client-core
> > > > > > > > - hadoop-yarn-server-common
> > > > > > > > - hadoop-yarn-api
> > > > > > > > - hbase-server
> > > > > > > >
> > > > > > > > I will keep exploring other possible solutions. Let me know
> if
> > > > anyone
> > > > > > > has
> > > > > > > > any ideas.
> > > > > > > >
> > > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > > > ottobackwards@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I can imagine a new generic service(s) capability whose
> job (
> > > pun
> > > > > > > > intended
> > > > > > > > > ) is to
> > > > > > > > > abstract the submittal, tracking, and storage of results to
> > > yarn.
> > > > > > > > >
> > > > > > > > > It would be extended with storage providers, queue
> provider,
> > > > > > possibly
> > > > > > > > some
> > > > > > > > > set of policies or rather strategies.
> > > > > > > > >
> > > > > > > > > The pcap ‘report’ would be a client to that service, the
> > > > > specializes
> > > > > > > the
> > > > > > > > > service operation for the way we want pcap to work.
> > > > > > > > >
> > > > > > > > > We can then re-use the generic service for other long
> running
> > > > yarn
> > > > > > > > > things…..
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> > > ottobackwards@gmail.com
> > > > )
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > RE: Tracking v. users
> > > > > > > > >
> > > > > > > > > The submittal and tracking can associate the submitter with
> > the
> > > > > yarn
> > > > > > > job
> > > > > > > > > and track that,
> > > > > > > > > regardless of the yarn credentials.
> > > > > > > > >
> > > > > > > > > IE> if all submittals and monitoring are by the same yarn
> > user
> > > (
> > > > > > > Metron )
> > > > > > > > > from a single or
> > > > > > > > > co-operative set of services, that service can maintain the
> > > > > mapping.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (
> > merrimanr@gmail.com
> > > )
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Otto, your use case makes sense to me. We'll have to think
> > > about
> > > > > how
> > > > > > to
> > > > > > > > > manage the user to job relationships. I'm assuming YARN
> jobs
> > > will
> > > > > be
> > > > > > > > > submitted as the metron service user so YARN won't keep
> track
> > > of
> > > > > > this
> > > > > > > for
> > > > > > > > > us. Is that assumption correct? Do you have any ideas for
> > doing
> > > > > > that?
> > > > > > > > >
> > > > > > > > > Mike, I can start a feature branch and experiment with
> > merging
> > > > > > > metron-api
> > > > > > > > > into metron-rest. That should allow us to collaborate on
> any
> > > > issues
> > > > > > or
> > > > > > > > > challenges. Also, can you expand on your idea to manage
> > > external
> > > > > > > > > dependencies as a special module? That seems like a very
> > > > attractive
> > > > > > > > option
> > > > > > > > > to me.
> > > > > > > > >
> > > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > > > ottobackwards@gmail.com>
> > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > From my response on the other thread, but applicable to
> the
> > > > > > backend
> > > > > > > > > stuff:
> > > > > > > > > >
> > > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You
> are
> > > > > > > generating a
> > > > > > > > > > report based on parameters.
> > > > > > > > > > That report is something that takes some time and
> external
> > > > > process
> > > > > > to
> > > > > > > > > > generate… ie you have to wait for it.
> > > > > > > > > >
> > > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > > >
> > > > > > > > > > * Are in the AlertUI
> > > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > alerts/meta-alert,
> > > > > > > > > > possibly picking from on or more report ‘templates’
> > > > > > > > > > that have query options etc
> > > > > > > > > > * The report request is ‘queued’, that is dispatched to
> be
> > be
> > > > > > > > > > executed/generated
> > > > > > > > > > * You as a user have a ‘queue’ of your report results,
> and
> > > when
> > > > > > the
> > > > > > > > > report
> > > > > > > > > > is done it is queued there
> > > > > > > > > > * We ‘monitor’ the report/queue press through the yarn
> > rest (
> > > > > > report
> > > > > > > > > > info/meta has the yarn details )
> > > > > > > > > > * You can select the report from your queue and view it
> > > either
> > > > in
> > > > > > a
> > > > > > > new
> > > > > > > > > UI
> > > > > > > > > > or custom component
> > > > > > > > > > * You can then apply a different ‘view’ to the report or
> > work
> > > > > with
> > > > > > > the
> > > > > > > > > > report data
> > > > > > > > > > * You can print / save etc
> > > > > > > > > > * You can associate the report with the alerts ( again in
> > the
> > > > > > report
> > > > > > > > info
> > > > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation something
> or
> > > > other
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > We can introduce extensibility into the report templates,
> > > > report
> > > > > > > views
> > > > > > > > (
> > > > > > > > > > thinks that work with the json data of the report )
> > > > > > > > > >
> > > > > > > > > > Something like that.”
> > > > > > > > > >
> > > > > > > > > > Maybe we can do :
> > > > > > > > > >
> > > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > > yarn info + query info + alert context + yarn status =>
> > > report
> > > > > > info
> > > > > > > ->
> > > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > > report persistence added to report info
> > > > > > > > > > metron-rest -> api to monitor the queue, read results (
> > page
> > > ),
> > > > > > etc
> > > > > > > etc
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> > > merrimanr@gmail.com
> > > > )
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I started a separate thread on Pcap UI considerations and
> > > user
> > > > > > > > > > requirements
> > > > > > > > > > at Otto's request. This should help us keep these two
> > related
> > > > but
> > > > > > > > > separate
> > > > > > > > > > discussions focused.
> > > > > > > > > >
> > > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > > > > michelsumbul@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If I may, I would like to share my view on the
> following
> > 3
> > > > > > points.
> > > > > > > > > > >
> > > > > > > > > > > - Backend:
> > > > > > > > > > >
> > > > > > > > > > > The current metron-api is totally seperate, it will be
> > > logic
> > > > > for
> > > > > > me
> > > > > > > > to
> > > > > > > > > > have
> > > > > > > > > > > it at the same place as the others rest api. Especially
> > > when
> > > > > > more
> > > > > > > > > > security
> > > > > > > > > > > will be added, it will not be needed to do the job
> twice.
> > > > > > > > > > > The current implementation send back a pcap object
> which
> > > > still
> > > > > > need
> > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > decoded. In the opensoc, the decoding was done with
> > tshard
> > > on
> > > > > > the
> > > > > > > > > > frontend.
> > > > > > > > > > > It will be good to have this decoding happening
> directly
> > on
> > > > the
> > > > > > > > backend
> > > > > > > > > > to
> > > > > > > > > > > not create a load on frontend. An option will be to
> > install
> > > > > > tshark
> > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > rest server and to use to convert the pcap to xml and
> > then
> > > > to a
> > > > > > > json
> > > > > > > > > > that
> > > > > > > > > > > will be send to the frontend.
> > > > > > > > > > >
> > > > > > > > > > > I tried to start directly the map/reduce job to search
> > over
> > > > all
> > > > > > the
> > > > > > > > > pcap
> > > > > > > > > > > data from the rest server and as Ryan mention it, we
> had
> > > > > > trouble. I
> > > > > > > > > will
> > > > > > > > > > > try to find back the error.
> > > > > > > > > > >
> > > > > > > > > > > Then in the POC, what we tried is to use the pcap_query
> > > > script
> > > > > > and
> > > > > > > > this
> > > > > > > > > > > work fine. I just modified it that he sends back
> directly
> > > the
> > > > > > > job_id
> > > > > > > > of
> > > > > > > > > > > yarn and not waiting that the job is finished. Then it
> > will
> > > > > > allow
> > > > > > > the
> > > > > > > > > UI
> > > > > > > > > > > and the rest server to know what the status of the
> > research
> > > > by
> > > > > > > > querying
> > > > > > > > > > the
> > > > > > > > > > > yarn rest api. This will allow the UI and the rest
> server
> > > to
> > > > be
> > > > > > > async
> > > > > > > > > > > without any blocking phase. What do you think about
> that?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Having the job submitted directly from the code of the
> > rest
> > > > > > server
> > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > perfect, but it will need a lot of investigation I
> think
> > > (but
> > > > > > I'm
> > > > > > > not
> > > > > > > > > > the
> > > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > > >
> > > > > > > > > > > We know that the pcap_query scritp work fine so why not
> > > > calling
> > > > > > it?
> > > > > > > > Is
> > > > > > > > > > it
> > > > > > > > > > > that bad? (maybe stupid question, but I really don’t
> see
> > a
> > > > lot
> > > > > > of
> > > > > > > > > > drawback)
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - Front end:
> > > > > > > > > > >
> > > > > > > > > > > Adding the the pcap search to the alert UI is, I think,
> > the
> > > > > > easiest
> > > > > > > > way
> > > > > > > > > > to
> > > > > > > > > > > move forward. But indeed, it will then be the “Alert UI
> > and
> > > > > > > > pcapquery”.
> > > > > > > > > > > Maybe the name of the UI should just change to
> something
> > > like
> > > > > > > > > > “Monitoring &
> > > > > > > > > > > Investigation UI” ?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Is there any roadmap or plan for the different UI? I
> mean
> > > did
> > > > > > you
> > > > > > > > > > already
> > > > > > > > > > > had discussion on how you see the ui evolving with the
> > new
> > > > > > feature
> > > > > > > > that
> > > > > > > > > > > will come in the future?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > - Microservices:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > What do you mean exactly by microservices? Is it to
> > > separate
> > > > > all
> > > > > > > the
> > > > > > > > > > > features in different projects? Or something like
> having
> > > the
> > > > > > > > different
> > > > > > > > > > > components in container like kubernet? (again maybe
> > stupid
> > > > > > > question,
> > > > > > > > > but
> > > > > > > > > > I
> > > > > > > > > > > don’t clearly understand what you mean J )
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Michel
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > --
> > > > simon elliston ball
> > > > @sireb
> > > >
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Sweet! That's great news. The pom changes are a lot simpler than I
expected. Very nice.

On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <me...@gmail.com> wrote:

> Finally figured it out.  Commit is here:
> https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
>
> It came down to figuring out the right combination of maven dependencies
> and passing in the HDP version to REST as a Java system property.  I also
> included some HDFS setup tasks.  I tested this in full dev and can now
> successfully run a pcap query and get results.  All you should have to do
> is generate some pcap data first.
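
A rough sketch of why the system property matters, for anyone following along. It assumes (this is not confirmed anywhere in the thread) that Hadoop expands ${...} placeholders in configuration values from Java system properties and leaves unknown ones untouched, which would explain the "Unable to parse ... as a URI" error quoted further down. The substitute() helper and the property map are hypothetical stand-ins written in Python for illustration; they are not Hadoop or Metron code.

```python
import re

# Matches ${name} placeholders such as ${hdp.version}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

# Value quoted in the error message elsewhere in this thread.
FRAMEWORK_PATH = "/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework"


def substitute(value, system_properties):
    """Expand ${name} from system_properties; unknown names stay as they are.

    Hypothetical stand-in for the assumed Hadoop behavior, not Hadoop code.
    """
    def expand(match):
        return system_properties.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(expand, value)


def has_unresolved_placeholder(value):
    """True if a ${...} placeholder survived substitution."""
    return PLACEHOLDER.search(value) is not None


# Without the property the placeholder survives, and "{" / "}" are not
# legal URI characters, so parsing the value as a URI would fail.
without_property = substitute(FRAMEWORK_PATH, {})

# With the property set (version string taken from the paths in this thread).
with_property = substitute(FRAMEWORK_PATH, {"hdp.version": "2.6.4.0-91"})
```

Under that assumption, starting REST with something like -Dhdp.version=2.6.4.0-91 gives the placeholder a value to expand to, so the framework path becomes a parseable URI.
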
>
> On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > @Ryan - pulled your branch and experimented with a few things. In doing
> > so, it dawned on me that by adding the yarn and hadoop classpath, you
> > probably didn't introduce a new classpath issue; rather, you probably
> > just moved on to the next classpath issue, i.e. hbase, per your exception
> > about hbase jaxb. Anyhow, I put up a branch with some pom changes worth
> > trying in conjunction with invoking the rest app startup via
> > "/usr/bin/yarn jar".
> >
> > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> >
> > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> >
> > Mike
> >
> >
> > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> > simon@simonellistonball.com> wrote:
> >
> > > That would be a step closer to something more like a micro-service
> > > architecture. However, I would want to make sure we think about the
> > > operational complexity and MPack implications of having another server
> > > installed and running somewhere on the cluster (plus the SSL, Kerberos,
> > > etc. requirements for that service).
> > >
> > > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> > >
> > > > +1 to having metron-api as its own service and using a gateway-type
> > > > pattern.
> > > >
> > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > wrote:
> > > >
> > > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > > pattern in REST?
> > > > >
> > > > >
> > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > >
> > > > > Moving the yarn classpath command earlier in the classpath now gives
> > > > > this error:
> > > > >
> > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > >
> > > > > I will experiment with other combinations; I suspect we will need
> > > > > finer-grained control over the order.
> > > > >
> > > > > The grep matches class names inside jar files. I use this all the
> > > > > time and it's really useful.
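
An alternative that reads the jar listings directly rather than grepping the raw bytes, sketched in Python. It assumes the jars are ordinary zip archives; jars_containing() is a hypothetical helper for illustration and not part of Metron or Hadoop.

```python
import zipfile


def jars_containing(jar_paths, class_name):
    """Return the jars, in the order given, that hold an entry for class_name.

    class_name is a dotted name such as
    org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.
    """
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    matches.append(path)
        except (zipfile.BadZipFile, OSError):
            # Missing, unreadable or corrupt jar: skip it and keep going.
            continue
    return matches
```

Passing the jars in classpath order also shows which copy would be found first when several jars carry the same class, assuming a flat classpath that is searched in order.
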
> > > > >
> > > > > The metron-rest jar is already shaded.
> > > > >
> > > > > Reverse engineering the yarn jar command was the next thing I was
> > > > > going to try. Will let you know how it goes.
> > > > >
> > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > > michael.miklavcic@gmail.com> wrote:
> > > > >
> > > > > > In what order did you add the hadoop and yarn classpaths? The
> > > > > > "shaded" package stands out to me in this name:
> > > > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider".
> > > > > > Maybe try adding those packages earlier on the classpath.
> > > > > >
> > > > > > I think that find command needs a "jar tvf"; otherwise you're
> > > > > > looking for a class name in jar file names.
> > > > > >
> > > > > > Have you tried shading the rest jar?
> > > > > >
> > > > > > I'd also look at the classpath you get when running "yarn jar" to
> > > > > > start the existing pcap service, per the instructions in
> > > > > > metron-api/README.md.
> > > > > >
> > > > > >
> > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > To explore the idea of merging metron-api into metron-rest and
> > > > > > > running pcap queries inside our REST application, I created a
> > > > > > > simple test here:
> > > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.
> > > > > > > A summary of what's included:
> > > > > > >
> > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > - Added a pcap query controller endpoint at
> > > > > > >   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > >
> > > > > > > Generate some pcap data using pycapa
> > > > > > > (https://github.com/apache/metron/tree/master/metron-sensors/pycapa)
> > > > > > > and the pcap topology
> > > > > > > (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > > After this initial setup there should be data in HDFS at
> > > > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > > > already running full dev by building and deploying the metron-rest jar.
> > > > > > > I did not rebuild full dev with this change but I would still expect it
> > > > > > > to work. Let me know if it doesn't.
> > > > > > >
> > > > > > > The first error I see when I hit this endpoint is:
> > > > > > >
> > > > > > > java.lang.NoClassDefFoundError:
> > > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > >
> > > > > > > Here are the things I've tried so far:
> > > > > > >
> > > > > > > - Run the REST application with the YARN jar command, since this is
> > > > > > >   how all our other YARN/MR-related applications are started
> > > > > > >   (metron-api, MAAS, pcap query, etc.). I wouldn't expect this to
> > > > > > >   work since we have runtime dependencies on our shaded
> > > > > > >   elasticsearch and parser jars, and I'm not aware of a way to add
> > > > > > >   additional jars to the classpath with the YARN jar command (is
> > > > > > >   there a way?). Either way I get this error:
> > > > > > >
> > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > java.lang.NullPointerException
> > > > > > >
> > > > > > >
> > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > >   classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > > > > > >   script). I get this error:
> > > > > > >
> > > > > > > java.lang.ClassNotFoundException:
> > > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > >
> > > > > > >
> > > > > > > - I searched for the class from the previous attempt but could not
> > > > > > >   find it in full dev:
> > > > > > >
> > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > >
> > > > > > >
> > > > > > > - Further up in the stack trace I see the error happens when
> > > > > > >   initializing the org.apache.hadoop.yarn.util.timeline.TimelineUtils
> > > > > > >   class. I tried setting "yarn.timeline-service.enabled" to false in
> > > > > > >   Ambari and then I get this error:
> > > > > > >
> > > > > > > Unable to parse
> > > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > > >
> > > > > > >
> > > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > > > > > >   dependencies without any success:
> > > > > > >   - hadoop-yarn-client
> > > > > > >   - hadoop-yarn-common
> > > > > > >   - hadoop-mapreduce-client-core
> > > > > > >   - hadoop-yarn-server-common
> > > > > > >   - hadoop-yarn-api
> > > > > > >   - hbase-server
> > > > > > >
> > > > > > > I will keep exploring other possible solutions. Let me know if
> > > > > > > anyone has any ideas.
> > > > > > >
> > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I can imagine a new generic service(s) capability whose job (pun
> > > > > > > > intended) is to abstract the submittal to yarn, plus the tracking and
> > > > > > > > storage of results.
> > > > > > > >
> > > > > > > > It would be extended with storage providers, queue providers, and
> > > > > > > > possibly some set of policies, or rather strategies.
> > > > > > > >
> > > > > > > > The pcap ‘report’ would be a client of that service, one that
> > > > > > > > specializes the service operation for the way we want pcap to work.
> > > > > > > >
> > > > > > > > We can then re-use the generic service for other long-running yarn
> > > > > > > > things...
> > > > > > > >
> > > > > > > >
> > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> > > > > > > >
> > > > > > > > RE: Tracking v. users
> > > > > > > >
> > > > > > > > The submittal and tracking can associate the submitter with the yarn job
> > > > > > > > and track that, regardless of the yarn credentials.
> > > > > > > >
> > > > > > > > I.e., if all submittals and monitoring are done by the same yarn user
> > > > > > > > (Metron) from a single service or a co-operative set of services, that
> > > > > > > > service can maintain the mapping.
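
One way the mapping could be kept, sketched in Python purely for illustration. JobRegistry and its method names are hypothetical, the store is an in-memory dict (a real service would presumably need persistence), and nothing here reflects an existing Metron class.

```python
from collections import defaultdict


class JobRegistry:
    """Remembers which end user asked for which YARN application.

    All jobs may run as the same YARN user; the registry keeps the
    submitter-to-job association on the service side.
    """

    def __init__(self):
        self._jobs_by_user = defaultdict(list)
        self._user_by_job = {}

    def record(self, user, application_id):
        """Associate a newly submitted application with its submitter."""
        if application_id in self._user_by_job:
            raise ValueError("application already recorded: " + application_id)
        self._jobs_by_user[user].append(application_id)
        self._user_by_job[application_id] = user

    def jobs_for(self, user):
        """Applications submitted on behalf of user, oldest first."""
        return list(self._jobs_by_user.get(user, []))

    def owner_of(self, application_id):
        """Submitter of an application, or None if it is unknown."""
        return self._user_by_job.get(application_id)
```

The REST layer would call record() right after submitting a job and use jobs_for() to build the per-user view, whatever the final storage ends up being.
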
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > >
> > > > > > > > Otto, your use case makes sense to me. We'll have to think about how to
> > > > > > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > > > > > submitted as the metron service user so YARN won't keep track of this
> > > > > > > > for us. Is that assumption correct? Do you have any ideas for doing
> > > > > > > > that?
> > > > > > > >
> > > > > > > > Mike, I can start a feature branch and experiment with merging
> > > > > > > > metron-api into metron-rest. That should allow us to collaborate on any
> > > > > > > > issues or challenges. Also, can you expand on your idea to manage
> > > > > > > > external dependencies as a special module? That seems like a very
> > > > > > > > attractive option to me.
> > > > > > > >
> > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwards@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > From my response on the other thread, but applicable to the backend
> > > > > > > > > stuff:
> > > > > > > > >
> > > > > > > > > "The PCAP Query seems more like a PCAP Report to me. You are generating a
> > > > > > > > > report based on parameters.
> > > > > > > > > That report is something that takes some time and an external process to
> > > > > > > > > generate… i.e. you have to wait for it.
> > > > > > > > >
> > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > >
> > > > > > > > > * Are in the AlertUI
> > > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > > > >   alerts/meta-alert, possibly picking from one or more report
> > > > > > > > >   ‘templates’ that have query options etc
> > > > > > > > > * The report request is ‘queued’, that is, dispatched to be
> > > > > > > > >   executed/generated
> > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the
> > > > > > > > >   report is done it is queued there
> > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest (report
> > > > > > > > >   info/meta has the yarn details)
> > > > > > > > > * You can select the report from your queue and view it either in a new
> > > > > > > > >   UI or custom component
> > > > > > > > > * You can then apply a different ‘view’ to the report or work with the
> > > > > > > > >   report data
> > > > > > > > > * You can print / save etc
> > > > > > > > > * You can associate the report with the alerts (again in the report
> > > > > > > > >   info) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > We can introduce extensibility into the report templates and report
> > > > > > > > > views (things that work with the json data of the report)
> > > > > > > > >
> > > > > > > > > Something like that.”
> > > > > > > > >
> > > > > > > > > Maybe we can do :
> > > > > > > > >
> > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > yarn info + query info + alert context + yarn status => report info ->
> > > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > > report persistence added to report info
> > > > > > > > > metron-rest -> api to monitor the queue, read results (page), etc etc
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > > > > > > > >
> > > > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > > > requirements at Otto's request. This should help us keep these two
> > > > > > > > > related but separate discussions focused.
> > > > > > > > >
> > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsumbul@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > >
> > > > > > > > > > - Backend:
> > > > > > > > > >
> > > > > > > > > > The current metron-api is totally separate; it would be logical to me to
> > > > > > > > > > have it in the same place as the other REST APIs. Especially when more
> > > > > > > > > > security is added, it will not be necessary to do the job twice.
> > > > > > > > > > The current implementation sends back a pcap object which still needs to
> > > > > > > > > > be decoded. In OpenSOC, the decoding was done with tshark on the
> > > > > > > > > > frontend. It would be good to have this decoding happen directly on the
> > > > > > > > > > backend so as not to create load on the frontend. An option would be to
> > > > > > > > > > install tshark on the REST server and use it to convert the pcap to XML
> > > > > > > > > > and then to JSON that will be sent to the frontend.
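
A minimal sketch of the XML-to-JSON step, in Python for illustration. It assumes tshark is run with its PDML output (for example `tshark -r capture.pcap -T pdml`, where capture.pcap is a placeholder file name) and that only each field's name and display value are wanted. pdml_to_packets() and pdml_to_json() are hypothetical helpers; flattening nested fields by name is a simplification, and repeated field names within one protocol overwrite each other.

```python
import json
import xml.etree.ElementTree as ET


def pdml_to_packets(pdml_text):
    """Turn PDML text into a list of {protocol: {field name: shown value}}."""
    root = ET.fromstring(pdml_text)
    packets = []
    for packet in root.iter("packet"):
        layers = {}
        for proto in packet.findall("proto"):
            # proto.iter() also visits nested <field> elements.
            fields = {
                field.get("name"): field.get("show")
                for field in proto.iter("field")
                if field.get("name")
            }
            layers[proto.get("name")] = fields
        packets.append(layers)
    return packets


def pdml_to_json(pdml_text):
    """Serialize the decoded packets as a JSON array for the frontend."""
    return json.dumps(pdml_to_packets(pdml_text))
```

Doing this on the REST server keeps the raw pcap off the wire to the browser; the cost is that tshark has to be installed and invoked on that host.
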
> > > > > > > > > >
> > > > > > > > > > I tried to start directly the map/reduce job to search
> over
> > > all
> > > > > the
> > > > > > > > pcap
> > > > > > > > > > data from the rest server and as Ryan mention it, we had
> > > > > trouble. I
> > > > > > > > will
> > > > > > > > > > try to find back the error.
> > > > > > > > > >
> > > > > > > > > > Then in the POC, what we tried is to use the pcap_query
> > > script
> > > > > and
> > > > > > > this
> > > > > > > > > > work fine. I just modified it that he sends back directly
> > the
> > > > > > job_id
> > > > > > > of
> > > > > > > > > > yarn and not waiting that the job is finished. Then it
> will
> > > > > allow
> > > > > > the
> > > > > > > > UI
> > > > > > > > > > and the rest server to know what the status of the
> research
> > > by
> > > > > > > querying
> > > > > > > > > the
> > > > > > > > > > yarn rest api. This will allow the UI and the rest server
> > to
> > > be
> > > > > > async
> > > > > > > > > > without any blocking phase. What do you think about that?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Having the job submitted directly from the code of the
> rest
> > > > > server
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > perfect, but it will need a lot of investigation I think
> > (but
> > > > > I'm
> > > > > > not
> > > > > > > > > the
> > > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > > >
> > > > > > > > > > We know that the pcap_query scritp work fine so why not
> > > calling
> > > > > it?
> > > > > > > Is
> > > > > > > > > it
> > > > > > > > > > that bad? (maybe stupid question, but I really don’t see
> a
> > > lot
> > > > > of
> > > > > > > > > drawback)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > - Front end:
> > > > > > > > > >
> > > > > > > > > > Adding the the pcap search to the alert UI is, I think,
> the
> > > > > easiest
> > > > > > > way
> > > > > > > > > to
> > > > > > > > > > move forward. But indeed, it will then be the “Alert UI
> and
> > > > > > > pcapquery”.
> > > > > > > > > > Maybe the name of the UI should just change to something
> > like
> > > > > > > > > “Monitoring &
> > > > > > > > > > Investigation UI” ?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Is there any roadmap or plan for the different UI? I mean
> > did
> > > > > you
> > > > > > > > > already
> > > > > > > > > > had discussion on how you see the ui evolving with the
> new
> > > > > feature
> > > > > > > that
> > > > > > > > > > will come in the future?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > - Microservices:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > What do you mean exactly by microservices? Is it to
> > separate
> > > > all
> > > > > > the
> > > > > > > > > > features in different projects? Or something like having
> > the
> > > > > > > different
> > > > > > > > > > components in container like kubernet? (again maybe
> stupid
> > > > > > question,
> > > > > > > > but
> > > > > > > > > I
> > > > > > > > > > don’t clearly understand what you mean J )
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Michel
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > > simon elliston ball
> > > @sireb
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Finally figured it out.  Commit is here:
https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1

It came down to figuring out the right combination of maven dependencies
and passing in the HDP version to REST as a Java system property.  I also
included some HDFS setup tasks.  I tested this in full dev and can now
successfully run a pcap query and get results.  All you should have to do
is generate some pcap data first.
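
As a rough illustration of why the system property matters, here is a sketch only, not Hadoop's actual substitution code. It shows how a path such as the mapreduce.application.framework.path value quoted later in this thread stays unparseable until "hdp.version" is supplied. The helper names expand and has_unresolved are hypothetical, and the dict stands in for JVM system properties; the version string is the one that appears in the log output quoted below.

```python
import re

# Matches ${name} placeholders such as ${hdp.version}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, properties):
    """Replace each ${name} with properties[name]; leave unknown names as-is."""
    def substitute(match):
        name = match.group(1)
        # Keeping the placeholder makes a missing property easy to spot.
        return properties.get(name, match.group(0))
    return _PLACEHOLDER.sub(substitute, value)


def has_unresolved(value):
    """True if the value still contains a ${...} placeholder."""
    return _PLACEHOLDER.search(value) is not None


framework_path = "/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework"

# No property set: the placeholder survives and the path is not a usable URI.
without_property = expand(framework_path, {})

# Property set (assumed here to arrive as a JVM flag like -Dhdp.version=...).
with_property = expand(framework_path, {"hdp.version": "2.6.4.0-91"})

print(without_property)
print(with_property)
```

The point of the sketch is only that the placeholder has to be resolved by something before the path is used; where the flag is set for the REST application is in the commit linked above.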

On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> @Ryan - pulled your branch and experimented with a few things. In doing so,
> it dawned on me that by adding the yarn and hadoop classpath, you probably
> didn't introduce a new classpath issue, rather you probably just moved onto
> the next classpath issue, ie hbase per your exception about hbase jaxb.
> Anyhow, I put up a branch with some pom changes worth trying in conjunction
> with invoking the rest app startup via "/usr/bin/yarn jar"
>
> https://github.com/mmiklavc/metron/tree/ryan-rest-test
>
> https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
>
> Mike
>
>
> On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
>
> > That would be a step closer to something more like a micro-service
> > architecture. However, I would want to make sure we think about the
> > operational complexity, and mpack implications of having another server
> > installed and running somewhere on the cluster (also, ssl, kerberos, etc
> > etc requirements for that service).
> >
> > On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
> >
> > > +1 to having metron-api as its own service and using a gateway type
> > > pattern.
> > >
> > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ot...@gmail.com>
> > > wrote:
> > >
> > > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > > pattern in rest?
> > > >
> > > >
> > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com)
> wrote:
> > > >
> > > > Moving the yarn classpath command earlier in the classpath now gives
> > this
> > > > error:
> > > >
> > > > Caused by: java.lang.NoSuchMethodError:
> > > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > >
> > > > I will experiment with other combinations, I suspect we will need
> > > > finer-grain control over the order.
> > > >
> > > > The grep matches class names inside jar files. I use this all the
> time
> > > and
> > > > it's really useful.
> > > >
> > > > The metron-rest jar is already shaded.
> > > >
> > > > Reverse engineering the yarn jar command was the next thing I was
> going
> > > to
> > > > try. Will let you know how it goes.
> > > >
> > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > > michael.miklavcic@gmail.com> wrote:
> > > >
> > > > > What order did you add the hadoop or yarn classpath? The "shaded"
> > > > package
> > > > > stands out to me in this name "org.apache.hadoop.hbase.*shaded*
> > > > > .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try
> > adding
> > > > > those packages earlier on the classpath.
> > > > >
> > > > > I think that find command needs a "jar tvf", otherwise you're
> looking
> > > > for a
> > > > > class name in jar file names.
> > > > >
> > > > > Have you tried shading the rest jar?
> > > > >
> > > > > I'd also look at the classpath you get when running "yarn jar" to
> > start
> > > > the
> > > > > existing pcap service, per the instructions in
> metron-api/README.md.
> > > > >
> > > > >
> > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrimanr@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > To explore the idea of merging metron-api into metron-rest and running
> > > > > > pcap queries inside our REST application, I created a simple test here:
> > > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > > summary of what's included:
> > > > > >
> > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > - Added a pcap query controller endpoint at
> > > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > >
> > > > > > Generate some pcap data using pycapa (
> > > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > > > > > the pcap topology (
> > > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > > After this initial setup there should be data in HDFS at
> > > > > > "/apps/metron/pcap". I believe this should be enough to exercise
> > the
> > > > > > issue. Just hit the endpoint referenced above. I tested this in
> an
> > > > > > already running full dev by building and deploying the
> metron-rest
> > > > jar.
> > > > > I
> > > > > > did not rebuild full dev with this change but I would still
> expect
> > it
> > > > to
> > > > > > work. Let me know if it doesn't.
> > > > > >
> > > > > > The first error I see when I hit this endpoint is:
> > > > > >
> > > > > > java.lang.NoClassDefFoundError:
> > > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > >
> > > > > > Here are the things I've tried so far:
> > > > > >
> > > > > > - Run the REST application with the YARN jar command since this
> is
> > > how
> > > > > > all our other YARN/MR-related applications are started
> (metron-api,
> > > > > > MAAS,
> > > > > > pcap query, etc). I wouldn't expect this to work since we have
> > > > > runtime
> > > > > > dependencies on our shaded elasticsearch and parser jars and I'm
> > not
> > > > > > aware
> > > > > > of a way to add additional jars to the classpath with the YARN
> jar
> > > > > > command
> > > > > > (is there a way?). Either way I get this error:
> > > > > >
> > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > java.lang.NullPointerException
> > > > > >
> > > > > >
> > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > > > > > script). I
> > > > > > get this error:
> > > > > >
> > > > > > java.lang.ClassNotFoundException:
> > > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > >
> > > > > >
> > > > > > - I searched for the class in the previous attempt but could not
> > find
> > > > > it
> > > > > > in full dev:
> > > > > >
> > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > >
> > > > > >
> > > > > > - Further up in the stack trace I see the error happens when
> > > > > initiating
> > > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I
> > > > > tried
> > > > > > setting "yarn.timeline-service.enabled" in Ambari to false and
> > then
> > > I
> > > > > > get
> > > > > > this error:
> > > > > >
> > > > > > Unable to parse
> > > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > > URI, check the setting for mapreduce.application.framework.path
> > > > > >
> > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce
> > Maven
> > > > > > dependencies without any success
> > > > > > - hadoop-yarn-client
> > > > > > - hadoop-yarn-common
> > > > > > - hadoop-mapreduce-client-core
> > > > > > - hadoop-yarn-server-common
> > > > > > - hadoop-yarn-api
> > > > > > - hbase-server
> > > > > >
> > > > > > I will keep exploring other possible solutions. Let me know if
> > anyone
> > > > > has
> > > > > > any ideas.
> > > > > >
> > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> > ottobackwards@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I can imagine a new generic service(s) capability whose job (
> pun
> > > > > > intended
> > > > > > > ) is to
> > > > > > > abstract the submittal, tracking, and storage of results to
> yarn.
> > > > > > >
> > > > > > > It would be extended with storage providers, queue provider,
> > > > possibly
> > > > > > some
> > > > > > > set of policies or rather strategies.
> > > > > > >
> > > > > > > The pcap ‘report’ would be a client to that service, the
> > > specializes
> > > > > the
> > > > > > > service operation for the way we want pcap to work.
> > > > > > >
> > > > > > > We can then re-use the generic service for other long running
> > yarn
> > > > > > > things…..
> > > > > > >
> > > > > > >
> > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (
> ottobackwards@gmail.com
> > )
> > > > > wrote:
> > > > > > >
> > > > > > > RE: Tracking v. users
> > > > > > >
> > > > > > > The submittal and tracking can associate the submitter with the
> > > yarn
> > > > > job
> > > > > > > and track that,
> > > > > > > regardless of the yarn credentials.
> > > > > > >
> > > > > > > IE> if all submittals and monitoring are by the same yarn user
> (
> > > > > Metron )
> > > > > > > from a single or
> > > > > > > co-operative set of services, that service can maintain the
> > > mapping.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com
> )
> > > > wrote:
> > > > > > >
> > > > > > > Otto, your use case makes sense to me. We'll have to think
> about
> > > how
> > > > to
> > > > > > > manage the user to job relationships. I'm assuming YARN jobs
> will
> > > be
> > > > > > > submitted as the metron service user so YARN won't keep track
> of
> > > > this
> > > > > for
> > > > > > > us. Is that assumption correct? Do you have any ideas for doing
> > > > that?
> > > > > > >
> > > > > > > Mike, I can start a feature branch and experiment with merging
> > > > > metron-api
> > > > > > > into metron-rest. That should allow us to collaborate on any
> > issues
> > > > or
> > > > > > > challenges. Also, can you expand on your idea to manage
> external
> > > > > > > dependencies as a special module? That seems like a very
> > attractive
> > > > > > option
> > > > > > > to me.
> > > > > > >
> > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > > ottobackwards@gmail.com>
> > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > From my response on the other thread, but applicable to the
> > > > backend
> > > > > > > stuff:
> > > > > > > >
> > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are
> > > > > generating a
> > > > > > > > report based on parameters.
> > > > > > > > That report is something that takes some time and external
> > > process
> > > > to
> > > > > > > > generate… ie you have to wait for it.
> > > > > > > >
> > > > > > > > I can almost imagine a flow where you:
> > > > > > > >
> > > > > > > > * Are in the AlertUI
> > > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > > alerts/meta-alert,
> > > > > > > > possibly picking from on or more report ‘templates’
> > > > > > > > that have query options etc
> > > > > > > > * The report request is ‘queued’, that is dispatched to be be
> > > > > > > > executed/generated
> > > > > > > > * You as a user have a ‘queue’ of your report results, and
> when
> > > > the
> > > > > > > report
> > > > > > > > is done it is queued there
> > > > > > > > * We ‘monitor’ the report/queue press through the yarn rest (
> > > > report
> > > > > > > > info/meta has the yarn details )
> > > > > > > > * You can select the report from your queue and view it
> either
> > in
> > > > a
> > > > > new
> > > > > > > UI
> > > > > > > > or custom component
> > > > > > > > * You can then apply a different ‘view’ to the report or work
> > > with
> > > > > the
> > > > > > > > report data
> > > > > > > > * You can print / save etc
> > > > > > > > * You can associate the report with the alerts ( again in the
> > > > report
> > > > > > info
> > > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or
> > other
> > > > > > > >
> > > > > > > >
> > > > > > > > We can introduce extensibility into the report templates,
> > report
> > > > > views
> > > > > > (
> > > > > > > > thinks that work with the json data of the report )
> > > > > > > >
> > > > > > > > Something like that.”
> > > > > > > >
> > > > > > > > Maybe we can do :
> > > > > > > >
> > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > yarn info + query info + alert context + yarn status =>
> report
> > > > info
> > > > > ->
> > > > > > > > stored in a user’s ‘report queue’
> > > > > > > > report persistence added to report info
> > > > > > > > metron-rest -> api to monitor the queue, read results ( page
> ),
> > > > etc
> > > > > etc
> > > > > > > >
> > > > > > > >
> > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (
> merrimanr@gmail.com
> > )
> > > > > wrote:
> > > > > > > >
> > > > > > > > I started a separate thread on Pcap UI considerations and
> user
> > > > > > > > requirements
> > > > > > > > at Otto's request. This should help us keep these two related
> > but
> > > > > > > separate
> > > > > > > > discussions focused.
> > > > > > > >
> > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > > michelsumbul@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If I may, I would like to share my view on the following 3
> > > > points.
> > > > > > > > >
> > > > > > > > > - Backend:
> > > > > > > > >
> > > > > > > > > The current metron-api is totally seperate, it will be
> logic
> > > for
> > > > me
> > > > > > to
> > > > > > > > have
> > > > > > > > > it at the same place as the others rest api. Especially
> when
> > > > more
> > > > > > > > security
> > > > > > > > > will be added, it will not be needed to do the job twice.
> > > > > > > > > The current implementation send back a pcap object which
> > still
> > > > need
> > > > > > to
> > > > > > > > be
> > > > > > > > > decoded. In the opensoc, the decoding was done with tshard
> on
> > > > the
> > > > > > > > frontend.
> > > > > > > > > It will be good to have this decoding happening directly on
> > the
> > > > > > backend
> > > > > > > > to
> > > > > > > > > not create a load on frontend. An option will be to install
> > > > tshark
> > > > > on
> > > > > > > > the
> > > > > > > > > rest server and to use to convert the pcap to xml and then
> > to a
> > > > > json
> > > > > > > > that
> > > > > > > > > will be send to the frontend.
> > > > > > > > >
> > > > > > > > > I tried to start directly the map/reduce job to search over
> > all
> > > > the
> > > > > > > pcap
> > > > > > > > > data from the rest server and as Ryan mention it, we had
> > > > trouble. I
> > > > > > > will
> > > > > > > > > try to find back the error.
> > > > > > > > >
> > > > > > > > > Then in the POC, what we tried is to use the pcap_query
> > script
> > > > and
> > > > > > this
> > > > > > > > > work fine. I just modified it that he sends back directly
> the
> > > > > job_id
> > > > > > of
> > > > > > > > > yarn and not waiting that the job is finished. Then it will
> > > > allow
> > > > > the
> > > > > > > UI
> > > > > > > > > and the rest server to know what the status of the research
> > by
> > > > > > querying
> > > > > > > > the
> > > > > > > > > yarn rest api. This will allow the UI and the rest server
> to
> > be
> > > > > async
> > > > > > > > > without any blocking phase. What do you think about that?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Having the job submitted directly from the code of the rest
> > > > server
> > > > > > will
> > > > > > > > be
> > > > > > > > > perfect, but it will need a lot of investigation I think
> (but
> > > > I'm
> > > > > not
> > > > > > > > the
> > > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > > >
> > > > > > > > > We know that the pcap_query scritp work fine so why not
> > calling
> > > > it?
> > > > > > Is
> > > > > > > > it
> > > > > > > > > that bad? (maybe stupid question, but I really don’t see a
> > lot
> > > > of
> > > > > > > > drawback)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - Front end:
> > > > > > > > >
> > > > > > > > > Adding the the pcap search to the alert UI is, I think, the
> > > > easiest
> > > > > > way
> > > > > > > > to
> > > > > > > > > move forward. But indeed, it will then be the “Alert UI and
> > > > > > pcapquery”.
> > > > > > > > > Maybe the name of the UI should just change to something
> like
> > > > > > > > “Monitoring &
> > > > > > > > > Investigation UI” ?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Is there any roadmap or plan for the different UI? I mean
> did
> > > > you
> > > > > > > > already
> > > > > > > > > had discussion on how you see the ui evolving with the new
> > > > feature
> > > > > > that
> > > > > > > > > will come in the future?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > - Microservices:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What do you mean exactly by microservices? Is it to
> separate
> > > all
> > > > > the
> > > > > > > > > features in different projects? Or something like having
> the
> > > > > > different
> > > > > > > > > components in container like kubernet? (again maybe stupid
> > > > > question,
> > > > > > > but
> > > > > > > > I
> > > > > > > > > don’t clearly understand what you mean J )
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Michel
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > --
> > simon elliston ball
> > @sireb
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
@Ryan - pulled your branch and experimented with a few things. In doing so,
it dawned on me that by adding the yarn and hadoop classpath, you probably
didn't introduce a new classpath issue; rather, you probably just moved on to
the next classpath issue, i.e. hbase, per your exception about hbase jaxb.
Anyhow, I put up a branch with some pom changes worth trying in conjunction
with invoking the rest app startup via "/usr/bin/yarn jar"

https://github.com/mmiklavc/metron/tree/ryan-rest-test

https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
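
When chasing classpath-ordering problems like the ones discussed in this thread, it can help to see which jars actually contain a given class and in what order they appear, since a standard JVM classpath resolves a class from the first entry that provides it. The following is a minimal sketch under stated assumptions: the function names class_entry and jars_containing are hypothetical, it only inspects jar files (not directories or wildcard entries), and the two jars built in the demo are throwaway files created purely for illustration.

```python
import os
import tempfile
import zipfile


def class_entry(class_name):
    """Convert a dotted class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(jar_paths, class_name):
    """Return the jars, in the order given, that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except (zipfile.BadZipFile, OSError):
            # Skip missing, unreadable, or non-jar classpath entries.
            continue
    return hits


# Demo with two made-up jars; only the second contains the target class.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.jar")
second = os.path.join(workdir, "second.jar")
target = "javax.servlet.ServletContext"

with zipfile.ZipFile(first, "w") as jar:
    jar.writestr("com/example/Other.class", b"")
with zipfile.ZipFile(second, "w") as jar:
    jar.writestr(class_entry(target), b"")

classpath = [first, second, os.path.join(workdir, "missing.jar")]
matches = jars_containing(classpath, target)
print(matches)
```

On a real node the list of jars would come from splitting the output of a command such as "yarn classpath" on ":" and expanding any wildcard entries first; that step is left out here.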

Mike


On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <
simon@simonellistonball.com> wrote:

> That would be a step closer to something more like a micro-service
> architecture. However, I would want to make sure we think about the
> operational complexity, and mpack implications of having another server
> installed and running somewhere on the cluster (also, ssl, kerberos, etc
> etc requirements for that service).
>
> On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:
>
> > +1 to having metron-api as its own service and using a gateway type
> > pattern.
> >
> > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > Why not have metron-api as its own service and use a ‘gateway’ type
> > > pattern in rest?
> > >
> > >
> > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
> > >
> > > Moving the yarn classpath command earlier in the classpath now gives this
> > > error:
> > >
> > > Caused by: java.lang.NoSuchMethodError:
> > > javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > >
> > > I will experiment with other combinations; I suspect we will need
> > > finer-grained control over the order.
> > >
> > > The grep matches class names inside jar files. I use this all the time and
> > > it's really useful.
> > >
> > > The metron-rest jar is already shaded.
> > >
> > > Reverse engineering the yarn jar command was the next thing I was going to
> > > try. Will let you know how it goes.
> > >
> > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> > > michael.miklavcic@gmail.com> wrote:
> > >
> > > > What order did you add the hadoop or yarn classpath? The "shaded" package
> > > > stands out to me in this name
> > > > "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> > > > Maybe try adding those packages earlier on the classpath.
> > > >
> > > > I think that find command needs a "jar tvf", otherwise you're looking for a
> > > > class name in jar file names.
> > > >
> > > > Have you tried shading the rest jar?
> > > >
> > > > I'd also look at the classpath you get when running "yarn jar" to start the
> > > > existing pcap service, per the instructions in metron-api/README.md.
> > > >
> > > >
> > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <me...@gmail.com> wrote:
> > > >
> > > > > To explore the idea of merging metron-api into metron-rest and running pcap
> > > > > queries inside our REST application, I created a simple test here:
> > > > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > > > summary of what's included:
> > > > >
> > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > - Added a pcap query controller endpoint at
> > > > > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > >
> > > > > Generate some pcap data using pycapa (
> > > > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > > > > the pcap topology (
> > > > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > > > After this initial setup there should be data in HDFS at
> > > > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > > > issue. Just hit the endpoint referenced above. I tested this in an
> > > > > already running full dev by building and deploying the metron-rest jar. I
> > > > > did not rebuild full dev with this change but I would still expect it to
> > > > > work. Let me know if it doesn't.
> > > > >
> > > > > The first error I see when I hit this endpoint is:
> > > > >
> > > > > java.lang.NoClassDefFoundError:
> > > > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > >
> > > > > Here are the things I've tried so far:
> > > > >
> > > > > - Run the REST application with the YARN jar command since this is how
> > > > > all our other YARN/MR-related applications are started (metron-api, MAAS,
> > > > > pcap query, etc). I wouldn't expect this to work since we have runtime
> > > > > dependencies on our shaded elasticsearch and parser jars and I'm not aware
> > > > > of a way to add additional jars to the classpath with the YARN jar command
> > > > > (is there a way?). Either way I get this error:
> > > > >
> > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > > > > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > java.lang.NullPointerException
> > > > >
> > > > >
> > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > > > > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I
> > > > > get this error:
> > > > >
> > > > > java.lang.ClassNotFoundException:
> > > > > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > >
> > > > >
> > > > > - I searched for the class in the previous attempt but could not find it
> > > > > in full dev:
> > > > >
> > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > >
> > > > >
> > > > > - Further up in the stack trace I see the error happens when initializing
> > > > > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried
> > > > > setting "yarn.timeline-service.enabled" in Ambari to false and then I get
> > > > > this error:
> > > > >
> > > > > Unable to parse
> > > > > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > > > > URI, check the setting for mapreduce.application.framework.path
> > > > >
> > > > >
> > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > > > > dependencies without any success
> > > > > - hadoop-yarn-client
> > > > > - hadoop-yarn-common
> > > > > - hadoop-mapreduce-client-core
> > > > > - hadoop-yarn-server-common
> > > > > - hadoop-yarn-api
> > > > > - hbase-server
> > > > >
> > > > > I will keep exploring other possible solutions. Let me know if anyone has
> > > > > any ideas.
> > > > >
> > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > I can imagine a new generic service(s) capability whose job ( pun
> > > > > intended
> > > > > > ) is to
> > > > > > abstract the submittal, tracking, and storage of results to yarn.
> > > > > >
> > > > > > It would be extended with storage providers, queue provider,
> > > possibly
> > > > > some
> > > > > > set of policies or rather strategies.
> > > > > >
> > > > > > The pcap ‘report’ would be a client to that service, the
> > specializes
> > > > the
> > > > > > service operation for the way we want pcap to work.
> > > > > >
> > > > > > We can then re-use the generic service for other long running
> yarn
> > > > > > things…..
> > > > > >
> > > > > >
> > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com
> )
> > > > wrote:
> > > > > >
> > > > > > RE: Tracking v. users
> > > > > >
> > > > > > The submittal and tracking can associate the submitter with the
> > yarn
> > > > job
> > > > > > and track that,
> > > > > > regardless of the yarn credentials.
> > > > > >
> > > > > > IE> if all submittals and monitoring are by the same yarn user (
> > > > Metron )
> > > > > > from a single or
> > > > > > co-operative set of services, that service can maintain the
> > mapping.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com)
> > > wrote:
> > > > > >
> > > > > > Otto, your use case makes sense to me. We'll have to think about
> > how
> > > to
> > > > > > manage the user to job relationships. I'm assuming YARN jobs will
> > be
> > > > > > submitted as the metron service user so YARN won't keep track of
> > > this
> > > > for
> > > > > > us. Is that assumption correct? Do you have any ideas for doing
> > > that?
> > > > > >
> > > > > > Mike, I can start a feature branch and experiment with merging
> > > > metron-api
> > > > > > into metron-rest. That should allow us to collaborate on any
> issues
> > > or
> > > > > > challenges. Also, can you expand on your idea to manage external
> > > > > > dependencies as a special module? That seems like a very
> attractive
> > > > > option
> > > > > > to me.
> > > > > >
> > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <
> > ottobackwards@gmail.com>
> > >
> > > > > > wrote:
> > > > > >
> > > > > > > From my response on the other thread, but applicable to the
> > > backend
> > > > > > stuff:
> > > > > > >
> > > > > > > "The PCAP Query seems more like PCAP Report to me. You are
> > > > generating a
> > > > > > > report based on parameters.
> > > > > > > That report is something that takes some time and external
> > process
> > > to
> > > > > > > generate… ie you have to wait for it.
> > > > > > >
> > > > > > > I can almost imagine a flow where you:
> > > > > > >
> > > > > > > * Are in the AlertUI
> > > > > > > * Ask to generate a PCAP report based on some selected
> > > > > alerts/meta-alert,
> > > > > > > possibly picking from on or more report ‘templates’
> > > > > > > that have query options etc
> > > > > > > * The report request is ‘queued’, that is dispatched to be be
> > > > > > > executed/generated
> > > > > > > * You as a user have a ‘queue’ of your report results, and when
> > > the
> > > > > > report
> > > > > > > is done it is queued there
> > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest (
> > > report
> > > > > > > info/meta has the yarn details )
> > > > > > > * You can select the report from your queue and view it either
> in
> > > a
> > > > new
> > > > > > UI
> > > > > > > or custom component
> > > > > > > * You can then apply a different ‘view’ to the report or work
> > with
> > > > the
> > > > > > > report data
> > > > > > > * You can print / save etc
> > > > > > > * You can associate the report with the alerts ( again in the
> > > report
> > > > > info
> > > > > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or
> other
> > > > > > >
> > > > > > >
> > > > > > > We can introduce extensibility into the report templates,
> report
> > > > views
> > > > > (
> > > > > > > > things that work with the json data of the report )
> > > > > > >
> > > > > > > Something like that.”
> > > > > > >
> > > > > > > Maybe we can do :
> > > > > > >
> > > > > > > template -> query parameters -> script => yarn info
> > > > > > > yarn info + query info + alert context + yarn status => report
> > > info
> > > > ->
> > > > > > > stored in a user’s ‘report queue’
> > > > > > > report persistence added to report info
> > > > > > > metron-rest -> api to monitor the queue, read results ( page ),
> > > etc
> > > > etc
> > > > > > >
> > > > > > >
> > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com
> )
> > > > wrote:
> > > > > > >
> > > > > > > I started a separate thread on Pcap UI considerations and user
> > > > > > > requirements
> > > > > > > at Otto's request. This should help us keep these two related
> but
> > > > > > separate
> > > > > > > discussions focused.
> > > > > > >
> > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > > > michelsumbul@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > If I may, I would like to share my view on the following 3
> > > points.
> > > > > > > >
> > > > > > > > - Backend:
> > > > > > > >
> > > > > > > > > The current metron-api is totally separate, it will be logical
> > for
> > > me
> > > > > to
> > > > > > > have
> > > > > > > > it at the same place as the others rest api. Especially when
> > > more
> > > > > > > security
> > > > > > > > will be added, it will not be needed to do the job twice.
> > > > > > > > The current implementation send back a pcap object which
> still
> > > need
> > > > > to
> > > > > > > be
> > > > > > > > > decoded. In OpenSOC, the decoding was done with tshark on
> > > the
> > > > > > > frontend.
> > > > > > > > It will be good to have this decoding happening directly on
> the
> > > > > backend
> > > > > > > to
> > > > > > > > not create a load on frontend. An option will be to install
> > > tshark
> > > > on
> > > > > > > the
> > > > > > > > rest server and to use to convert the pcap to xml and then
> to a
> > > > json
> > > > > > > that
> > > > > > > > will be send to the frontend.
> > > > > > > >
> > > > > > > > I tried to start directly the map/reduce job to search over
> all
> > > the
> > > > > > pcap
> > > > > > > > data from the rest server and as Ryan mention it, we had
> > > trouble. I
> > > > > > will
> > > > > > > > try to find back the error.
> > > > > > > >
> > > > > > > > Then in the POC, what we tried is to use the pcap_query
> script
> > > and
> > > > > this
> > > > > > > > work fine. I just modified it that he sends back directly the
> > > > job_id
> > > > > of
> > > > > > > > yarn and not waiting that the job is finished. Then it will
> > > allow
> > > > the
> > > > > > UI
> > > > > > > > and the rest server to know what the status of the research
> by
> > > > > querying
> > > > > > > the
> > > > > > > > yarn rest api. This will allow the UI and the rest server to
> be
> > > > async
> > > > > > > > without any blocking phase. What do you think about that?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Having the job submitted directly from the code of the rest
> > > server
> > > > > will
> > > > > > > be
> > > > > > > > perfect, but it will need a lot of investigation I think (but
> > > I'm
> > > > not
> > > > > > > the
> > > > > > > > expert so I might be completely wrong ^^).
> > > > > > > >
> > > > > > > > > We know that the pcap_query script works fine so why not
> calling
> > > it?
> > > > > Is
> > > > > > > it
> > > > > > > > that bad? (maybe stupid question, but I really don’t see a
> lot
> > > of
> > > > > > > drawback)
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > - Front end:
> > > > > > > >
> > > > > > > > > Adding the pcap search to the alert UI is, I think, the
> > > easiest
> > > > > way
> > > > > > > to
> > > > > > > > move forward. But indeed, it will then be the “Alert UI and
> > > > > pcapquery”.
> > > > > > > > Maybe the name of the UI should just change to something like
> > > > > > > “Monitoring &
> > > > > > > > Investigation UI” ?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Is there any roadmap or plan for the different UI? I mean did
> > > you
> > > > > > > already
> > > > > > > > had discussion on how you see the ui evolving with the new
> > > feature
> > > > > that
> > > > > > > > will come in the future?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > - Microservices:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > What do you mean exactly by microservices? Is it to separate
> > all
> > > > the
> > > > > > > > features in different projects? Or something like having the
> > > > > different
> > > > > > > > > components in containers like Kubernetes? (again maybe stupid
> > > > question,
> > > > > > but
> > > > > > > I
> > > > > > > > > don’t clearly understand what you mean :) )
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Michel
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>

Re: [DISCUSS] Pcap panel architecture

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
That would be a step closer to something more like a micro-service
architecture. However, I would want to make sure we think about the
operational complexity and MPack implications of having another server
installed and running somewhere on the cluster (also the SSL, Kerberos,
and similar requirements for that service).

On 8 May 2018 at 14:27, Ryan Merriman <me...@gmail.com> wrote:

> +1 to having metron-api as its own service and using a gateway type
> pattern.
>
> On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > Why not have metron-api as its own service and use a ‘gateway’ type
> > pattern in rest?



-- 
--
simon elliston ball
@sireb

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
+1 to having metron-api as its own service and using a gateway type
pattern.
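
To make the gateway idea a bit more concrete, here is a minimal sketch,
not Metron code. It assumes metron-rest keeps a small routing table and
forwards matching requests to a separately deployed metron-api. The path
prefix, the backend host and port, and the function name are all
hypothetical and only there for illustration.

```python
# Hypothetical routing table: path prefix -> base URL of the backing service.
# Neither the prefix nor the host/port comes from Metron; both are placeholders.
ROUTES = {
    "/api/v1/pcap/": "http://metron-api.example:8081/",
}


def resolve_backend(path, routes=ROUTES):
    """Return the backend URL a request path should be forwarded to,
    or None if the gateway should serve the request itself."""
    # Try longer prefixes first so that more specific routes win.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            remainder = path[len(prefix):]
            return routes[prefix].rstrip("/") + "/" + remainder
    return None
```

With a table like this, the UI would only ever talk to metron-rest, and
any SSL or Kerberos handling for browser traffic would stay in one place,
although the backing service would still need its own install and
configuration.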

On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ot...@gmail.com> wrote:

> Why not have metron-api as its own service and use a ‘gateway’ type
> pattern in rest?
>
>
> On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> Moving the yarn classpath command earlier in the classpath now gives this
> error:
>
> Caused by: java.lang.NoSuchMethodError:
> javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
>
> I will experiment with other combinations; I suspect we will need
> finer-grained control over the order.
>
> The grep matches class names inside jar files. I use this all the time and
> it's really useful.
>
> The metron-rest jar is already shaded.
>
> Reverse engineering the yarn jar command was the next thing I was going to
> try. Will let you know how it goes.
>
> On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > What order did you add the hadoop or yarn classpath? The "shaded" package
> > stands out to me in this name "org.apache.hadoop.hbase.*shaded*
> > .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try adding
> > those packages earlier on the classpath.
> >
> > I think that find command needs a "jar tvf", otherwise you're looking for a
> > class name in jar file names.
> >
> > Have you tried shading the rest jar?
> >
> > I'd also look at the classpath you get when running "yarn jar" to start the
> > existing pcap service, per the instructions in metron-api/README.md.
> >
> >
> > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <me...@gmail.com>
> wrote:
> >
> > > To explore the idea of merging metron-api into metron-rest and running
> > > pcap queries inside our REST application, I created a simple test here:
> > > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > > summary of what's included:
> > >
> > > - Added pcap as a dependency in the metron-rest pom.xml
> > > - Added a pcap query controller endpoint at
> > >   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > - Added a pcap query service that runs a simple, hardcoded query
> > >
> > > Generate some pcap data using pycapa (
> > > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > > the pcap topology (
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > > After this initial setup there should be data in HDFS at
> > > "/apps/metron/pcap". I believe this should be enough to exercise the
> > > issue. Just hit the endpoint referenced above. I tested this in an
> > > already running full dev by building and deploying the metron-rest jar.
> > > I did not rebuild full dev with this change but I would still expect it
> > > to work. Let me know if it doesn't.
> > >
> > > The first error I see when I hit this endpoint is:
> > >
> > > java.lang.NoClassDefFoundError:
> > > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > >
> > > Here are the things I've tried so far:
> > >
> > > - Run the REST application with the YARN jar command since this is how
> > >   all our other YARN/MR-related applications are started (metron-api,
> > >   MAAS, pcap query, etc). I wouldn't expect this to work since we have
> > >   runtime dependencies on our shaded elasticsearch and parser jars and
> > >   I'm not aware of a way to add additional jars to the classpath with
> > >   the YARN jar command (is there a way?). Either way I get this error:
> > >
> > >   18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir
> > >   using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar.
> > >   skipping.
> > >   java.lang.NullPointerException
> > >
> > > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > >   classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > >   script). I get this error:
> > >
> > >   java.lang.ClassNotFoundException:
> > >   org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > >
> > > - I searched for the class in the previous attempt but could not find
> > >   it in full dev:
> > >
> > >   find / -name "*.jar" 2>/dev/null | xargs grep \
> > >     org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider \
> > >     2>/dev/null
> > >
> > > - Further up in the stack trace I see the error happens when
> > >   initiating the org.apache.hadoop.yarn.util.timeline.TimelineUtils
> > >   class. I tried setting "yarn.timeline-service.enabled" in Ambari to
> > >   false and then I get this error:
> > >
> > >   Unable to parse
> > >   '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > >   URI, check the setting for mapreduce.application.framework.path
> > >
> > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > >   dependencies without any success:
> > >   - hadoop-yarn-client
> > >   - hadoop-yarn-common
> > >   - hadoop-mapreduce-client-core
> > >   - hadoop-yarn-server-common
> > >   - hadoop-yarn-api
> > >   - hbase-server
> > >
> > > I will keep exploring other possible solutions. Let me know if anyone
> > > has any ideas.
> > >
> > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ot...@gmail.com>
> > > wrote:
> > >
> > > > I can imagine a new generic service(s) capability whose job ( pun
> > > intended
> > > > ) is to
> > > > abstract the submittal, tracking, and storage of results to yarn.
> > > >
> > > > It would be extended with storage providers, queue provider,
> possibly
> > > some
> > > > set of policies or rather strategies.
> > > >
> > > > > The pcap ‘report’ would be a client to that service, that specializes the
> > > > > service operation for the way we want pcap to work.
> > > >
> > > > We can then re-use the generic service for other long running yarn
> > > > things…..
> > > >
> > > >
> > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com)
> > wrote:
> > > >
> > > > RE: Tracking v. users
> > > >
> > > > The submittal and tracking can associate the submitter with the yarn
> > job
> > > > and track that,
> > > > regardless of the yarn credentials.
> > > >
> > > > IE> if all submittals and monitoring are by the same yarn user (
> > Metron )
> > > > from a single or
> > > > co-operative set of services, that service can maintain the mapping.
> > > >
> > > >
> > > >
> > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com)
> wrote:
> > > >
> > > > Otto, your use case makes sense to me. We'll have to think about how
> to
> > > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > > submitted as the metron service user so YARN won't keep track of
> this
> > for
> > > > us. Is that assumption correct? Do you have any ideas for doing
> that?
> > > >
> > > > Mike, I can start a feature branch and experiment with merging
> > metron-api
> > > > into metron-rest. That should allow us to collaborate on any issues
> or
> > > > challenges. Also, can you expand on your idea to manage external
> > > > dependencies as a special module? That seems like a very attractive
> > > option
> > > > to me.
> > > >
> > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com>
>
> > > > wrote:
> > > >
> > > > > From my response on the other thread, but applicable to the
> backend
> > > > stuff:
> > > > >
> > > > > "The PCAP Query seems more like PCAP Report to me. You are
> > generating a
> > > > > report based on parameters.
> > > > > That report is something that takes some time and external process
> to
> > > > > generate… ie you have to wait for it.
> > > > >
> > > > > I can almost imagine a flow where you:
> > > > >
> > > > > * Are in the AlertUI
> > > > > * Ask to generate a PCAP report based on some selected
> > > alerts/meta-alert,
> > > > > > possibly picking from one or more report ‘templates’
> > > > > > that have query options etc
> > > > > > * The report request is ‘queued’, that is dispatched to be
> > > > > executed/generated
> > > > > * You as a user have a ‘queue’ of your report results, and when
> the
> > > > report
> > > > > is done it is queued there
> > > > > > * We ‘monitor’ the report/queue progress through the yarn rest (
> report
> > > > > info/meta has the yarn details )
> > > > > * You can select the report from your queue and view it either in
> a
> > new
> > > > UI
> > > > > or custom component
> > > > > * You can then apply a different ‘view’ to the report or work with
> > the
> > > > > report data
> > > > > * You can print / save etc
> > > > > * You can associate the report with the alerts ( again in the
> report
> > > info
> > > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > > >
> > > > >
> > > > > We can introduce extensibility into the report templates, report
> > views
> > > (
> > > > > > things that work with the json data of the report )
> > > > >
> > > > > Something like that.”
> > > > >
> > > > > Maybe we can do :
> > > > >
> > > > > template -> query parameters -> script => yarn info
> > > > > yarn info + query info + alert context + yarn status => report
> info
> > ->
> > > > > stored in a user’s ‘report queue’
> > > > > report persistence added to report info
> > > > > metron-rest -> api to monitor the queue, read results ( page ),
> etc
> > etc
> > > > >
> > > > >
> > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com)
> > wrote:
> > > > >
> > > > > I started a separate thread on Pcap UI considerations and user
> > > > > requirements
> > > > > at Otto's request. This should help us keep these two related but
> > > > separate
> > > > > discussions focused.
> > > > >
> > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> > michelsumbul@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > >
> > > > > >
> > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > >
> > > > > >
> > > > > >
> > > > > > If I may, I would like to share my view on the following 3
> points.
> > > > > >
> > > > > > - Backend:
> > > > > >
> > > > > > > The current metron-api is totally separate, it will be logical for
> me
> > > to
> > > > > have
> > > > > > it at the same place as the others rest api. Especially when
> more
> > > > > security
> > > > > > will be added, it will not be needed to do the job twice.
> > > > > > The current implementation send back a pcap object which still
> need
> > > to
> > > > > be
> > > > > > > decoded. In OpenSOC, the decoding was done with tshark on
> the
> > > > > frontend.
> > > > > > It will be good to have this decoding happening directly on the
> > > backend
> > > > > to
> > > > > > not create a load on frontend. An option will be to install
> tshark
> > on
> > > > > the
> > > > > > rest server and to use to convert the pcap to xml and then to a
> > json
> > > > > that
> > > > > > will be send to the frontend.
> > > > > >
> > > > > > I tried to start directly the map/reduce job to search over all
> the
> > > > pcap
> > > > > > data from the rest server and as Ryan mention it, we had
> trouble. I
> > > > will
> > > > > > try to find back the error.
> > > > > >
> > > > > > Then in the POC, what we tried is to use the pcap_query script
> and
> > > this
> > > > > > work fine. I just modified it that he sends back directly the
> > job_id
> > > of
> > > > > > yarn and not waiting that the job is finished. Then it will
> allow
> > the
> > > > UI
> > > > > > and the rest server to know what the status of the research by
> > > querying
> > > > > the
> > > > > > yarn rest api. This will allow the UI and the rest server to be
> > async
> > > > > > without any blocking phase. What do you think about that?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Having the job submitted directly from the code of the rest
> server
> > > will
> > > > > be
> > > > > > perfect, but it will need a lot of investigation I think (but
> I'm
> > not
> > > > > the
> > > > > > expert so I might be completely wrong ^^).
> > > > > >
> > > > > > We know that the pcap_query scritp work fine so why not calling
> it?
> > > Is
> > > > > it
> > > > > > that bad? (maybe stupid question, but I really don’t see a lot
> of
> > > > > drawback)
> > > > > >
> > > > > >
> > > > > >
> > > > > > - Front end:
> > > > > >
> > > > > > Adding the the pcap search to the alert UI is, I think, the
> easiest
> > > way
> > > > > to
> > > > > > move forward. But indeed, it will then be the “Alert UI and
> > > pcapquery”.
> > > > > > Maybe the name of the UI should just change to something like
> > > > > “Monitoring &
> > > > > > Investigation UI” ?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Is there any roadmap or plan for the different UI? I mean did
> you
> > > > > already
> > > > > > had discussion on how you see the ui evolving with the new
> feature
> > > that
> > > > > > will come in the future?
> > > > > >
> > > > > >
> > > > > >
> > > > > > - Microservices:
> > > > > >
> > > > > >
> > > > > >
> > > > > > What do you mean exactly by microservices? Is it to separate all
> > the
> > > > > > features in different projects? Or something like having the
> > > different
> > > > > > components in container like kubernet? (again maybe stupid
> > question,
> > > > but
> > > > > I
> > > > > > don’t clearly understand what you mean J )
> > > > > >
> > > > > >
> > > > > >
> > > > > > Michel
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
Why not have metron-api as its own service and use a ‘gateway’ type
pattern in rest?
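
A rough sketch of the gateway idea, assuming metron-api keeps running as its
own process and metron-rest only forwards pcap requests to it. The route
prefix, host, port and function name below are hypothetical and are not taken
from either module.

```python
from urllib.parse import urljoin

# Hypothetical routing table: path prefix exposed by the REST application
# mapped to the base URL of the separately running pcap service.
ROUTES = {
    "/api/v1/pcap/": "http://pcap-host:8081/pcap/",
}


def resolve_backend(path, routes=ROUTES):
    """Return the backend URL a gateway would forward `path` to, or None."""
    for prefix, base in routes.items():
        if path.startswith(prefix):
            # Keep whatever follows the prefix and append it to the backend base.
            return urljoin(base, path[len(prefix):])
    # No route matched: the request is handled by the REST application itself.
    return None


print(resolve_backend("/api/v1/pcap/query"))
print(resolve_backend("/api/v1/search"))
```

The appeal of this layout would be that the Hadoop/YARN dependencies stay in
the separate service, so they never have to share a classpath with the REST
application.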


On May 8, 2018 at 08:45:33, Ryan Merriman (merrimanr@gmail.com) wrote:

Moving the yarn classpath command earlier in the classpath now gives this
error:

Caused by: java.lang.NoSuchMethodError:
javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;

I will experiment with other combinations, I suspect we will need
finer-grain control over the order.

The grep matches class names inside jar files. I use this all the time and
it's really useful.

The metron-rest jar is already shaded.

Reverse engineering the yarn jar command was the next thing I was going to
try. Will let you know how it goes.

On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> What order did you add the hadoop or yarn classpath? The "shaded" package
> stands out to me in this name "org.apache.hadoop.hbase.*shaded*
> .org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try adding
> those packages earlier on the classpath.
>
> I think that find command needs a "jar tvf", otherwise you're looking for a
> class name in jar file names.
>
> Have you tried shading the rest jar?
>
> I'd also look at the classpath you get when running "yarn jar" to start the
> existing pcap service, per the instructions in metron-api/README.md.
>
>
> On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <me...@gmail.com> wrote:
>
> > To explore the idea of merging metron-api into metron-rest and running pcap
> > queries inside our REST application, I created a simple test here:
> > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A
> > summary of what's included:
> >
> > - Added pcap as a dependency in the metron-rest pom.xml
> > - Added a pcap query controller endpoint at
> > http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > - Added a pcap query service that runs a simple, hardcoded query
> >
> > Generate some pcap data using pycapa (
> > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > the pcap topology (
> > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > After this initial setup there should be data in HDFS at
> > "/apps/metron/pcap". I believe this should be enough to exercise the
> > issue. Just hit the endpoint referenced above. I tested this in an
> > already running full dev by building and deploying the metron-rest jar. I
> > did not rebuild full dev with this change but I would still expect it to
> > work. Let me know if it doesn't.
> >
> > The first error I see when I hit this endpoint is:
> >
> > java.lang.NoClassDefFoundError:
> > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> >
> > Here are the things I've tried so far:
> >
> > - Run the REST application with the YARN jar command since this is how
> > all our other YARN/MR-related applications are started (metron-api,
> > MAAS,
> > pcap query, etc). I wouldn't expect this to work since we have
> runtime
> > dependencies on our shaded elasticsearch and parser jars and I'm not
> > aware
> > of a way to add additional jars to the classpath with the YARN jar
> > command
> > (is there a way?). Either way I get this error:
> >
> > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > java.lang.NullPointerException
> >
> >
> > - I tried adding `yarn classpath` and `hadoop classpath` to the
> > classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > script). I
> > get this error:
> >
> > java.lang.ClassNotFoundException:
> > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> >
> >
> > - I searched for the class in the previous attempt but could not find
> it
> > in full dev:
> >
> > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> >
> >
> > - Further up in the stack trace I see the error happens when
> initiating
> > the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I
> tried
> > setting "yarn.timeline-service.enabled" in Ambari to false and then I
> > get
> > this error:
> >
> > Unable to parse
> > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > URI, check the setting for mapreduce.application.framework.path
> >
> >
> > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> > dependencies without any success
> > - hadoop-yarn-client
> > - hadoop-yarn-common
> > - hadoop-mapreduce-client-core
> > - hadoop-yarn-server-common
> > - hadoop-yarn-api
> > - hbase-server
> >
> > I will keep exploring other possible solutions. Let me know if anyone
> has
> > any ideas.
> >
> > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > I can imagine a new generic service(s) capability whose job ( pun
> > intended
> > > ) is to
> > > abstract the submittal, tracking, and storage of results to yarn.
> > >
> > > It would be extended with storage providers, queue provider, possibly
> > some
> > > set of policies or rather strategies.
> > >
> > > The pcap ‘report’ would be a client to that service, the specializes
> the
> > > service operation for the way we want pcap to work.
> > >
> > > We can then re-use the generic service for other long running yarn
> > > things…..
> > >
> > >
> > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com)
> wrote:
> > >
> > > RE: Tracking v. users
> > >
> > > The submittal and tracking can associate the submitter with the yarn
> job
> > > and track that,
> > > regardless of the yarn credentials.
> > >
> > > IE> if all submittals and monitoring are by the same yarn user (
> Metron )
> > > from a single or
> > > co-operative set of services, that service can maintain the mapping.
> > >
> > >
> > >
> > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com)
wrote:
> > >
> > > Otto, your use case makes sense to me. We'll have to think about how
to
> > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > submitted as the metron service user so YARN won't keep track of this
> for
> > > us. Is that assumption correct? Do you have any ideas for doing that?
> > >
> > > Mike, I can start a feature branch and experiment with merging
> metron-api
> > > into metron-rest. That should allow us to collaborate on any issues
or
> > > challenges. Also, can you expand on your idea to manage external
> > > dependencies as a special module? That seems like a very attractive
> > option
> > > to me.
> > >
> > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com>
> > > wrote:
> > >
> > > > From my response on the other thread, but applicable to the backend
> > > stuff:
> > > >
> > > > "The PCAP Query seems more like PCAP Report to me. You are
> generating a
> > > > report based on parameters.
> > > > That report is something that takes some time and external process
to
> > > > generate… ie you have to wait for it.
> > > >
> > > > I can almost imagine a flow where you:
> > > >
> > > > * Are in the AlertUI
> > > > * Ask to generate a PCAP report based on some selected
> > alerts/meta-alert,
> > > > possibly picking from on or more report ‘templates’
> > > > that have query options etc
> > > > * The report request is ‘queued’, that is dispatched to be be
> > > > executed/generated
> > > > * You as a user have a ‘queue’ of your report results, and when the
> > > report
> > > > is done it is queued there
> > > > * We ‘monitor’ the report/queue press through the yarn rest (
report
> > > > info/meta has the yarn details )
> > > > * You can select the report from your queue and view it either in a
> new
> > > UI
> > > > or custom component
> > > > * You can then apply a different ‘view’ to the report or work with
> the
> > > > report data
> > > > * You can print / save etc
> > > > * You can associate the report with the alerts ( again in the
report
> > info
> > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > >
> > > >
> > > > We can introduce extensibility into the report templates, report
> views
> > (
> > > > thinks that work with the json data of the report )
> > > >
> > > > Something like that.”
> > > >
> > > > Maybe we can do :
> > > >
> > > > template -> query parameters -> script => yarn info
> > > > yarn info + query info + alert context + yarn status => report info
> ->
> > > > stored in a user’s ‘report queue’
> > > > report persistence added to report info
> > > > metron-rest -> api to monitor the queue, read results ( page ), etc
> etc
> > > >
> > > >
> > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com)
> wrote:
> > > >
> > > > I started a separate thread on Pcap UI considerations and user
> > > > requirements
> > > > at Otto's request. This should help us keep these two related but
> > > separate
> > > > discussions focused.
> > > >
> > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> michelsumbul@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > >
> > > > >
> > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > >
> > > > >
> > > > >
> > > > > If I may, I would like to share my view on the following 3
points.
> > > > >
> > > > > - Backend:
> > > > >
> > > > > The current metron-api is totally seperate, it will be logic for
me
> > to
> > > > have
> > > > > it at the same place as the others rest api. Especially when more
> > > > security
> > > > > will be added, it will not be needed to do the job twice.
> > > > > The current implementation send back a pcap object which still
need
> > to
> > > > be
> > > > > decoded. In the opensoc, the decoding was done with tshard on the
> > > > frontend.
> > > > > It will be good to have this decoding happening directly on the
> > backend
> > > > to
> > > > > not create a load on frontend. An option will be to install
tshark
> on
> > > > the
> > > > > rest server and to use to convert the pcap to xml and then to a
> json
> > > > that
> > > > > will be send to the frontend.
> > > > >
> > > > > I tried to start directly the map/reduce job to search over all
the
> > > pcap
> > > > > data from the rest server and as Ryan mention it, we had trouble.
I
> > > will
> > > > > try to find back the error.
> > > > >
> > > > > Then in the POC, what we tried is to use the pcap_query script
and
> > this
> > > > > work fine. I just modified it that he sends back directly the
> job_id
> > of
> > > > > yarn and not waiting that the job is finished. Then it will allow
> the
> > > UI
> > > > > and the rest server to know what the status of the research by
> > querying
> > > > the
> > > > > yarn rest api. This will allow the UI and the rest server to be
> async
> > > > > without any blocking phase. What do you think about that?
> > > > >
> > > > >
> > > > >
> > > > > Having the job submitted directly from the code of the rest
server
> > will
> > > > be
> > > > > perfect, but it will need a lot of investigation I think (but I'm
> not
> > > > the
> > > > > expert so I might be completely wrong ^^).
> > > > >
> > > > > We know that the pcap_query scritp work fine so why not calling
it?
> > Is
> > > > it
> > > > > that bad? (maybe stupid question, but I really don’t see a lot of
> > > > drawback)
> > > > >
> > > > >
> > > > >
> > > > > - Front end:
> > > > >
> > > > > Adding the the pcap search to the alert UI is, I think, the
easiest
> > way
> > > > to
> > > > > move forward. But indeed, it will then be the “Alert UI and
> > pcapquery”.
> > > > > Maybe the name of the UI should just change to something like
> > > > “Monitoring &
> > > > > Investigation UI” ?
> > > > >
> > > > >
> > > > >
> > > > > Is there any roadmap or plan for the different UI? I mean did you
> > > > already
> > > > > had discussion on how you see the ui evolving with the new
feature
> > that
> > > > > will come in the future?
> > > > >
> > > > >
> > > > >
> > > > > - Microservices:
> > > > >
> > > > >
> > > > >
> > > > > What do you mean exactly by microservices? Is it to separate all
> the
> > > > > features in different projects? Or something like having the
> > different
> > > > > components in container like kubernet? (again maybe stupid
> question,
> > > but
> > > > I
> > > > > don’t clearly understand what you mean J )
> > > > >
> > > > >
> > > > >
> > > > > Michel
> > > > >
> > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Moving the yarn classpath command earlier in the classpath now gives this
error:

Caused by: java.lang.NoSuchMethodError:
javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;

I will experiment with other combinations.  I suspect we will need
finer-grained control over the order.

The grep matches class names inside jar files.  I use this all the time and
it's really useful.
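
The same search can also be written against the jar entry list rather than the
raw bytes. Grep finds the name because zip archives store entry names
uncompressed. A minimal sketch, assuming Python is available on the node; the
function name is hypothetical.

```python
import os
import zipfile


def find_class_in_jars(root, class_name):
    """Return the sorted paths of jars under `root` that contain `class_name`.

    `class_name` is a dotted Java name such as "org.example.Foo".
    """
    # Jar entries use slash-separated paths ending in ".class".
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    if entry in jar.namelist():
                        matches.append(jar_path)
            except (zipfile.BadZipFile, OSError):
                # Unreadable or corrupt jar: skip it, much like 2>/dev/null.
                continue
    return sorted(matches)
```

Calling find_class_in_jars("/usr/hdp", "org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider")
would mirror the find command quoted below, restricted to the HDP install
directory.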

The metron-rest jar is already shaded.

Reverse engineering the yarn jar command was the next thing I was going to
try.  Will let you know how it goes.

On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> What order did you add the hadoop or yarn classpath? The "shaded" package
> stands out to me in this name
> "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
> Maybe try adding those packages earlier on the classpath.
>
> I think that find command needs a "jar tvf", otherwise you're looking for a
> class name in jar file names.
>
> Have you tried shading the rest jar?
>
> I'd also look at the classpath you get when running "yarn jar" to start the
> existing pcap service, per the instructions in metron-api/README.md.
>
>
> On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <me...@gmail.com> wrote:
>
> > To explore the idea of merging metron-api into metron-rest and running pcap
> > queries inside our REST application, I created a simple test here:
> > https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.  A
> > summary of what's included:
> >
> >    - Added pcap as a dependency in the metron-rest pom.xml
> >    - Added a pcap query controller endpoint at
> >    http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> >    - Added a pcap query service that runs a simple, hardcoded query
> >
> > Generate some pcap data using pycapa (
> > https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> > the pcap topology (
> > https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> > After this initial setup there should be data in HDFS at
> > "/apps/metron/pcap".  I believe this should be enough to exercise the
> > issue.  Just hit the endpoint referenced above.  I tested this in an
> > already running full dev by building and deploying the metron-rest jar.  I
> > did not rebuild full dev with this change but I would still expect it to
> > work.  Let me know if it doesn't.
> >
> > The first error I see when I hit this endpoint is:
> >
> > java.lang.NoClassDefFoundError:
> > org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> >
> > Here are the things I've tried so far:
> >
> >    - Run the REST application with the YARN jar command since this is how
> >    all our other YARN/MR-related applications are started (metron-api,
> > MAAS,
> >    pcap query, etc).  I wouldn't expect this to work since we have
> runtime
> >    dependencies on our shaded elasticsearch and parser jars and I'm not
> > aware
> >    of a way to add additional jars to the classpath with the YARN jar
> > command
> >    (is there a way?).  Either way I get this error:
> >
> > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> > jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > java.lang.NullPointerException
> >
> >
> >    - I tried adding `yarn classpath` and `hadoop classpath` to the
> >    classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start
> > script).  I
> >    get this error:
> >
> > java.lang.ClassNotFoundException:
> > org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> >
> >
> >    - I searched for the class in the previous attempt but could not find
> it
> >    in full dev:
> >
> > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> >
> >
> >    - Further up in the stack trace I see the error happens when
> initiating
> >    the org.apache.hadoop.yarn.util.timeline.TimelineUtils class.  I
> tried
> >    setting "yarn.timeline-service.enabled" in Ambari to false and then I
> > get
> >    this error:
> >
> > Unable to parse
> > '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> > URI, check the setting for mapreduce.application.framework.path
> >
> >
> >    - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
> >    dependencies without any success
> >       - hadoop-yarn-client
> >       - hadoop-yarn-common
> >       - hadoop-mapreduce-client-core
> >       - hadoop-yarn-server-common
> >       - hadoop-yarn-api
> >       - hbase-server
> >
> > I will keep exploring other possible solutions.  Let me know if anyone
> has
> > any ideas.
> >
> > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > I can imagine a new generic service(s) capability whose job ( pun
> > intended
> > > ) is to
> > > abstract the submittal, tracking, and storage of results to yarn.
> > >
> > > It would be extended with storage providers, queue provider, possibly
> > some
> > > set of policies or rather strategies.
> > >
> > > The pcap ‘report’ would be a client to that service, the specializes
> the
> > > service operation for the way we want pcap to work.
> > >
> > > We can then re-use the generic service for other long running yarn
> > > things…..
> > >
> > >
> > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com)
> wrote:
> > >
> > > RE: Tracking v. users
> > >
> > > The submittal and tracking can associate the submitter with the yarn
> job
> > > and track that,
> > > regardless of the yarn credentials.
> > >
> > > IE> if all submittals and monitoring are by the same yarn user (
> Metron )
> > > from a single or
> > > co-operative set of services, that service can maintain the mapping.
> > >
> > >
> > >
> > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> > >
> > > Otto, your use case makes sense to me. We'll have to think about how to
> > > manage the user to job relationships. I'm assuming YARN jobs will be
> > > submitted as the metron service user so YARN won't keep track of this
> for
> > > us. Is that assumption correct? Do you have any ideas for doing that?
> > >
> > > Mike, I can start a feature branch and experiment with merging
> metron-api
> > > into metron-rest. That should allow us to collaborate on any issues or
> > > challenges. Also, can you expand on your idea to manage external
> > > dependencies as a special module? That seems like a very attractive
> > option
> > > to me.
> > >
> > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com>
> > > wrote:
> > >
> > > > From my response on the other thread, but applicable to the backend
> > > stuff:
> > > >
> > > > "The PCAP Query seems more like PCAP Report to me. You are
> generating a
> > > > report based on parameters.
> > > > That report is something that takes some time and external process to
> > > > generate… ie you have to wait for it.
> > > >
> > > > I can almost imagine a flow where you:
> > > >
> > > > * Are in the AlertUI
> > > > * Ask to generate a PCAP report based on some selected
> > alerts/meta-alert,
> > > > possibly picking from on or more report ‘templates’
> > > > that have query options etc
> > > > * The report request is ‘queued’, that is dispatched to be be
> > > > executed/generated
> > > > * You as a user have a ‘queue’ of your report results, and when the
> > > report
> > > > is done it is queued there
> > > > * We ‘monitor’ the report/queue press through the yarn rest ( report
> > > > info/meta has the yarn details )
> > > > * You can select the report from your queue and view it either in a
> new
> > > UI
> > > > or custom component
> > > > * You can then apply a different ‘view’ to the report or work with
> the
> > > > report data
> > > > * You can print / save etc
> > > > * You can associate the report with the alerts ( again in the report
> > info
> > > > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > > >
> > > >
> > > > We can introduce extensibility into the report templates, report
> views
> > (
> > > > thinks that work with the json data of the report )
> > > >
> > > > Something like that.”
> > > >
> > > > Maybe we can do :
> > > >
> > > > template -> query parameters -> script => yarn info
> > > > yarn info + query info + alert context + yarn status => report info
> ->
> > > > stored in a user’s ‘report queue’
> > > > report persistence added to report info
> > > > metron-rest -> api to monitor the queue, read results ( page ), etc
> etc
> > > >
> > > >
> > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com)
> wrote:
> > > >
> > > > I started a separate thread on Pcap UI considerations and user
> > > > requirements
> > > > at Otto's request. This should help us keep these two related but
> > > separate
> > > > discussions focused.
> > > >
> > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <
> michelsumbul@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > >
> > > > >
> > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > >
> > > > >
> > > > >
> > > > > If I may, I would like to share my view on the following 3 points.
> > > > >
> > > > > - Backend:
> > > > >
> > > > > The current metron-api is totally seperate, it will be logic for me
> > to
> > > > have
> > > > > it at the same place as the others rest api. Especially when more
> > > > security
> > > > > will be added, it will not be needed to do the job twice.
> > > > > The current implementation send back a pcap object which still need
> > to
> > > > be
> > > > > decoded. In the opensoc, the decoding was done with tshard on the
> > > > frontend.
> > > > > It will be good to have this decoding happening directly on the
> > backend
> > > > to
> > > > > not create a load on frontend. An option will be to install tshark
> on
> > > > the
> > > > > rest server and to use to convert the pcap to xml and then to a
> json
> > > > that
> > > > > will be send to the frontend.
> > > > >
> > > > > I tried to start directly the map/reduce job to search over all the
> > > pcap
> > > > > data from the rest server and as Ryan mention it, we had trouble. I
> > > will
> > > > > try to find back the error.
> > > > >
> > > > > Then in the POC, what we tried is to use the pcap_query script and
> > this
> > > > > work fine. I just modified it that he sends back directly the
> job_id
> > of
> > > > > yarn and not waiting that the job is finished. Then it will allow
> the
> > > UI
> > > > > and the rest server to know what the status of the research by
> > querying
> > > > the
> > > > > yarn rest api. This will allow the UI and the rest server to be
> async
> > > > > without any blocking phase. What do you think about that?
> > > > >
> > > > >
> > > > >
> > > > > Having the job submitted directly from the code of the rest server
> > will
> > > > be
> > > > > perfect, but it will need a lot of investigation I think (but I'm
> not
> > > > the
> > > > > expert so I might be completely wrong ^^).
> > > > >
> > > > > We know that the pcap_query scritp work fine so why not calling it?
> > Is
> > > > it
> > > > > that bad? (maybe stupid question, but I really don’t see a lot of
> > > > drawback)
> > > > >
> > > > >
> > > > >
> > > > > - Front end:
> > > > >
> > > > > Adding the the pcap search to the alert UI is, I think, the easiest
> > way
> > > > to
> > > > > move forward. But indeed, it will then be the “Alert UI and
> > pcapquery”.
> > > > > Maybe the name of the UI should just change to something like
> > > > “Monitoring &
> > > > > Investigation UI” ?
> > > > >
> > > > >
> > > > >
> > > > > Is there any roadmap or plan for the different UI? I mean did you
> > > > already
> > > > > had discussion on how you see the ui evolving with the new feature
> > that
> > > > > will come in the future?
> > > > >
> > > > >
> > > > >
> > > > > - Microservices:
> > > > >
> > > > >
> > > > >
> > > > > What do you mean exactly by microservices? Is it to separate all
> the
> > > > > features in different projects? Or something like having the
> > different
> > > > > components in container like kubernet? (again maybe stupid
> question,
> > > but
> > > > I
> > > > > don’t clearly understand what you mean J )
> > > > >
> > > > >
> > > > >
> > > > > Michel
> > > > >
> > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
In what order did you add the hadoop or yarn classpath? The "shaded" package
stands out to me in this name
"org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider."
Maybe try adding those packages earlier on the classpath.
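
Order matters here because the JVM's application class loader uses the first
classpath entry that provides a class. A minimal sketch of that first-match
rule, useful for checking which jar wins for a given class; the function name
is hypothetical, and wildcard entries such as lib/* are not expanded.

```python
import os
import zipfile


def first_provider(classpath, class_name):
    """Return the first classpath entry that contains `class_name`, or None.

    `classpath` is a string of entries joined by os.pathsep (":" on Linux).
    """
    entry = class_name.replace(".", "/") + ".class"
    for item in classpath.split(os.pathsep):
        if os.path.isdir(item):
            # Directory entry: the class would be a plain file on disk.
            if os.path.isfile(os.path.join(item, *entry.split("/"))):
                return item
        elif zipfile.is_zipfile(item):
            # Jar entry: look the class up in the archive's entry list.
            with zipfile.ZipFile(item) as jar:
                if entry in jar.namelist():
                    return item
    return None
```

Running it against the output of `yarn classpath`, once with the Hadoop
entries first and once with them last, would show whether a different jar
ends up supplying the class in question.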

I think that find command needs a "jar tvf", otherwise you're looking for a
class name in jar file names.

Have you tried shading the rest jar?

I'd also look at the classpath you get when running "yarn jar" to start the
existing pcap service, per the instructions in metron-api/README.md.


On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <me...@gmail.com> wrote:

> To explore the idea of merging metron-api into metron-rest and running pcap
> queries inside our REST application, I created a simple test here:
> https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.  A
> summary of what's included:
>
>    - Added pcap as a dependency in the metron-rest pom.xml
>    - Added a pcap query controller endpoint at
>    http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
>    - Added a pcap query service that runs a simple, hardcoded query
>
> Generate some pcap data using pycapa (
> https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and
> the pcap topology (
> https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
> After this initial setup there should be data in HDFS at
> "/apps/metron/pcap".  I believe this should be enough to exercise the
> issue.  Just hit the endpoint referenced above.  I tested this in an
> already running full dev by building and deploying the metron-rest jar.  I
> did not rebuild full dev with this change but I would still expect it to
> work.  Let me know if it doesn't.
>
> The first error I see when I hit this endpoint is:
>
> java.lang.NoClassDefFoundError:
> org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
>
> Here are the things I've tried so far:
>
>    - Run the REST application with the YARN jar command since this is how
>    all our other YARN/MR-related applications are started (metron-api,
>    MAAS, pcap query, etc).  I wouldn't expect this to work since we have
>    runtime dependencies on our shaded elasticsearch and parser jars and I'm
>    not aware of a way to add additional jars to the classpath with the YARN
>    jar command (is there a way?).  Either way I get this error:
>
> 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
> jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> java.lang.NullPointerException
>
>
>    - I tried adding `yarn classpath` and `hadoop classpath` to the
>    classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script).
>    I get this error:
>
> java.lang.ClassNotFoundException:
> org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
>
>
>    - I searched for the class in the previous attempt but could not find it
>    in full dev:
>
> find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
>
>
>    - Further up in the stack trace I see the error happens when initializing
>    the org.apache.hadoop.yarn.util.timeline.TimelineUtils class.  I tried
>    setting "yarn.timeline-service.enabled" in Ambari to false and then I
>    get this error:
>
> Unable to parse
> '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
> URI, check the setting for mapreduce.application.framework.path
>
>
>    - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
>    dependencies without any success
>       - hadoop-yarn-client
>       - hadoop-yarn-common
>       - hadoop-mapreduce-client-core
>       - hadoop-yarn-server-common
>       - hadoop-yarn-api
>       - hbase-server
>
> I will keep exploring other possible solutions.  Let me know if anyone has
> any ideas.
>
> On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > I can imagine a new generic service(s) capability whose job ( pun intended )
> > is to abstract the submittal, tracking, and storage of results to yarn.
> >
> > It would be extended with storage providers, queue provider, possibly
> some
> > set of policies or rather strategies.
> >
> > The pcap ‘report’ would be a client to that service, the specializes the
> > service operation for the way we want pcap to work.
> >
> > We can then re-use the generic service for other long running yarn
> > things…..
> >
> >
> > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwards@gmail.com) wrote:
> >
> > RE: Tracking v. users
> >
> > The submittal and tracking can associate the submitter with the yarn job
> > and track that,
> > regardless of the yarn credentials.
> >
> > IE> if all submittals and monitoring are by the same yarn user ( Metron )
> > from a single or
> > co-operative set of services, that service can maintain the mapping.
> >
> >
> >
> > On May 7, 2018 at 09:39:52, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > Otto, your use case makes sense to me. We'll have to think about how to
> > manage the user to job relationships. I'm assuming YARN jobs will be
> > submitted as the metron service user so YARN won't keep track of this for
> > us. Is that assumption correct? Do you have any ideas for doing that?
> >
> > Mike, I can start a feature branch and experiment with merging metron-api
> > into metron-rest. That should allow us to collaborate on any issues or
> > challenges. Also, can you expand on your idea to manage external
> > dependencies as a special module? That seems like a very attractive
> option
> > to me.
> >
> > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > From my response on the other thread, but applicable to the backend
> > stuff:
> > >
> > > "The PCAP Query seems more like PCAP Report to me. You are generating a
> > > report based on parameters.
> > > That report is something that takes some time and external process to
> > > generate… ie you have to wait for it.
> > >
> > > I can almost imagine a flow where you:
> > >
> > > * Are in the AlertUI
> > > * Ask to generate a PCAP report based on some selected
> alerts/meta-alert,
> > > possibly picking from on or more report ‘templates’
> > > that have query options etc
> > > * The report request is ‘queued’, that is dispatched to be be
> > > executed/generated
> > > * You as a user have a ‘queue’ of your report results, and when the
> > report
> > > is done it is queued there
> > > * We ‘monitor’ the report/queue press through the yarn rest ( report
> > > info/meta has the yarn details )
> > > * You can select the report from your queue and view it either in a new
> > UI
> > > or custom component
> > > * You can then apply a different ‘view’ to the report or work with the
> > > report data
> > > * You can print / save etc
> > > * You can associate the report with the alerts ( again in the report
> info
> > > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> > >
> > >
> > > We can introduce extensibility into the report templates, report views
> (
> > > thinks that work with the json data of the report )
> > >
> > > Something like that.”
> > >
> > > Maybe we can do :
> > >
> > > template -> query parameters -> script => yarn info
> > > yarn info + query info + alert context + yarn status => report info ->
> > > stored in a user’s ‘report queue’
> > > report persistence added to report info
> > > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> > >
> > >
> > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
> > >
> > > I started a separate thread on Pcap UI considerations and user
> > > requirements
> > > at Otto's request. This should help us keep these two related but
> > separate
> > > discussions focused.
> > >
> > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <mi...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >
> > > >
> > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > >
> > > >
> > > >
> > > > If I may, I would like to share my view on the following 3 points.
> > > >
> > > > - Backend:
> > > >
> > > > The current metron-api is totally seperate, it will be logic for me
> to
> > > have
> > > > it at the same place as the others rest api. Especially when more
> > > security
> > > > will be added, it will not be needed to do the job twice.
> > > > The current implementation send back a pcap object which still need
> to
> > > be
> > > > decoded. In the opensoc, the decoding was done with tshard on the
> > > frontend.
> > > > It will be good to have this decoding happening directly on the
> backend
> > > to
> > > > not create a load on frontend. An option will be to install tshark on
> > > the
> > > > rest server and to use to convert the pcap to xml and then to a json
> > > that
> > > > will be send to the frontend.
> > > >
> > > > I tried to start directly the map/reduce job to search over all the
> > pcap
> > > > data from the rest server and as Ryan mention it, we had trouble. I
> > will
> > > > try to find back the error.
> > > >
> > > > Then in the POC, what we tried is to use the pcap_query script and
> this
> > > > work fine. I just modified it that he sends back directly the job_id
> of
> > > > yarn and not waiting that the job is finished. Then it will allow the
> > UI
> > > > and the rest server to know what the status of the research by
> querying
> > > the
> > > > yarn rest api. This will allow the UI and the rest server to be async
> > > > without any blocking phase. What do you think about that?
> > > >
> > > >
> > > >
> > > > Having the job submitted directly from the code of the rest server
> will
> > > be
> > > > perfect, but it will need a lot of investigation I think (but I'm not
> > > the
> > > > expert so I might be completely wrong ^^).
> > > >
> > > > We know that the pcap_query scritp work fine so why not calling it?
> Is
> > > it
> > > > that bad? (maybe stupid question, but I really don’t see a lot of
> > > drawback)
> > > >
> > > >
> > > >
> > > > - Front end:
> > > >
> > > > Adding the the pcap search to the alert UI is, I think, the easiest
> way
> > > to
> > > > move forward. But indeed, it will then be the “Alert UI and
> pcapquery”.
> > > > Maybe the name of the UI should just change to something like
> > > “Monitoring &
> > > > Investigation UI” ?
> > > >
> > > >
> > > >
> > > > Is there any roadmap or plan for the different UI? I mean did you
> > > already
> > > > had discussion on how you see the ui evolving with the new feature
> that
> > > > will come in the future?
> > > >
> > > >
> > > >
> > > > - Microservices:
> > > >
> > > >
> > > >
> > > > What do you mean exactly by microservices? Is it to separate all the
> > > > features in different projects? Or something like having the
> different
> > > > components in container like kubernet? (again maybe stupid question,
> > but
> > > I
> > > > don’t clearly understand what you mean J )
> > > >
> > > >
> > > >
> > > > Michel
> > > >
> > >
> > >
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
To explore the idea of merging metron-api into metron-rest and running pcap
queries inside our REST application, I created a simple test here:
https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.  A
summary of what's included:

   - Added pcap as a dependency in the metron-rest pom.xml
   - Added a pcap query controller endpoint at
   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
   - Added a pcap query service that runs a simple, hardcoded query

Generate some pcap data using pycapa (
https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the
pcap topology (
https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
After this initial setup there should be data in HDFS at
"/apps/metron/pcap".  I believe this should be enough to exercise the
issue.  Just hit the endpoint referenced above.  I tested this in an
already running full dev by building and deploying the metron-rest jar.  I
did not rebuild full dev with this change but I would still expect it to
work.  Let me know if it doesn't.

The first error I see when I hit this endpoint is:

java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.

Here are the things I've tried so far:

   - I tried running the REST application with the "yarn jar" command, since
   this is how all our other YARN/MR-related applications are started
   (metron-api, MAAS, pcap query, etc).  I wouldn't expect this to work since
   we have runtime dependencies on our shaded elasticsearch and parser jars
   and I'm not aware of a way to add additional jars to the classpath with
   "yarn jar" (is there a way?).  Either way I get this error:

18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
java.lang.NullPointerException


   - I tried adding `yarn classpath` and `hadoop classpath` to the
   classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script).  I
   get this error:

java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider


   - I searched for the class named in the previous error but could not find
   it in any jar on full dev:

find / -name "*.jar" 2>/dev/null | xargs grep \
org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider \
2>/dev/null


   - Further up in the stack trace I see the error happens when initializing
   the org.apache.hadoop.yarn.util.timeline.TimelineUtils class.  I tried
   setting "yarn.timeline-service.enabled" in Ambari to false and then I get
   this error:

Unable to parse
'/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
URI, check the setting for mapreduce.application.framework.path


   - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
   dependencies without any success
      - hadoop-yarn-client
      - hadoop-yarn-common
      - hadoop-mapreduce-client-core
      - hadoop-yarn-server-common
      - hadoop-yarn-api
      - hbase-server
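
For the class search in the list above, an alternative to grepping raw jar
bytes is to read each jar's entry list.  This is only a minimal sketch,
assuming Python 3 is available on the node.  The function name
jars_containing and the /usr/hdp search root are illustrative choices, not
part of Metron:

```python
import os
import zipfile


def jars_containing(root, entry_suffix):
    """Return sorted paths of .jar files under root holding an entry that ends with entry_suffix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if any(entry.endswith(entry_suffix) for entry in jar.namelist()):
                        matches.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip corrupt or unreadable archives, similar to 2>/dev/null in the shell command.
                continue
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical search root; widen it to "/" to mirror the find command.
    suffix = ("org/apache/hadoop/hbase/shaded/org/codehaus/jackson/"
              "jaxrs/JacksonJaxbJsonProvider.class")
    for jar_path in jars_containing("/usr/hdp", suffix):
        print(jar_path)
```

An empty result would be consistent with the grep finding nothing, since
both approaches look at the entry names stored in each archive.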

I will keep exploring other possible solutions.  Let me know if anyone has
any ideas.
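
One possible reading of the mapreduce.application.framework.path error above
(an assumption, not something confirmed in this thread) is that the
${hdp.version} placeholder was never substituted because no hdp.version
property was defined for the REST process.  The Python sketch below only
illustrates that kind of substitution.  The expand and unresolved helpers
are hypothetical, and 2.6.4.0-91 is simply the version that appears in the
/usr/hdp path of the first error:

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, properties):
    """Replace each ${name} with properties[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: properties.get(m.group(1), m.group(0)), value)


def unresolved(value):
    """List the placeholder names still present in value."""
    return PLACEHOLDER.findall(value)


path = "/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework"

# With no property defined the placeholder survives, so the braces reach the URI parser.
print(unresolved(expand(path, {})))

# With the property defined the path becomes a plain string.
print(expand(path, {"hdp.version": "2.6.4.0-91"}))
```

If that reading is right, defining the property for the REST process would
be the thing to try next, but that is untested here.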

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
I can imagine a new generic service(s) capability whose job (pun intended)
is to abstract submitting jobs to YARN, tracking them, and storing their
results.

It would be extended with storage providers, queue providers, and possibly
some set of policies, or rather strategies.

The pcap ‘report’ would be a client of that service that specializes the
service operation for the way we want pcap to work.

We can then re-use the generic service for other long-running YARN things.
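
To make the shape of such a generic service a little more concrete, here is
a rough Python sketch.  Every name in it (JobRecord, JobSubmitter,
StorageProvider, JobService, PcapReportClient) is hypothetical, the
submitter is a stand-in that never talks to YARN, and the queue provider
and strategies are left out to keep it short:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: str
    kind: str                  # e.g. "pcap"
    params: dict
    state: str = "SUBMITTED"
    result_location: str = ""


class JobSubmitter(ABC):
    """Hides how a job reaches YARN (script, REST call, client library)."""

    @abstractmethod
    def submit(self, kind, params):
        """Start the job and return its id."""


class StorageProvider(ABC):
    """Hides where job records and result locations are kept."""

    @abstractmethod
    def save(self, record): ...

    @abstractmethod
    def load(self, job_id): ...


class InMemoryStorage(StorageProvider):
    def __init__(self):
        self._records = {}

    def save(self, record):
        self._records[record.job_id] = record

    def load(self, job_id):
        return self._records[job_id]


class FakeSubmitter(JobSubmitter):
    """Stand-in that hands out made-up ids instead of talking to YARN."""

    def __init__(self):
        self._count = 0

    def submit(self, kind, params):
        self._count += 1
        return f"fake_job_{self._count}"


class JobService:
    """Generic part: submit, record, and report on long-running jobs."""

    def __init__(self, submitter, storage):
        self._submitter = submitter
        self._storage = storage

    def submit(self, kind, params):
        record = JobRecord(self._submitter.submit(kind, params), kind, dict(params))
        self._storage.save(record)
        return record

    def complete(self, job_id, result_location):
        record = self._storage.load(job_id)
        record.state = "FINISHED"
        record.result_location = result_location
        self._storage.save(record)
        return record

    def status(self, job_id):
        return self._storage.load(job_id).state


class PcapReportClient:
    """Pcap-specific part: turns report options into a generic job request."""

    def __init__(self, service):
        self._service = service

    def request_report(self, start_ms, end_ms, ip_src=None):
        params = {"start_ms": start_ms, "end_ms": end_ms}
        if ip_src is not None:
            params["ip_src"] = ip_src
        return self._service.submit("pcap", params)
```

The point of the split is that another long-running job type would only
need its own small client class, while submission and storage stay
swappable behind the two abstract classes.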


Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
RE: Tracking v. users

The submittal and tracking can associate the submitter with the YARN job
and track that, regardless of the YARN credentials.

I.e., if all submittals and monitoring are done by the same YARN user
(Metron) from a single service or a co-operative set of services, that
service can maintain the mapping.
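
A rough sketch of what "maintain the mapping" could look like, in Python.
The JobOwnership class and the job ids are made up for illustration, and a
real service would persist this rather than keep it in memory:

```python
from collections import defaultdict


class JobOwnership:
    """Remembers which end user asked for each job, independent of the YARN user."""

    def __init__(self):
        self._jobs_by_user = defaultdict(list)
        self._user_by_job = {}

    def record(self, user, job_id):
        """Associate a newly submitted job with the user who requested it."""
        if job_id in self._user_by_job:
            raise ValueError(f"{job_id} is already owned by {self._user_by_job[job_id]}")
        self._user_by_job[job_id] = user
        self._jobs_by_user[user].append(job_id)

    def owner_of(self, job_id):
        """Return the requesting user, or None for an unknown job."""
        return self._user_by_job.get(job_id)

    def jobs_for(self, user):
        """Return the user's job ids in submission order."""
        return list(self._jobs_by_user.get(user, []))

    def can_view(self, user, job_id):
        """Only the requesting user may see a job in this simple policy."""
        return self.owner_of(job_id) == user


ownership = JobOwnership()
ownership.record("alice", "job_1")   # every job runs as the same YARN user
ownership.record("bob", "job_2")
ownership.record("alice", "job_3")
print(ownership.jobs_for("alice"))
```

Because every job would run as the same YARN user, the service's own record
is the only place the requesting user is kept.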



Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Otto, your use case makes sense to me.  We'll have to think about how to
manage the user-to-job relationships.  I'm assuming YARN jobs will be
submitted as the metron service user, so YARN won't keep track of this for
us.  Is that assumption correct?  Do you have any ideas for doing that?

Mike, I can start a feature branch and experiment with merging metron-api
into metron-rest.  That should allow us to collaborate on any issues or
challenges.   Also, can you expand on your idea to manage external
dependencies as a special module?  That seems like a very attractive option
to me.

On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ot...@gmail.com> wrote:

> From my response on the other thread, but applicable to the backend stuff:
>
> "The PCAP Query seems more like PCAP Report to me.  You are generating a
> report based on parameters.
> That report is something that takes some time and external process to
> generate… ie you have to wait for it.
>
> I can almost imagine a flow where you:
>
> * Are in the AlertUI
> * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> possibly picking from one or more report ‘templates’
> that have query options etc
> * The report request is ‘queued’, that is, dispatched to be
> executed/generated
> * You as a user have a ‘queue’ of your report results, and when the report
> is done it is queued there
> * We ‘monitor’ the report/queue progress through the yarn rest ( report
> info/meta has the yarn details )
> * You can select the report from your queue and view it either in a new UI
> or custom component
> * You can then apply a different ‘view’ to the report or work with the
> report data
> * You can print / save etc
> * You can associate the report with the alerts ( again in the report info
> ) with…. a ‘case’ or ‘ticket’ or investigation something or other
>
>
> We can introduce extensibility into the report templates, report views (
> things that work with the json data of the report )
>
> Something like that.”
>
> Maybe we can do :
>
> template -> query parameters -> script => yarn info
> yarn info + query info + alert context + yarn status => report info ->
> stored in a user’s ‘report queue’
> report persistence added to report info
> metron-rest -> api to monitor the queue, read results ( page ), etc etc
>
>
> On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> I started a separate thread on Pcap UI considerations and user
> requirements at Otto's request. This should help us keep these two
> related but separate discussions focused.
>
> On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <mi...@gmail.com>
> wrote:
>
> > Hello,
> >
> >
> >
> > (Youhouuu my first reply on this kind of mail chain^^)
> >
> >
> >
> > If I may, I would like to share my view on the following 3 points.
> >
> > - Backend:
> >
> > The current metron-api is totally separate; it would be logical to me to
> > have it in the same place as the other REST APIs. Especially when more
> > security is added, the job will not need to be done twice.
> > The current implementation sends back a pcap object which still needs to
> > be decoded. In OpenSOC, the decoding was done with tshark on the frontend.
> > It would be good to have this decoding happen directly on the backend so
> > as not to create load on the frontend. An option would be to install
> > tshark on the REST server and use it to convert the pcap to XML and then
> > to JSON that is sent to the frontend.
> >
> > I tried to start the map/reduce job directly from the REST server to
> > search over all the pcap data and, as Ryan mentioned, we had trouble. I
> > will try to find the error again.
> >
> > Then in the POC, what we tried was to use the pcap_query script, and this
> > works fine. I just modified it so that it sends back the YARN job_id
> > directly instead of waiting for the job to finish. This lets the UI and
> > the REST server learn the status of the search by querying the YARN REST
> > API, so both can be async without any blocking phase. What do you think
> > about that?
> >
> >
> >
> > Having the job submitted directly from the code of the REST server would
> > be perfect, but I think it will need a lot of investigation (but I'm not
> > the expert so I might be completely wrong ^^).
> >
> > We know that the pcap_query script works fine, so why not call it? Is it
> > that bad? (maybe a stupid question, but I really don’t see a lot of
> > drawbacks)
> >
> >
> >
> > - Front end:
> >
> > Adding the pcap search to the alert UI is, I think, the easiest way to
> > move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> > Maybe the name of the UI should just change to something like
> > “Monitoring & Investigation UI”?
> >
> >
> >
> > Is there any roadmap or plan for the different UIs? I mean, have you
> > already had a discussion on how you see the UI evolving with the new
> > features that will come in the future?
> >
> >
> >
> > - Microservices:
> >
> >
> >
> > What do you mean exactly by microservices? Is it to separate all the
> > features into different projects? Or something like having the different
> > components in containers, as with Kubernetes? (again maybe a stupid
> > question, but I don’t clearly understand what you mean :) )
> >
> >
> >
> > Michel
> >
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
From my response on the other thread, but applicable to the backend stuff:

"The PCAP Query seems more like PCAP Report to me.  You are generating a
report based on parameters.
That report is something that takes some time and external process to
generate… ie you have to wait for it.

I can almost imagine a flow where you:

* Are in the AlertUI
* Ask to generate a PCAP report based on some selected alerts/meta-alert,
possibly picking from one or more report ‘templates’
that have query options etc
* The report request is ‘queued’, that is dispatched to be
executed/generated
* You as a user have a ‘queue’ of your report results, and when the report
is done it is queued there
* We ‘monitor’ the report/queue progress through the yarn rest ( report
info/meta has the yarn details )
* You can select the report from your queue and view it either in a new UI
or custom component
* You can then apply a different ‘view’ to the report or work with the
report data
* You can print / save etc
* You can associate the report with the alerts ( again in the report info )
with…. a ‘case’ or ‘ticket’ or investigation something or other


We can introduce extensibility into the report templates, report views (
things that work with the json data of the report )

Something like that.”

Maybe we can do :

template -> query parameters -> script => yarn info
yarn info + query info + alert context + yarn status => report info ->
stored in a user’s ‘report queue’
report persistence added to report info
metron-rest -> api to monitor the queue, read results ( page ), etc etc
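
As a rough illustration of the 'report info' and per-user 'report queue'
idea sketched above, here is a minimal Python example. ReportInfo,
ReportQueue and every field name are hypothetical, chosen only for this
sketch, and the queue is kept in memory, which a real service would
presumably replace with some form of persistence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReportInfo:
    """Hypothetical record tying a pcap report to its query, alerts and YARN job."""
    report_id: str
    template: str                      # name of the report template used
    query_params: Dict[str, str]       # parameters handed to the query script
    alert_ids: List[str]               # alert context the report was requested for
    yarn_app_id: str                   # id returned when the job was submitted
    yarn_state: str = "SUBMITTED"      # last state seen for the YARN job
    result_path: Optional[str] = None  # where the finished report was persisted


class ReportQueue:
    """In-memory stand-in for a per-user report queue (assumption: no persistence)."""

    def __init__(self) -> None:
        self._reports: Dict[str, List[ReportInfo]] = {}

    def enqueue(self, user: str, report: ReportInfo) -> None:
        # Append the report to the user's queue, creating the queue if needed.
        self._reports.setdefault(user, []).append(report)

    def update(self, user: str, report_id: str, yarn_state: str,
               result_path: Optional[str] = None) -> ReportInfo:
        # Record the latest YARN state and, once known, the persisted result.
        for report in self._reports.get(user, []):
            if report.report_id == report_id:
                report.yarn_state = yarn_state
                if result_path is not None:
                    report.result_path = result_path
                return report
        raise KeyError(f"no report {report_id!r} queued for user {user!r}")

    def page(self, user: str, offset: int = 0, limit: int = 10) -> List[ReportInfo]:
        # Return one page of the user's reports, oldest first.
        return self._reports.get(user, [])[offset:offset + limit]
```

A REST endpoint could then expose page() for reading the queue and call
update() whenever the YARN status of a report changes.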


On May 4, 2018 at 09:23:39, Ryan Merriman (merrimanr@gmail.com) wrote:

I started a separate thread on Pcap UI considerations and user requirements
at Otto's request. This should help us keep these two related but separate
discussions focused.

On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <mi...@gmail.com>
wrote:

> Hello,
>
>
>
> (Youhouuu my first reply on this kind of mail chain^^)
>
>
>
> If I may, I would like to share my view on the following 3 points.
>
> - Backend:
>
> The current metron-api is totally separate; it would be logical to me to
> have it in the same place as the other REST APIs. Especially when more
> security is added, the job will not need to be done twice.
> The current implementation sends back a pcap object which still needs to
> be decoded. In OpenSOC, the decoding was done with tshark on the frontend.
> It would be good to have this decoding happen directly on the backend so
> as not to create load on the frontend. An option would be to install
> tshark on the REST server and use it to convert the pcap to XML and then
> to JSON that is sent to the frontend.
>
> I tried to start the map/reduce job directly from the REST server to
> search over all the pcap data and, as Ryan mentioned, we had trouble. I
> will try to find the error again.
>
> Then in the POC, what we tried was to use the pcap_query script, and this
> works fine. I just modified it so that it sends back the YARN job_id
> directly instead of waiting for the job to finish. This lets the UI and
> the REST server learn the status of the search by querying the YARN REST
> API, so both can be async without any blocking phase. What do you think
> about that?
>
>
>
> Having the job submitted directly from the code of the REST server would
> be perfect, but I think it will need a lot of investigation (but I'm not
> the expert so I might be completely wrong ^^).
>
> We know that the pcap_query script works fine, so why not call it? Is it
> that bad? (maybe a stupid question, but I really don’t see a lot of
> drawbacks)
>
>
>
> - Front end:
>
> Adding the pcap search to the alert UI is, I think, the easiest way to
> move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> Maybe the name of the UI should just change to something like
> “Monitoring & Investigation UI”?
>
>
>
> Is there any roadmap or plan for the different UIs? I mean, have you
> already had a discussion on how you see the UI evolving with the new
> features that will come in the future?
>
>
>
> - Microservices:
>
>
>
> What do you mean exactly by microservices? Is it to separate all the
> features into different projects? Or something like having the different
> components in containers, as with Kubernetes? (again maybe a stupid
> question, but I don’t clearly understand what you mean :) )
>
>
>
> Michel
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
I started a separate thread on Pcap UI considerations and user requirements
at Otto's request.  This should help us keep these two related but separate
discussions focused.

On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <mi...@gmail.com>
wrote:

> Hello,
>
>
>
> (Youhouuu my first reply on this kind of mail chain^^)
>
>
>
> If I may, I would like to share my view on the following 3 points.
>
> - Backend:
>
> The current metron-api is totally separate; it would be logical to me to
> have it in the same place as the other REST APIs. Especially when more
> security is added, the job will not need to be done twice.
> The current implementation sends back a pcap object which still needs to
> be decoded. In OpenSOC, the decoding was done with tshark on the frontend.
> It would be good to have this decoding happen directly on the backend so
> as not to create load on the frontend. An option would be to install
> tshark on the REST server and use it to convert the pcap to XML and then
> to JSON that is sent to the frontend.
>
> I tried to start the map/reduce job directly from the REST server to
> search over all the pcap data and, as Ryan mentioned, we had trouble. I
> will try to find the error again.
>
> Then in the POC, what we tried was to use the pcap_query script, and this
> works fine. I just modified it so that it sends back the YARN job_id
> directly instead of waiting for the job to finish. This lets the UI and
> the REST server learn the status of the search by querying the YARN REST
> API, so both can be async without any blocking phase. What do you think
> about that?
>
>
>
> Having the job submitted directly from the code of the REST server would
> be perfect, but I think it will need a lot of investigation (but I'm not
> the expert so I might be completely wrong ^^).
>
> We know that the pcap_query script works fine, so why not call it? Is it
> that bad? (maybe a stupid question, but I really don’t see a lot of
> drawbacks)
>
>
>
> - Front end:
>
> Adding the pcap search to the alert UI is, I think, the easiest way to
> move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> Maybe the name of the UI should just change to something like
> “Monitoring & Investigation UI”?
>
>
>
> Is there any roadmap or plan for the different UIs? I mean, have you
> already had a discussion on how you see the UI evolving with the new
> features that will come in the future?
>
>
>
> - Microservices:
>
>
>
> What do you mean exactly by microservices? Is it to separate all the
> features into different projects? Or something like having the different
> components in containers, as with Kubernetes? (again maybe a stupid
> question, but I don’t clearly understand what you mean :) )
>
>
>
> Michel
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michel Sumbul <mi...@gmail.com>.
Hello,



(Youhouuu my first reply on this kind of mail chain^^)



If I may, I would like to share my view on the following 3 points.

- Backend:

The current metron-api is totally separate; it would be logical to me to
have it in the same place as the other REST APIs. Especially when more
security is added, the job will not need to be done twice.
The current implementation sends back a pcap object which still needs to
be decoded. In OpenSOC, the decoding was done with tshark on the frontend.
It would be good to have this decoding happen directly on the backend so
as not to create load on the frontend. An option would be to install
tshark on the REST server and use it to convert the pcap to XML and then
to JSON that is sent to the frontend.
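
To make the backend decoding idea a bit more concrete, here is a minimal
Python sketch. It assumes tshark is installed on the REST host and uses
tshark's -r (read a capture file) and -T json (JSON output) options, so it
goes straight to JSON rather than through an intermediate XML step. The
function names and the timeout value are illustrative assumptions only.

```python
import json
import subprocess
from typing import Any, List


def tshark_json_command(pcap_path: str, tshark_bin: str = "tshark") -> List[str]:
    """Build the argument list; a list (not a shell string) avoids shell injection."""
    return [tshark_bin, "-r", pcap_path, "-T", "json"]


def decode_pcap(pcap_path: str, tshark_bin: str = "tshark",
                timeout: float = 60.0) -> Any:
    """Run tshark on the server and return the decoded packets as parsed JSON.

    Raises subprocess.CalledProcessError if tshark exits with an error and
    subprocess.TimeoutExpired if decoding takes longer than `timeout` seconds.
    """
    result = subprocess.run(
        tshark_json_command(pcap_path, tshark_bin),
        capture_output=True,
        check=True,
        timeout=timeout,
    )
    return json.loads(result.stdout)
```

Large captures can produce very large JSON documents, so a real endpoint
would likely need paging or a packet limit before sending anything to the
frontend.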

I tried to start the map/reduce job directly from the REST server to
search over all the pcap data and, as Ryan mentioned, we had trouble. I
will try to find the error again.

Then in the POC, what we tried was to use the pcap_query script, and this
works fine. I just modified it so that it sends back the YARN job_id
directly instead of waiting for the job to finish. This lets the UI and
the REST server learn the status of the search by querying the YARN REST
API, so both can be async without any blocking phase. What do you think
about that?
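
The status check described above could look roughly like the following
Python sketch. It queries the YARN ResourceManager REST endpoint
/ws/v1/cluster/apps/{appid} and reads the state, finalStatus and progress
fields of the response. The ResourceManager base URL, the function names,
and the assumption that the modified script hands back a YARN application
id (application_...) rather than a MapReduce job id are all illustrative.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen

# YARN application states after which the job will not change any more.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def app_status_url(rm_base_url: str, app_id: str) -> str:
    """Build the ResourceManager URL for one application."""
    return f"{rm_base_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def parse_app_status(body: str) -> Dict[str, Any]:
    """Extract the fields a UI would need from the ResourceManager response."""
    app = json.loads(body)["app"]
    return {
        "state": app["state"],
        "final_status": app.get("finalStatus"),
        "progress": app.get("progress"),
        "done": app["state"] in TERMINAL_STATES,
    }


def fetch_app_status(rm_base_url: str, app_id: str,
                     timeout: float = 10.0) -> Dict[str, Any]:
    """One non-blocking status check; the caller decides how often to poll."""
    with urlopen(app_status_url(rm_base_url, app_id), timeout=timeout) as resp:
        return parse_app_status(resp.read().decode("utf-8"))
```

The REST server would call fetch_app_status each time the UI asks for an
update, so no request has to stay open while the job runs.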



Having the job submitted directly from the code of the REST server would
be perfect, but I think it will need a lot of investigation (but I'm not
the expert so I might be completely wrong ^^).

We know that the pcap_query script works fine, so why not call it? Is it
that bad? (maybe a stupid question, but I really don’t see a lot of
drawbacks)



- Front end:

Adding the pcap search to the alert UI is, I think, the easiest way to
move forward. But indeed, it will then be the “Alert UI and pcapquery”.
Maybe the name of the UI should just change to something like
“Monitoring & Investigation UI”?



Is there any roadmap or plan for the different UIs? I mean, have you
already had a discussion on how you see the UI evolving with the new
features that will come in the future?



- Microservices:



What do you mean exactly by microservices? Is it to separate all the
features into different projects? Or something like having the different
components in containers, as with Kubernetes? (again maybe a stupid
question, but I don’t clearly understand what you mean :) )



Michel

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
I know, I was running with it :)

> On May 3, 2018, at 10:21 PM, Michael Miklavcic <mi...@gmail.com> wrote:
> 
> Tabs vs spaces was a Silicon Valley joke, man :-)
> 
>> On Thu, May 3, 2018, 8:42 PM Ryan Merriman <me...@gmail.com> wrote:
>> 
>> Mike,
>> 
>> I never said there was anything problematic in metron-api, just that it was
>> inconsistent with the rest of Metron.  There is work involved in making it
>> consistent which is why I listed it as a downside.  I'm less concerned with
>> whether we use tabs or spaces but that we use one or the other.
>> 
>> I apologize for not making this clearer in my original message, but I did
>> not lead the POC development.  My involvement was helping troubleshoot
>> issues they ran into and answering questions about Metron in general.  I've
>> shared with you the information that I have which is my observations about
>> the types of issues they ran into.  I don't have a branch or pom file you
>> can experiment with.  I will reach out to that person and see if they are
>> able to share the exact errors they hit.  Also, the "trade-offs that you
>> seem to have already decided on" is not based on a specific issue or
>> challenge they faced in the POC.  It's based off of the past couple years
>> of working on our REST module and the reoccurring challenges and patterns I
>> see over a period of time.
>> 
>> Otto,
>> 
>> Makes sense to me.  I will start the other threads.
>> 
>> On Thu, May 3, 2018 at 8:50 PM, Otto Fowler <ot...@gmail.com>
>> wrote:
>> 
>>> I think my point is that maybe we should have a discussion about:
>>> 
>>> * PCAP UI, goals etc
>>> * Where it would live and why, what that would mean etc
>>> * Backend ( this original mail )
>>> 
>>> 
>>> 
>>> On May 3, 2018 at 18:34:00, Michael Miklavcic (
>> michael.miklavcic@gmail.com)
>>> wrote:
>>> 
>>> Otto, what are you and your customers finding useful and/or difficult
>> from
>>> a split management/alerts UI perspective? It might help us to restate the
>>> original scope and intent around maintaining separate management and
>> alert
>>> UI's, to your point about "contrary to previous direction." I personally
>>> don't have a strong position on this other than 1) management is a
>>> different feature set from drilling into threat intel, yet many apps
>> still
>>> have their management UI combined with the end user experience and 2) we
>>> should probably consider pcap in context of a workflow with alerts.
>>> 
>>> On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com>
>>> wrote:
>>> 
>>>> If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t
>> the
>>>> alerts ui anymore.
>>>> 
>>>> It is becoming more of a “composite” app, with multiple feature ui’s
>>>> together. I didn’t think that
>>>> was what we were going for, thus the config ui and the alert ui.
>>>> 
>>>> Just adding disparate thing as ‘new tabs’ to a ui may be expedient but
>>> it
>>>> seems contrary to
>>>> our previous direction.
>>>> 
>>>> There are a few things to consider if we are going to start moving
>>>> everything into Alerts Ui aren’t there?
>>>> 
>>>> It may be a better road to bring it in on its own like the alerts ui
>>>> effort, so it can be released with ‘qualifiers’ and tested with
>>>> the right expectations without affecting the Alerts UI.
>>>> 
>>>> 
>>>> 
>>>> On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
>>>> 
>>>> Otto,
>>>> 
>>>> I'm assuming just adding it to the Alerts UI is less work but I wouldn't
>>>> be strongly opposed to it being its own UI. What are the reasons for
>>>> doing that?
>>>> 
>>>> Mike,
>>>> 
>>>> On using metron-api:
>>>> 
>>>> 1. I'm making an assumption about it not being used much. Maybe it
>>>> still works without issue. I agree, we'll have to test anything we
>> build
>>>> so this is a minor issue.
>>>> 2. Updating metron-api to be asynchronous is a requirement in my
>> opinion
>>>> 3. The MPack work is the major drawback for me. We're essentially
>>>> creating a brand new Metron component. There are a lot of examples we
>>> can
>>>> draw from but it's going to be a large chunk of new MPack code to
>>> maintain
>>>> and MPack development has been painful in the past. I think it will
>>>> include:
>>>> 1. Creating a start script
>>>> 2. Creating master.py and commands.py scripts for managing the
>>>> application lifecycle, service checks, etc
>>>> 3. Creating an -env.xml file for exposing properties in Ambari
>>>> 4. Adding the component to the various MPack files
>>>> (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
>>>> 4. Our Storm topologies are completely different use cases and much
>> more
>>>> complex so I don't understand the comparison. But if you prefer this
>>>> coding style then I think this is a minor issue as well.
>>>> 
>>>> On micro-services:
>>>> 
>>>> 1. Our REST service already includes a lot of dependencies and is
>>>> difficult to manage in its current state. I just went through this on
>>>> https://github.com/apache/metron/pull/1008. It was painful. When we
>>>> tried to include mapreduce and yarn dependencies it became what seemed
>>> like
>>>> an endless NoSuchMethod, NoClassDef and similar errors. Even if we can
>>> get
>>>> it to work it's going to make managing our REST service that much
>> harder
>>>> than it already is. I think the shaded jars are the source of all this
>>>> trouble and I agree it would be nice to improve our architecture in
>> this
>>>> area. However I don't think it's a simple fix and now we're getting
>> into
>>>> the "will likely take a long time to plan and implement" concern. If
>>>> anyone has ideas on how to solve our shaded jar challenge I would be
>> all
>>>> for it.
>>>> 2. All the MPack work listed above would also be required here. A
>>>> micro-services pattern is a significant shift and I can't even give you
>>>> concrete examples of what exactly we would have to do. We would need to
>>> go
>>>> through extensive design and planning to even get to that point.
>>>> 3. It would be a brand new component. See above plus any new
>>>> infrastructure we would need (web server/proxy, service discovery, etc)
>>>> 
>>>> On pcap-query:
>>>> 
>>>> 1. I don't recall any users or customers directly using metron-api but
>>>> if you say so I believe you :)
>>>> 2. As I understand it the pcap topology and pcap query are somewhat
>>>> decoupled. Maybe location of pcap files would be shared? MPack work
>> here
>>>> is likely to include adding a couple properties and moving some around
>>> so
>>>> they can be shared. Deciding between Ambari and global config would be
>>>> similar to properties we add to any component.
>>>> 
>>>> I think you may be underestimating how difficult it's going to be to
>>> solve
>>>> our dependency problem. Or maybe it's me that is overestimating it :)
>> It
>>>> could be something we experiment with before we start on the pcap work.
>>>> There is major upside and it would benefit the whole project. But until
>>>> then we can't fit any more screwdrivers in the toolbox. For me the
>>>> only reasonable options are to use the existing metron-api as its own
>>>> separate service or call out to the pcap_query.sh script from our
>>> existing
>>>> REST app. I could go either way really. I'm just not excited about all
>>>> the MPack code we have to write for a new component. Maybe it won't be
>>>> that bad.
>>>> 
>>>> On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
>>>> wrote:
>>>> 
>>>>> First thought is why the Alerts-UI and Not a dedicated Query UI?
>>>>> 
>>>>> 
>>>>> On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com)
>>> wrote:
>>>>> 
>>>>> We are planning on adding the pcap query feature to the Alerts UI.
>>> Before
>>>>> we start this work, I think it is important to get community buy in
>> on
>>>> the
>>>>> architectural approach. There are a couple different options.
>>>>> 
>>>>> One option is to leverage the existing metron-api module that exposes
>>>> pcap
>>>>> queries through a REST service. The upsides are:
>>>>> 
>>>>> - some work has already been done
>>>>> - it's part of our build so we know unit and integration tests pass
>>>>> 
>>>>> The downsides are:
>>>>> 
>>>>> - It hasn't been used in a while and will need some end to end
>> testing
>>>>> to make sure it still functions properly
>>>>> - It is synchronous and will block the UI, using up the limited
>> number
>>>>> of concurrent connections available in a browser
>>>>> - It will require significant MPack work to properly set it up on
>>> install
>>>>> - It is a legacy module from OpenSOC and coding style is
>> significantly
>>>>> different
>>>>> 
>>>>> Another option would be moving to a micro-services architecture. We
>>> have
>>>>> experimented with a proof of concept and found it was too hard to add
>>>> this
>>>>> feature into our existing REST services because of all the
>>> dependencies
>>>>> that must coexist in the same application. The upsides are:
>>>>> 
>>>>> - Would provide a platform for future Batch/MR/YARN type features
>>>>> - There would be fewer technical compromises since we are building it
>>>>> from the ground up
>>>>> 
>>>>> The downsides are:
>>>>> 
>>>>> - Will require the most effort and will likely take a long time to
>>> plan
>>>>> and implement
>>>>> - Like the previous option, will require significant MPack work
>>>>> 
>>>>> A third option would be to add an endpoint to our existing REST
>>> service
>>>>> that delegates to the pcap_query.sh script through the Java Process
>>>> class.
>>>>> The upsides to this approach are:
>>>>> 
>>>>> - We know the pcap_query.sh script works and would require minimal
>>>>> changes
>>>>> - Minimal MPack work is required since our REST service is already
>>>>> included
>>>>> 
>>>>> The downsides are:
>>>>> 
>>>>> - Does not set us up to easily add other batch-oriented features in
>>> the
>>>>> future
>>>>> - OS-level security becomes a concern since we are delegating to a
>>>>> script in a separate process
>>>>> 
>>>>> I feel like ultimately we want to transition to a micro-services
>>>>> architecture because it will provide more flexibility and make it
>>> easier
>>>>> to
>>>>> grow our set of features. But in the meantime, wrapping the
>>> pcap_query.sh
>>>>> script would allow us to add this feature with less work and fewer
>>> lines
>>>>> of
>>>>> code. If and when we decide to deploy a separate REST application for
>>>>> batch features, the UI portion would require minimal changes.
>>>>> 
>>>>> What does everyone think?
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Tabs vs spaces was a Silicon Valley joke, man :-)

On Thu, May 3, 2018, 8:42 PM Ryan Merriman <me...@gmail.com> wrote:

> Mike,
>
> I never said there was anything problematic in metron-api, just that it was
> inconsistent with the rest of Metron.  There is work involved in making it
> consistent which is why I listed it as a downside.  I'm less concerned with
> whether we use tabs or spaces but that we use one or the other.
>
> I apologize for not making this clearer in my original message, but I did
> not lead the POC development.  My involvement was helping troubleshoot
> issues they ran into and answering questions about Metron in general.  I've
> shared with you the information that I have which is my observations about
> the types of issues they ran into.  I don't have a branch or pom file you
> can experiment with.  I will reach out to that person and see if they are
> able to share the exact errors they hit.  Also, the "trade-offs that you
> seem to have already decided on" is not based on a specific issue or
> challenge they faced in the POC.  It's based off of the past couple years
> of working on our REST module and the reoccurring challenges and patterns I
> see over a period of time.
>
> Otto,
>
> Makes sense to me.  I will start the other threads.
>
> On Thu, May 3, 2018 at 8:50 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > I think my point is that maybe we should have a discussion about:
> >
> > * PCAP UI, goals etc
> > * Where it would live and why, what that would mean etc
> > * Backend ( this original mail )
> >
> >
> >
> > On May 3, 2018 at 18:34:00, Michael Miklavcic (
> michael.miklavcic@gmail.com)
> > wrote:
> >
> > Otto, what are you and your customers finding useful and/or difficult
> from
> > a split management/alerts UI perspective? It might help us to restate the
> > original scope and intent around maintaining separate management and
> alert
> > UI's, to your point about "contrary to previous direction." I personally
> > don't have a strong position on this other than 1) management is a
> > different feature set from drilling into threat intel, yet many apps
> still
> > have their management UI combined with the end user experience and 2) we
> > should probably consider pcap in context of a workflow with alerts.
> >
> > On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t
> the
> > > alerts ui anymore.
> > >
> > > It is becoming more of a “composite” app, with multiple feature ui’s
> > > together. I didn’t think that
> > > was what we were going for, thus the config ui and the alert ui.
> > >
> > > Just adding disparate thing as ‘new tabs’ to a ui may be expedient but
> > it
> > > seems contrary to
> > > our previous direction.
> > >
> > > There are a few things to consider if we are going to start moving
> > > everything into Alerts Ui aren’t there?
> > >
> > > It may be a better road to bring it in on its own like the alerts ui
> > > effort, so it can be released with ‘qualifiers’ and tested with
> > > the right expectations without affecting the Alerts UI.
> > >
> > >
> > >
> > > On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
> > >
> > > Otto,
> > >
> > > I'm assuming just adding it to the Alerts UI is less work but I
> > > wouldn't be strongly opposed to it being its own UI. What are the
> > > reasons for doing that?
> > >
> > > Mike,
> > >
> > > On using metron-api:
> > >
> > > 1. I'm making an assumption about it not being used much. Maybe it
> > > still works without issue. I agree, we'll have to test anything we
> build
> > > so this is a minor issue.
> > > 2. Updating metron-api to be asynchronous is a requirement in my
> opinion
> > > 3. The MPack work is the major drawback for me. We're essentially
> > > creating a brand new Metron component. There are a lot of examples we
> > can
> > > draw from but it's going to be a large chunk of new MPack code to
> > maintain
> > > and MPack development has been painful in the past. I think it will
> > > include:
> > > 1. Creating a start script
> > > 2. Creating master.py and commands.py scripts for managing the
> > > application lifecycle, service checks, etc
> > > 3. Creating an -env.xml file for exposing properties in Ambari
> > > 4. Adding the component to the various MPack files
> > > (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> > > 4. Our Storm topologies are completely different use cases and much
> more
> > > complex so I don't understand the comparison. But if you prefer this
> > > coding style then I think this is a minor issue as well.
> > >
> > > On micro-services:
> > >
> > > 1. Our REST service already includes a lot of dependencies and is
> > > difficult to manage in its current state. I just went through this on
> > > https://github.com/apache/metron/pull/1008. It was painful. When we
> > > tried to include mapreduce and yarn dependencies it became what seemed
> > like
> > > an endless NoSuchMethod, NoClassDef and similar errors. Even if we can
> > get
> > > it to work it's going to make managing our REST service that much
> harder
> > > than it already is. I think the shaded jars are the source of all this
> > > trouble and I agree it would be nice to improve our architecture in
> this
> > > area. However I don't think it's a simple fix and now we're getting
> into
> > > the "will likely take a long time to plan and implement" concern. If
> > > anyone has ideas on how to solve our shaded jar challenge I would be
> all
> > > for it.
> > > 2. All the MPack work listed above would also be required here. A
> > > micro-services pattern is a significant shift and I can't even give you
> > > concrete examples of what exactly we would have to do. We would need to
> > go
> > > through extensive design and planning to even get to that point.
> > > 3. It would be a brand new component. See above plus any new
> > > infrastructure we would need (web server/proxy, service discovery, etc)
> > >
> > > On pcap-query:
> > >
> > > 1. I don't recall any users or customers directly using metron-api but
> > > if you say so I believe you :)
> > > 2. As I understand it the pcap topology and pcap query are somewhat
> > > decoupled. Maybe location of pcap files would be shared? MPack work
> here
> > > is likely to include adding a couple properties and moving some around
> > so
> > > they can be shared. Deciding between Ambari and global config would be
> > > similar to properties we add to any component.
> > >
> > > I think you may be underestimating how difficult it's going to be to
> > solve
> > > our dependency problem. Or maybe it's me that is overestimating it :)
> It
> > > could be something we experiment with before we start on the pcap work.
> > > There is major upside and it would benefit the whole project. But until
> > > then we can't fit any more screwdrivers in the toolbox. For me the
> > > only reasonable options are to use the existing metron-api as its own
> > > separate service or call out to the pcap_query.sh script from our
> > existing
> > > REST app. I could go either way really. I'm just not excited about all
> > > the MPack code we have to write for a new component. Maybe it won't be
> > > that bad.
> > >
> > > On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> > > wrote:
> > >
> > > > First thought is why the Alerts-UI and Not a dedicated Query UI?
> > > >
> > > >
> > > > On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com)
> > wrote:
> > > >
> > > > We are planning on adding the pcap query feature to the Alerts UI.
> > Before
> > > > we start this work, I think it is important to get community buy in
> on
> > > the
> > > > architectural approach. There are a couple different options.
> > > >
> > > > One option is to leverage the existing metron-api module that exposes
> > > pcap
> > > > queries through a REST service. The upsides are:
> > > >
> > > > - some work has already been done
> > > > - it's part of our build so we know unit and integration tests pass
> > > >
> > > > The downsides are:
> > > >
> > > > - It hasn't been used in a while and will need some end to end
> testing
> > > > to make sure it still functions properly
> > > > - It is synchronous and will block the UI, using up the limited
> number
> > > > of concurrent connections available in a browser
> > > > - It will require significant MPack work to properly set it up on
> > install
> > > > - It is a legacy module from OpenSOC and coding style is
> significantly
> > > > different
> > > >
> > > > Another option would be moving to a micro-services architecture. We
> > have
> > > > experimented with a proof of concept and found it was too hard to add
> > > this
> > > > feature into our existing REST services because of all the
> > dependencies
> > > > that must coexist in the same application. The upsides are:
> > > >
> > > > - Would provide a platform for future Batch/MR/YARN type features
> > > > - There would be fewer technical compromises since we are building it
> > > > from the ground up
> > > >
> > > > The downsides are:
> > > >
> > > > - Will require the most effort and will likely take a long time to
> > plan
> > > > and implement
> > > > - Like the previous option, will require significant MPack work
> > > >
> > > > A third option would be to add an endpoint to our existing REST
> > service
> > > > that delegates to the pcap_query.sh script through the Java Process
> > > class.
> > > > The upsides to this approach are:
> > > >
> > > > - We know the pcap_query.sh script works and would require minimal
> > > > changes
> > > > - Minimal MPack work is required since our REST service is already
> > > > included
> > > >
> > > > The downsides are:
> > > >
> > > > - Does not set us up to easily add other batch-oriented features in
> > the
> > > > future
> > > > - OS-level security becomes a concern since we are delegating to a
> > > > script in a separate process
> > > >
> > > > I feel like ultimately we want to transition to a micro-services
> > > > architecture because it will provide more flexibility and make it
> > easier
> > > > to
> > > > grow our set of features. But in the meantime, wrapping the
> > pcap_query.sh
> > > > script would allow us to add this feature with less work and fewer
> > lines
> > > > of
> > > > code. If and when we decide to deploy a separate REST application for
> > > > batch features, the UI portion would require minimal changes.
> > > >
> > > > What does everyone think?
> > > >
> > > >
> > >
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Mike,

I never said there was anything problematic in metron-api, just that it was
inconsistent with the rest of Metron.  There is work involved in making it
consistent, which is why I listed it as a downside.  I'm less concerned with
whether we use tabs or spaces than with making sure we use one or the other.

I apologize for not making this clearer in my original message, but I did
not lead the POC development.  My involvement was helping troubleshoot
issues they ran into and answering questions about Metron in general.  I've
shared with you the information that I have which is my observations about
the types of issues they ran into.  I don't have a branch or pom file you
can experiment with.  I will reach out to that person and see if they are
able to share the exact errors they hit.  Also, the "trade-offs that you
seem to have already decided on" are not based on a specific issue or
challenge they faced in the POC.  They're based on the past couple of years
of working on our REST module and the recurring challenges and patterns I
have seen over that time.

Otto,

Makes sense to me.  I will start the other threads.

On Thu, May 3, 2018 at 8:50 PM, Otto Fowler <ot...@gmail.com> wrote:

> I think my point is that maybe we should have a discuss about:
>
> * PCAP UI, goals etc
> * Where it would live and why, what that would mean etc
> * Backend ( this original mail )
>
>
>
> On May 3, 2018 at 18:34:00, Michael Miklavcic (michael.miklavcic@gmail.com)
> wrote:
>
> Otto, what are you and your customers finding useful and/or difficult from
> a split management/alerts UI perspective? It might help us to restate the
> original scope and intent around maintaining separate management and alert
> UI's, to your point about "contrary to previous direction." I personally
> don't have a strong position on this other than 1) management is a
> different feature set from drilling into threat intel, yet many apps still
> have their management UI combined with the end user experience and 2) we
> should probably consider pcap in context of a workflow with alerts.
>
> On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> > alerts ui anymore.
> >
> > It is becoming more of a “composite” app, with multiple feature ui’s
> > together. I didn’t think that
> > was what we were going for, thus the config ui and the alert ui.
> >
> > Just adding disparate thing as ‘new tabs’ to a ui may be expedient but
> it
> > seems contrary to
> > our previous direction.
> >
> > There are a few things to consider if we are going to start moving
> > everything into Alerts Ui aren’t there?
> >
> > It may be a better road to bring it in on it’s own like the alerts ui
> > effort, so it can be released with ‘qualifiers’ and tested with
> > the right expectations without effecting the Alerts UI.
> >
> >
> >
> > On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > Otto,
> >
> > I'm assuming just adding it to the Alerts UI is less work but I wouldn't
> be
> > strongly opposed to it being it's own UI. What are the reasons for doing
> > that?
> >
> > Mike,
> >
> > On using metron-api:
> >
> > 1. I'm making an assumption about it not being used much. Maybe it
> > still works without issue. I agree, we'll have to test anything we build
> > so this is a minor issue.
> > 2. Updating metron-api to be asynchronous is a requirement in my opinion
> > 3. The MPack work is the major drawback for me. We're essentially
> > creating a brand new Metron component. There are a lot of examples we
> can
> > draw from but it's going to be a large chunk of new MPack code to
> maintain
> > and MPack development has been painful in the past. I think it will
> > include:
> > 1. Creating a start script
> > 2. Creating master.py and commands.py scripts for managing the
> > application lifecycle, service checks, etc
> > 3. Creating an -env.xml file for exposing properties in Ambari
> > 4. Adding the component to the various MPack files
> > (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> > 4. Our Storm topologies are completely different use cases and much more
> > complex so I don't understand the comparison. But if you prefer this
> > coding style then I think this is a minor issue as well.
> >
> > On micro-services:
> >
> > 1. Our REST service already includes a lot of dependencies and is
> > difficult to manage in it's current state. I just went through this on
> > https://github.com/apache/metron/pull/1008. It was painful. When we
> > tried to include mapreduce and yarn dependencies it became what seemed
> like
> > an endless NoSuchMethod, NoClassDef and similar errors. Even if we can
> get
> > it to work it's going to make managing our REST service that much harder
> > than it already is. I think the shaded jars are the source of all this
> > trouble and I agree it would be nice to improve our architecture in this
> > area. However I don't think it's a simple fix and now we're getting into
> > the "will likely take a long time to plan and implement" concern. If
> > anyone has ideas on how to solve our shaded jar challenge I would be all
> > for it.
> > 2. All the MPack work listed above would also be required here. A
> > micro-services pattern is a significant shift and can't even give you
> > concrete examples of what exactly we would have to do. We would need to
> go
> > through extensive design and planning to even get to that point.
> > 3. It would be a branch new component. See above plus any new
> > infrastructure we would need (web server/proxy, service discovery, etc)
> >
> > On pcap-query:
> >
> > 1. I don't recall any users or customers directly using metron-api but
> > if you say so I believe you :)
> > 2. As I understand it the pcap topology and pcap query are somewhat
> > decoupled. Maybe location of pcap files would be shared? MPack work here
> > is likely to include adding a couple properties and moving some around
> so
> > they can be shared. Deciding between Ambari and global config would be
> > similar to properties we add to any component.
> >
> > I think you may be underestimating how difficult it's going to be to
> solve
> > our dependency problem. Or maybe it's me that is overestimating it :) It
> > could be something we experiment with before we start on the pcap work.
> > There is major upside and it would benefit the whole project. But until
> > then we can't fit anymore more screwdrivers in the toolbox. For me the
> > only reasonable options are to use the existing metron-api as it's own
> > separate service or call out to the pcap_query.sh script from our
> existing
> > REST app. I could go either way really. I'm just not excited about all
> > the MPack code we have to write for a new component. Maybe it won't be
> > that bad.
> >
> > On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> > wrote:
> >
> > > First thought is why the Alerts-UI and Not a dedicated Query UI?
> > >
> > >
> > > On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com)
> wrote:
> > >
> > > We are planning on adding the pcap query feature to the Alerts UI.
> Before
> > > we start this work, I think it is important to get community buy in on
> > the
> > > architectural approach. There are a couple different options.
> > >
> > > One option is to leverage the existing metron-api module that exposes
> > pcap
> > > queries through a REST service. The upsides are:
> > >
> > > - some work has already been done
> > > - it's part of our build so we know unit and integration tests pass
> > >
> > > The downsides are:
> > >
> > > - It hasn't been used in a while and will need some end to end testing
> > > to make sure it still functions properly
> > > - It is synchronous and will block the UI, using up the limited number
> > > of concurrent connections available in a browser
> > > - It will require significant MPack work to properly set it up on
> install
> > > - It is a legacy module from OpenSOC and coding style is significantly
> > > different
> > >
> > > Another option would be moving to a micro-services architecture. We
> have
> > > experimented with a proof of concept and found it was too hard to add
> > this
> > > feature into our existing REST services because of all the
> dependencies
> > > that must coexist in the same application. The upsides are:
> > >
> > > - Would provide a platform for future Batch/MR/YARN type features
> > > - There would be fewer technical compromises since we are building it
> > > from the ground up
> > >
> > > The downsides are:
> > >
> > > - Will require the most effort and will likely take a long time to
> plan
> > > and implement
> > > - Like the previous option, will require significant MPack work
> > >
> > > A third option would be to add an endpoint to our existing REST
> service
> > > that delegates to the pcap_query.sh script through the Java Process
> > class.
> > > The upsides to this approach are:
> > >
> > > - We know the pcap_query.sh script works and would require minimal
> > > changes
> > > - Minimal MPack work is required since our REST service is already
> > > included
> > >
> > > The downsides are:
> > >
> > > - Does not set us up to easily add other batch-oriented features in
> the
> > > future
> > > - OS-level security becomes a concern since we are delegating to a
> > > script in a separate process
> > >
> > > I feel like ultimately we want to transition to a micro-services
> > > architecture because it will provide more flexibility and make it
> easier
> > > to
> > > grow our set of features. But in the meantime, wrapping the
> pcap_query.sh
> > > script would allow us to add this feature with less work and fewer
> lines
> > > of
> > > code. If and when we decide to deploy a separate REST application for
> > > batch features, the UI portion would require minimal changes.
> > >
> > > What does everyone think?
> > >
> > >
> >
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
I think my point is that maybe we should have a discussion about:

* PCAP UI, goals etc
* Where it would live and why, what that would mean etc
* Backend ( this original mail )



On May 3, 2018 at 18:34:00, Michael Miklavcic (michael.miklavcic@gmail.com)
wrote:

Otto, what are you and your customers finding useful and/or difficult from
a split management/alerts UI perspective? It might help us to restate the
original scope and intent around maintaining separate management and alert
UI's, to your point about "contrary to previous direction." I personally
don't have a strong position on this other than 1) management is a
different feature set from drilling into threat intel, yet many apps still
have their management UI combined with the end user experience and 2) we
should probably consider pcap in context of a workflow with alerts.

On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com>
wrote:

> If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> alerts ui anymore.
>
> It is becoming more of a “composite” app, with multiple feature ui’s
> together. I didn’t think that
> was what we were going for, thus the config ui and the alert ui.
>
> Just adding disparate thing as ‘new tabs’ to a ui may be expedient but it
> seems contrary to
> our previous direction.
>
> There are a few things to consider if we are going to start moving
> everything into Alerts Ui aren’t there?
>
> It may be a better road to bring it in on it’s own like the alerts ui
> effort, so it can be released with ‘qualifiers’ and tested with
> the right expectations without effecting the Alerts UI.
>
>
>
> On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> Otto,
>
> I'm assuming just adding it to the Alerts UI is less work but I wouldn't
be
> strongly opposed to it being it's own UI. What are the reasons for doing
> that?
>
> Mike,
>
> On using metron-api:
>
> 1. I'm making an assumption about it not being used much. Maybe it
> still works without issue. I agree, we'll have to test anything we build
> so this is a minor issue.
> 2. Updating metron-api to be asynchronous is a requirement in my opinion
> 3. The MPack work is the major drawback for me. We're essentially
> creating a brand new Metron component. There are a lot of examples we can
> draw from but it's going to be a large chunk of new MPack code to
maintain
> and MPack development has been painful in the past. I think it will
> include:
> 1. Creating a start script
> 2. Creating master.py and commands.py scripts for managing the
> application lifecycle, service checks, etc
> 3. Creating an -env.xml file for exposing properties in Ambari
> 4. Adding the component to the various MPack files
> (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> 4. Our Storm topologies are completely different use cases and much more
> complex so I don't understand the comparison. But if you prefer this
> coding style then I think this is a minor issue as well.
>
> On micro-services:
>
> 1. Our REST service already includes a lot of dependencies and is
> difficult to manage in it's current state. I just went through this on
> https://github.com/apache/metron/pull/1008. It was painful. When we
> tried to include mapreduce and yarn dependencies it became what seemed
like
> an endless NoSuchMethod, NoClassDef and similar errors. Even if we can
get
> it to work it's going to make managing our REST service that much harder
> than it already is. I think the shaded jars are the source of all this
> trouble and I agree it would be nice to improve our architecture in this
> area. However I don't think it's a simple fix and now we're getting into
> the "will likely take a long time to plan and implement" concern. If
> anyone has ideas on how to solve our shaded jar challenge I would be all
> for it.
> 2. All the MPack work listed above would also be required here. A
> micro-services pattern is a significant shift and can't even give you
> concrete examples of what exactly we would have to do. We would need to
go
> through extensive design and planning to even get to that point.
> 3. It would be a branch new component. See above plus any new
> infrastructure we would need (web server/proxy, service discovery, etc)
>
> On pcap-query:
>
> 1. I don't recall any users or customers directly using metron-api but
> if you say so I believe you :)
> 2. As I understand it the pcap topology and pcap query are somewhat
> decoupled. Maybe location of pcap files would be shared? MPack work here
> is likely to include adding a couple properties and moving some around so
> they can be shared. Deciding between Ambari and global config would be
> similar to properties we add to any component.
>
> I think you may be underestimating how difficult it's going to be to
solve
> our dependency problem. Or maybe it's me that is overestimating it :) It
> could be something we experiment with before we start on the pcap work.
> There is major upside and it would benefit the whole project. But until
> then we can't fit anymore more screwdrivers in the toolbox. For me the
> only reasonable options are to use the existing metron-api as it's own
> separate service or call out to the pcap_query.sh script from our
existing
> REST app. I could go either way really. I'm just not excited about all
> the MPack code we have to write for a new component. Maybe it won't be
> that bad.
>
> On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > First thought is why the Alerts-UI and Not a dedicated Query UI?
> >
> >
> > On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > We are planning on adding the pcap query feature to the Alerts UI.
Before
> > we start this work, I think it is important to get community buy in on
> the
> > architectural approach. There are a couple different options.
> >
> > One option is to leverage the existing metron-api module that exposes
> pcap
> > queries through a REST service. The upsides are:
> >
> > - some work has already been done
> > - it's part of our build so we know unit and integration tests pass
> >
> > The downsides are:
> >
> > - It hasn't been used in a while and will need some end to end testing
> > to make sure it still functions properly
> > - It is synchronous and will block the UI, using up the limited number
> > of concurrent connections available in a browser
> > - It will require significant MPack work to properly set it up on
install
> > - It is a legacy module from OpenSOC and coding style is significantly
> > different
> >
> > Another option would be moving to a micro-services architecture. We
have
> > experimented with a proof of concept and found it was too hard to add
> this
> > feature into our existing REST services because of all the dependencies
> > that must coexist in the same application. The upsides are:
> >
> > - Would provide a platform for future Batch/MR/YARN type features
> > - There would be fewer technical compromises since we are building it
> > from the ground up
> >
> > The downsides are:
> >
> > - Will require the most effort and will likely take a long time to plan
> > and implement
> > - Like the previous option, will require significant MPack work
> >
> > A third option would be to add an endpoint to our existing REST service
> > that delegates to the pcap_query.sh script through the Java Process
> class.
> > The upsides to this approach are:
> >
> > - We know the pcap_query.sh script works and would require minimal
> > changes
> > - Minimal MPack work is required since our REST service is already
> > included
> >
> > The downsides are:
> >
> > - Does not set us up to easily add other batch-oriented features in the
> > future
> > - OS-level security becomes a concern since we are delegating to a
> > script in a separate process
> >
> > I feel like ultimately we want to transition to a micro-services
> > architecture because it will provide more flexibility and make it
easier
> > to
> > grow our set of features. But in the meantime, wrapping the
pcap_query.sh
> > script would allow us to add this feature with less work and fewer
lines
> > of
> > code. If and when we decide to deploy a separate REST application for
> > batch features, the UI portion would require minimal changes.
> >
> > What does everyone think?
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Otto, what are you and your customers finding useful and/or difficult from
a split management/alerts UI perspective? It might help us to restate the
original scope and intent around maintaining separate management and alert
UIs, to your point about "contrary to previous direction." I personally
don't have a strong position on this other than 1) management is a
different feature set from drilling into threat intel, yet many apps still
have their management UI combined with the end user experience and 2) we
should probably consider pcap in the context of a workflow with alerts.

On Thu, May 3, 2018 at 4:19 PM, Otto Fowler <ot...@gmail.com> wrote:

> If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
> alerts ui anymore.
>
> It is becoming more of a “composite” app, with multiple feature ui’s
> together.  I didn’t think that
> was what we were going for, thus the config ui and the alert ui.
>
> Just adding disparate thing as ‘new tabs’ to a ui may be expedient but it
> seems contrary to
> our previous direction.
>
> There are a few things to consider if we are going to start moving
> everything into Alerts Ui aren’t there?
>
> It may be a better road to bring it in on it’s own like the alerts ui
> effort, so it can be released with ‘qualifiers’ and tested with
> the right expectations without effecting the Alerts UI.
>
>
>
> On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> Otto,
>
> I'm assuming just adding it to the Alerts UI is less work but I wouldn't be
> strongly opposed to it being it's own UI. What are the reasons for doing
> that?
>
> Mike,
>
> On using metron-api:
>
> 1. I'm making an assumption about it not being used much. Maybe it
> still works without issue. I agree, we'll have to test anything we build
> so this is a minor issue.
> 2. Updating metron-api to be asynchronous is a requirement in my opinion
> 3. The MPack work is the major drawback for me. We're essentially
> creating a brand new Metron component. There are a lot of examples we can
> draw from but it's going to be a large chunk of new MPack code to maintain
> and MPack development has been painful in the past. I think it will
> include:
> 1. Creating a start script
> 2. Creating master.py and commands.py scripts for managing the
> application lifecycle, service checks, etc
> 3. Creating an -env.xml file for exposing properties in Ambari
> 4. Adding the component to the various MPack files
> (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
> 4. Our Storm topologies are completely different use cases and much more
> complex so I don't understand the comparison. But if you prefer this
> coding style then I think this is a minor issue as well.
>
> On micro-services:
>
> 1. Our REST service already includes a lot of dependencies and is
> difficult to manage in it's current state. I just went through this on
> https://github.com/apache/metron/pull/1008. It was painful. When we
> tried to include mapreduce and yarn dependencies it became what seemed like
> an endless NoSuchMethod, NoClassDef and similar errors. Even if we can get
> it to work it's going to make managing our REST service that much harder
> than it already is. I think the shaded jars are the source of all this
> trouble and I agree it would be nice to improve our architecture in this
> area. However I don't think it's a simple fix and now we're getting into
> the "will likely take a long time to plan and implement" concern. If
> anyone has ideas on how to solve our shaded jar challenge I would be all
> for it.
> 2. All the MPack work listed above would also be required here. A
> micro-services pattern is a significant shift and can't even give you
> concrete examples of what exactly we would have to do. We would need to go
> through extensive design and planning to even get to that point.
> 3. It would be a branch new component. See above plus any new
> infrastructure we would need (web server/proxy, service discovery, etc)
>
> On pcap-query:
>
> 1. I don't recall any users or customers directly using metron-api but
> if you say so I believe you :)
> 2. As I understand it the pcap topology and pcap query are somewhat
> decoupled. Maybe location of pcap files would be shared? MPack work here
> is likely to include adding a couple properties and moving some around so
> they can be shared. Deciding between Ambari and global config would be
> similar to properties we add to any component.
>
> I think you may be underestimating how difficult it's going to be to solve
> our dependency problem. Or maybe it's me that is overestimating it :) It
> could be something we experiment with before we start on the pcap work.
> There is major upside and it would benefit the whole project. But until
> then we can't fit anymore more screwdrivers in the toolbox. For me the
> only reasonable options are to use the existing metron-api as it's own
> separate service or call out to the pcap_query.sh script from our existing
> REST app. I could go either way really. I'm just not excited about all
> the MPack code we have to write for a new component. Maybe it won't be
> that bad.
>
> On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > First thought is why the Alerts-UI and Not a dedicated Query UI?
> >
> >
> > On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > We are planning on adding the pcap query feature to the Alerts UI. Before
> > we start this work, I think it is important to get community buy in on
> the
> > architectural approach. There are a couple different options.
> >
> > One option is to leverage the existing metron-api module that exposes
> pcap
> > queries through a REST service. The upsides are:
> >
> > - some work has already been done
> > - it's part of our build so we know unit and integration tests pass
> >
> > The downsides are:
> >
> > - It hasn't been used in a while and will need some end to end testing
> > to make sure it still functions properly
> > - It is synchronous and will block the UI, using up the limited number
> > of concurrent connections available in a browser
> > - It will require significant MPack work to properly set it up on install
> > - It is a legacy module from OpenSOC and coding style is significantly
> > different
> >
> > Another option would be moving to a micro-services architecture. We have
> > experimented with a proof of concept and found it was too hard to add
> this
> > feature into our existing REST services because of all the dependencies
> > that must coexist in the same application. The upsides are:
> >
> > - Would provide a platform for future Batch/MR/YARN type features
> > - There would be fewer technical compromises since we are building it
> > from the ground up
> >
> > The downsides are:
> >
> > - Will require the most effort and will likely take a long time to plan
> > and implement
> > - Like the previous option, will require significant MPack work
> >
> > A third option would be to add an endpoint to our existing REST service
> > that delegates to the pcap_query.sh script through the Java Process
> class.
> > The upsides to this approach are:
> >
> > - We know the pcap_query.sh script works and would require minimal
> > changes
> > - Minimal MPack work is required since our REST service is already
> > included
> >
> > The downsides are:
> >
> > - Does not set us up to easily add other batch-oriented features in the
> > future
> > - OS-level security becomes a concern since we are delegating to a
> > script in a separate process
> >
> > I feel like ultimately we want to transition to a micro-services
> > architecture because it will provide more flexibility and make it easier
> > to
> > grow our set of features. But in the meantime, wrapping the pcap_query.sh
> > script would allow us to add this feature with less work and fewer lines
> > of
> > code. If and when we decide to deploy a separate REST application for
> > batch features, the UI portion would require minimal changes.
> >
> > What does everyone think?
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
If that UI becomes the Alerts _and_ the PCAP Query UI, then it isn’t the
alerts UI anymore.

It is becoming more of a “composite” app, with multiple feature UIs
together.  I didn’t think that was what we were going for, thus the
config UI and the alerts UI.

Just adding disparate things as ‘new tabs’ to a UI may be expedient, but it
seems contrary to our previous direction.

There are a few things to consider if we are going to start moving
everything into the Alerts UI, aren’t there?

It may be a better road to bring it in on its own, like the Alerts UI
effort, so it can be released with ‘qualifiers’ and tested with
the right expectations without affecting the Alerts UI.



On May 3, 2018 at 17:25:54, Ryan Merriman (merrimanr@gmail.com) wrote:

Otto,

I'm assuming just adding it to the Alerts UI is less work, but I wouldn't be
strongly opposed to it being its own UI. What are the reasons for doing
that?

Mike,

On using metron-api:

1. I'm making an assumption about it not being used much. Maybe it
still works without issue. I agree, we'll have to test anything we build,
so this is a minor issue.
2. Updating metron-api to be asynchronous is a requirement, in my opinion.
3. The MPack work is the major drawback for me. We're essentially
creating a brand new Metron component. There are a lot of examples we can
draw from, but it's going to be a large chunk of new MPack code to maintain,
and MPack development has been painful in the past. I think it will
include:
   1. Creating a start script
   2. Creating master.py and commands.py scripts for managing the
   application lifecycle, service checks, etc.
   3. Creating an -env.xml file for exposing properties in Ambari
   4. Adding the component to the various MPack files
   (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
4. Our Storm topologies are completely different use cases and much more
complex, so I don't understand the comparison. But if you prefer this
coding style then I think this is a minor issue as well.
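
To make the asynchronous requirement in item 2 concrete, here is a minimal
sketch of the submit-then-poll pattern in plain Java. It is not metron-api
code: the class name AsyncQueryJobs, the status strings, and the pool size
are all assumptions made for illustration. The idea is only that a request
returns a job id right away, and the UI polls for status instead of holding
a browser connection open for the whole query.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of metron-api or the Metron REST module.
final class AsyncQueryJobs {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

  /** Starts the query in the background and returns a job id immediately. */
  public String submit(Callable<String> query) {
    String id = UUID.randomUUID().toString();
    jobs.put(id, pool.submit(query));
    return id;
  }

  /** Non-blocking status check that a UI could poll. */
  public String status(String id) {
    Future<String> job = jobs.get(id);
    if (job == null) {
      return "UNKNOWN";
    }
    if (!job.isDone()) {
      return "RUNNING";
    }
    try {
      job.get(); // does not wait: the job has already completed
      return "SUCCEEDED";
    } catch (ExecutionException e) {
      return "FAILED";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "UNKNOWN";
    }
  }

  /** Blocks until the job finishes and returns its result. */
  public String result(String id) throws ExecutionException, InterruptedException {
    Future<String> job = jobs.get(id);
    if (job == null) {
      throw new IllegalArgumentException("No such job: " + id);
    }
    return job.get();
  }

  public void shutdown() {
    pool.shutdown();
  }

  public static void main(String[] args) throws Exception {
    AsyncQueryJobs jobs = new AsyncQueryJobs();
    String id = jobs.submit(() -> {
      Thread.sleep(200); // stand-in for a long-running pcap query
      return "query finished";
    });
    System.out.println(jobs.status(id)); // polled while the job may still run
    System.out.println(jobs.result(id)); // waits for completion
    System.out.println(jobs.status(id));
    jobs.shutdown();
  }
}
```

A real service would also need to expire old jobs and persist results, which
this sketch leaves out.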

On micro-services:

1. Our REST service already includes a lot of dependencies and is
difficult to manage in its current state. I just went through this on
https://github.com/apache/metron/pull/1008. It was painful. When we
tried to include mapreduce and yarn dependencies it turned into what seemed
like an endless series of NoSuchMethod, NoClassDef and similar errors. Even
if we can get it to work, it's going to make managing our REST service that
much harder than it already is. I think the shaded jars are the source of
all this trouble and I agree it would be nice to improve our architecture in
this area. However, I don't think it's a simple fix, and now we're getting
into the "will likely take a long time to plan and implement" concern. If
anyone has ideas on how to solve our shaded jar challenge I would be all
for it.
2. All the MPack work listed above would also be required here. A
micro-services pattern is a significant shift and I can't even give you
concrete examples of what exactly we would have to do. We would need to go
through extensive design and planning to even get to that point.
3. It would be a brand new component. See above, plus any new
infrastructure we would need (web server/proxy, service discovery, etc.)
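
For the NoSuchMethod/NoClassDef errors in item 1, one generic way to see
which jar a conflicting class was actually loaded from is to ask the JVM for
the class's code source. This is a minimal sketch and an assumption about
how one might diagnose such conflicts, not something taken from the POC; the
class name ClassOrigin is made up for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Metron.
final class ClassOrigin {

  /** Returns the jar or directory a class was loaded from, or a short note. */
  public static String locate(String className) {
    try {
      // initialize=false: look the class up without running static initializers
      Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
      CodeSource source = type.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        return "(no code source; bootstrap class)";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException | LinkageError e) {
      return "(not loadable: " + e + ")";
    }
  }

  public static void main(String[] args) {
    // Pass the fully qualified names of the classes named in the stack trace.
    for (String name : args) {
      System.out.println(name + " -> " + locate(name));
    }
  }
}
```

Running it inside the failing application's classpath with the class named
in the error shows whether the class came from a shaded jar or from the
expected dependency.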

On pcap-query:

1. I don't recall any users or customers directly using metron-api but
if you say so I believe you :)
2. As I understand it the pcap topology and pcap query are somewhat
decoupled. Maybe location of pcap files would be shared? MPack work here
is likely to include adding a couple properties and moving some around so
they can be shared. Deciding between Ambari and global config would be
similar to properties we add to any component.

I think you may be underestimating how difficult it's going to be to solve
our dependency problem. Or maybe it's me that is overestimating it :) It
could be something we experiment with before we start on the pcap work.
There is major upside and it would benefit the whole project. But until
then we can't fit any more screwdrivers in the toolbox. For me the
only reasonable options are to use the existing metron-api as its own
separate service or call out to the pcap_query.sh script from our existing
REST app. I could go either way really. I'm just not excited about all
the MPack code we have to write for a new component. Maybe it won't be
that bad.

On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
wrote:

> First thought is why the Alerts-UI and Not a dedicated Query UI?
>
>
> On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> We are planning on adding the pcap query feature to the Alerts UI. Before
> we start this work, I think it is important to get community buy in on
the
> architectural approach. There are a couple different options.
>
> One option is to leverage the existing metron-api module that exposes
pcap
> queries through a REST service. The upsides are:
>
> - some work has already been done
> - it's part of our build so we know unit and integration tests pass
>
> The downsides are:
>
> - It hasn't been used in a while and will need some end to end testing
> to make sure it still functions properly
> - It is synchronous and will block the UI, using up the limited number
> of concurrent connections available in a browser
> - It will require significant MPack work to properly set it up on install
> - It is a legacy module from OpenSOC and coding style is significantly
> different
>
> Another option would be moving to a micro-services architecture. We have
> experimented with a proof of concept and found it was too hard to add
this
> feature into our existing REST services because of all the dependencies
> that must coexist in the same application. The upsides are:
>
> - Would provide a platform for future Batch/MR/YARN type features
> - There would be fewer technical compromises since we are building it
> from the ground up
>
> The downsides are:
>
> - Will require the most effort and will likely take a long time to plan
> and implement
> - Like the previous option, will require significant MPack work
>
> A third option would be to add an endpoint to our existing REST service
> that delegates to the pcap_query.sh script through the Java Process
class.
> The upsides to this approach are:
>
> - We know the pcap_query.sh script works and would require minimal
> changes
> - Minimal MPack work is required since our REST service is already
> included
>
> The downsides are:
>
> - Does not set us up to easily add other batch-oriented features in the
> future
> - OS-level security becomes a concern since we are delegating to a
> script in a separate process
>
> I feel like ultimately we want to transition to a micro-services
> architecture because it will provide more flexibility and make it easier
> to
> grow our set of features. But in the meantime, wrapping the pcap_query.sh
> script would allow us to add this feature with less work and fewer lines
> of
> code. If and when we decide to deploy a separate REST application for
> batch features, the UI portion would require minimal changes.
>
> What does everyone think?
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Comments inline below.


On Thu, May 3, 2018 at 3:25 PM, Ryan Merriman <me...@gmail.com> wrote:

> Otto,
>
> I'm assuming just adding it to the Alerts UI is less work but I wouldn't be
> strongly opposed to it being it's own UI.  What are the reasons for doing
> that?
>
I don't know that we should split them up. It seems like a sub-section or
wizard or some such would be useful here. The use cases I've seen/heard
around PCAP often started with an infosec analyst doing a search on alerts
that followed with them going to query pcap data corresponding to the
threats they're investigating. Maybe we should emphasize streamlining this
experience?


> Mike,
>
> On using metron-api:
>
>    1. I'm making an assumption about it not being used much.  Maybe it
>    still works without issue.  I agree, we'll have to test anything we
> build
>    so this is a minor issue.
>    2. Updating metron-api to be asynchronous is a requirement in my opinion
>

Yes, I agree this is reasonable to do.


>    3. The MPack work is the major drawback for me.  We're essentially
>    creating a brand new Metron component.  There are a lot of examples we
> can
>    draw from but it's going to be a large chunk of new MPack code to
> maintain
>    and MPack development has been painful in the past.  I think it will
>    include:
>       1. Creating a start script
>       2. Creating master.py and commands.py scripts for managing the
>       application lifecycle, service checks, etc
>       3. Creating an -env.xml file for exposing properties in Ambari
>       4. Adding the component to the various MPack files
>       (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
>

Awesome - this is exactly what I/we need to understand your vision for this
and other features, and weigh the pros and cons.


>    4. Our Storm topologies are completely different use cases and much more
>    complex so I don't understand the comparison.  But if you prefer this
>    coding style then I think this is a minor issue as well.
>
I still don't understand what the specific code style is in Pcap that is
problematic. Even if I might disagree with you (haha, it could be like
arguing spaces vs tabs), you called it out specifically, and I want to
understand your position and reasons. I might agree with you, I might not,
but I do want to understand the point that's being made regardless. Is it
formatting? Class, interface, and package structure? Esoteric names?
Documentation? Tabs vs spaces, lol? We have the ability to change any of
these things, and they don't necessarily need to be (and probably shouldn't be) done
inside of this feature. The reason I pointed out the Storm topology pieces
is because they are indeed complex, and I think they can be greatly
simplified. The unified enrichment topology is one such example that came
through testing, but there are other improvements obtained simply by taking
a holistic view of what's there and introducing simplifying refactorings.
Same thing we can do here with Pcap, if it seems useful.


> On micro-services:
>
>    1. Our REST service already includes a lot of dependencies and is
>    difficult to manage in it's current state.  I just went through this on
>    https://github.com/apache/metron/pull/1008.  It was painful.  When we
>    tried to include mapreduce and yarn dependencies it became what seemed
> like
>    an endless NoSuchMethod, NoClassDef and similar errors.  Even if we can
> get
>    it to work it's going to make managing our REST service that much harder
>    than it already is.  I think the shaded jars are the source of all this
>    trouble and I agree it would be nice to improve our architecture in this
>    area.  However I don't think it's a simple fix and now we're getting
> into
>    the "will likely take a long time to plan and implement" concern.  If
>    anyone has ideas on how to solve our shaded jar challenge I would be all
>    for it.
>

That's why I gave my recommendation about managing external dependencies as
a special module. I do see in the PR you cited that, surprise surprise,
Jackson is in the list. I know you're saying "take my word for it, I did a
POC and tried it. It didn't work. Throw that bathwater OUT." I'm just
asking that you at least share the pom files and any other specifics of the
issue so that the community can A) see what the issues and hurdles are so
that we can also weigh the trade-offs that you seem to have already decided
on B) help, and C) have a record that we can refer to for the next 10
times we go through the exact same thing. This is useful information for
everyone for more than just this feature. If we set precedent here by
punting without a strict and clear set of reasons, we'll literally do it
for every other feature that adds new dependencies going forward. I don't
think we should manage our architecture and features in this manner.

>    2. All the MPack work listed above would also be required here.  A
>    micro-services pattern is a significant shift and can't even give you
>    concrete examples of what exactly we would have to do.  We would need
> to go
>    through extensive design and planning to even get to that point.
>

So everyone is on the same page, does this capture your meaning by
micro-services? - https://martinfowler.com/articles/microservices.html.

I think we're missing a 4th option that is lumped in with the
micro-services option, which is to simply expose the functionality that
kicks off the mapreduce job from our REST API. It's literally relocating a
fair amount of the metron-api project. This brings us back to point #1
about the dependency issues, which I am not sufficiently convinced is an
insurmountable problem yet. At any rate, if we did that, we'd be saving all
the work you're concerned about in the MPack. And we'd be deprecating and
removing code and simplifying the code base. And we'd be making the same
compromise about not taking on a massive micro-services feature. I agree
with you - micro-services isn't necessary to get this feature rolling.

>    3. It would be a branch new component.  See above plus any new
>    infrastructure we would need (web server/proxy, service discovery, etc)
>
But not if we didn't split it out as a micro-service, per my points above.

> On pcap-query:
>
>    1. I don't recall any users or customers directly using metron-api but
>    if you say so I believe you :)
>

Are there many folks using the pcap_query.sh? I haven't noticed anything on
the mailing lists in quite a while, but I might have missed it. My point
here was that the testing effort seems about the same. I want to distill
our pros and cons down to a very concise list and e2e testing would be an
item I would probably remove from the list of reasons for any particular
solution.

>    2. As I understand it the pcap topology and pcap query are somewhat
>    decoupled.  Maybe location of pcap files would be shared?  MPack work
> here
>    is likely to include adding a couple properties and moving some around
> so
>    they can be shared.  Deciding between Ambari and global config would be
>    similar to properties we add to any component.
>
Yeah, file location is the first one that came to mind. We're hashing out
an architecture and short and long-term solutions and level of effort in
this DISCUSS. Since you brought up effort around MPack I thought it made
sense to get an idea of what we'd be needing for a reasonable solution in
each of the proposed solutions.

> I think you may be underestimating how difficult it's going to be to solve
> our dependency problem.  Or maybe it's me that is overestimating it :)  It
> could be something we experiment with before we start on the pcap work.
> There is major upside and it would benefit the whole project.  But until
> then we can't fit anymore more screwdrivers in the toolbox.  For me the
> only reasonable options are to use the existing metron-api as it's own
> separate service or call out to the pcap_query.sh script from our existing
> REST app.  I could go either way really.  I'm just not excited about all
> the MPack code we have to write for a new component.  Maybe it won't be
> that bad.
>
>
I literally don't have a clue about the dependency problem size, other than
having solved other dependency issues in the project. It was painful, but
doable. Share your findings in a way that the community can clearly and
concisely understand and reproduce the POC problems, and I think we will be
much better equipped to make an informed decision.





> On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com>
> wrote:
>
> > First thought is why the Alerts-UI and Not a dedicated  Query UI?
> >
> >
> > On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:
> >
> > We are planning on adding the pcap query feature to the Alerts UI. Before
> > we start this work, I think it is important to get community buy in on
> the
> > architectural approach. There are a couple different options.
> >
> > One option is to leverage the existing metron-api module that exposes
> pcap
> > queries through a REST service. The upsides are:
> >
> > - some work has already been done
> > - it's part of our build so we know unit and integration tests pass
> >
> > The downsides are:
> >
> > - It hasn't been used in a while and will need some end to end testing
> > to make sure it still functions properly
> > - It is synchronous and will block the UI, using up the limited number
> > of concurrent connections available in a browser
> > - It will require significant MPack work to properly set it up on install
> > - It is a legacy module from OpenSOC and coding style is significantly
> > different
> >
> > Another option would be moving to a micro-services architecture. We have
> > experimented with a proof of concept and found it was too hard to add
> this
> > feature into our existing REST services because of all the dependencies
> > that must coexist in the same application. The upsides are:
> >
> > - Would provide a platform for future Batch/MR/YARN type features
> > - There would be fewer technical compromises since we are building it
> > from the ground up
> >
> > The downsides are:
> >
> > - Will require the most effort and will likely take a long time to plan
> > and implement
> > - Like the previous option, will require significant MPack work
> >
> > A third option would be to add an endpoint to our existing REST service
> > that delegates to the pcap_query.sh script through the Java Process
> class.
> > The upsides to this approach are:
> >
> > - We know the pcap_query.sh script works and would require minimal
> > changes
> > - Minimal MPack work is required since our REST service is already
> > included
> >
> > The downsides are:
> >
> > - Does not set us up to easily add other batch-oriented features in the
> > future
> > - OS-level security becomes a concern since we are delegating to a
> > script in a separate process
> >
> > I feel like ultimately we want to transition to a micro-services
> > architecture because it will provide more flexibility and make it easier
> > to
> > grow our set of features. But in the meantime, wrapping the pcap_query.sh
> > script would allow us to add this feature with less work and fewer lines
> > of
> > code. If and when we decide to deploy a separate REST application for
> > batch features, the UI portion would require minimal changes.
> >
> > What does everyone think?
> >
> >
>

Re: [DISCUSS] Pcap panel architecture

Posted by Ryan Merriman <me...@gmail.com>.
Otto,

I'm assuming just adding it to the Alerts UI is less work but I wouldn't be
strongly opposed to it being its own UI.  What are the reasons for doing
that?

Mike,

On using metron-api:

   1. I'm making an assumption about it not being used much.  Maybe it
   still works without issue.  I agree, we'll have to test anything we build
   so this is a minor issue.
   2. Updating metron-api to be asynchronous is a requirement in my opinion
   3. The MPack work is the major drawback for me.  We're essentially
   creating a brand new Metron component.  There are a lot of examples we can
   draw from but it's going to be a large chunk of new MPack code to maintain
   and MPack development has been painful in the past.  I think it will
   include:
      1. Creating a start script
      2. Creating master.py and commands.py scripts for managing the
      application lifecycle, service checks, etc
      3. Creating an -env.xml file for exposing properties in Ambari
      4. Adding the component to the various MPack files
      (metron_theme.json, metainfo.xml, service_advisor.py, etc.)
   4. Our Storm topologies are completely different use cases and much more
   complex so I don't understand the comparison.  But if you prefer this
   coding style then I think this is a minor issue as well.

On micro-services:

   1. Our REST service already includes a lot of dependencies and is
   difficult to manage in its current state.  I just went through this on
   https://github.com/apache/metron/pull/1008.  It was painful.  When we
   tried to include mapreduce and yarn dependencies it became what seemed like
   an endless NoSuchMethod, NoClassDef and similar errors.  Even if we can get
   it to work it's going to make managing our REST service that much harder
   than it already is.  I think the shaded jars are the source of all this
   trouble and I agree it would be nice to improve our architecture in this
   area.  However I don't think it's a simple fix and now we're getting into
   the "will likely take a long time to plan and implement" concern.  If
   anyone has ideas on how to solve our shaded jar challenge I would be all
   for it.
   2. All the MPack work listed above would also be required here.  A
   micro-services pattern is a significant shift and I can't even give you
   concrete examples of what exactly we would have to do.  We would need to go
   through extensive design and planning to even get to that point.
   3. It would be a brand new component.  See above plus any new
   infrastructure we would need (web server/proxy, service discovery, etc)

On pcap-query:

   1. I don't recall any users or customers directly using metron-api but
   if you say so I believe you :)
   2. As I understand it the pcap topology and pcap query are somewhat
   decoupled.  Maybe location of pcap files would be shared?  MPack work here
   is likely to include adding a couple properties and moving some around so
   they can be shared.  Deciding between Ambari and global config would be
   similar to properties we add to any component.

I think you may be underestimating how difficult it's going to be to solve
our dependency problem.  Or maybe it's me that is overestimating it :)  It
could be something we experiment with before we start on the pcap work.
There is major upside and it would benefit the whole project.  But until
then we can't fit any more screwdrivers in the toolbox.  For me the
only reasonable options are to use the existing metron-api as its own
separate service or call out to the pcap_query.sh script from our existing
REST app.  I could go either way really.  I'm just not excited about all
the MPack code we have to write for a new component.  Maybe it won't be
that bad.

On Thu, May 3, 2018 at 2:50 PM, Otto Fowler <ot...@gmail.com> wrote:

> First thought is why the Alerts-UI and Not a dedicated  Query UI?
>
>
> On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:
>
> We are planning on adding the pcap query feature to the Alerts UI. Before
> we start this work, I think it is important to get community buy in on the
> architectural approach. There are a couple different options.
>
> One option is to leverage the existing metron-api module that exposes pcap
> queries through a REST service. The upsides are:
>
> - some work has already been done
> - it's part of our build so we know unit and integration tests pass
>
> The downsides are:
>
> - It hasn't been used in a while and will need some end to end testing
> to make sure it still functions properly
> - It is synchronous and will block the UI, using up the limited number
> of concurrent connections available in a browser
> - It will require significant MPack work to properly set it up on install
> - It is a legacy module from OpenSOC and coding style is significantly
> different
>
> Another option would be moving to a micro-services architecture. We have
> experimented with a proof of concept and found it was too hard to add this
> feature into our existing REST services because of all the dependencies
> that must coexist in the same application. The upsides are:
>
> - Would provide a platform for future Batch/MR/YARN type features
> - There would be fewer technical compromises since we are building it
> from the ground up
>
> The downsides are:
>
> - Will require the most effort and will likely take a long time to plan
> and implement
> - Like the previous option, will require significant MPack work
>
> A third option would be to add an endpoint to our existing REST service
> that delegates to the pcap_query.sh script through the Java Process class.
> The upsides to this approach are:
>
> - We know the pcap_query.sh script works and would require minimal
> changes
> - Minimal MPack work is required since our REST service is already
> included
>
> The downsides are:
>
> - Does not set us up to easily add other batch-oriented features in the
> future
> - OS-level security becomes a concern since we are delegating to a
> script in a separate process
>
> I feel like ultimately we want to transition to a micro-services
> architecture because it will provide more flexibility and make it easier
> to
> grow our set of features. But in the meantime, wrapping the pcap_query.sh
> script would allow us to add this feature with less work and fewer lines
> of
> code. If and when we decide to deploy a separate REST application for
> batch features, the UI portion would require minimal changes.
>
> What does everyone think?
>
>

Re: [DISCUSS] Pcap panel architecture

Posted by Otto Fowler <ot...@gmail.com>.
First thought is why the Alerts-UI and Not a dedicated  Query UI?


On May 3, 2018 at 14:36:04, Ryan Merriman (merrimanr@gmail.com) wrote:

We are planning on adding the pcap query feature to the Alerts UI. Before
we start this work, I think it is important to get community buy in on the
architectural approach. There are a couple different options.

One option is to leverage the existing metron-api module that exposes pcap
queries through a REST service. The upsides are:

- some work has already been done
- it's part of our build so we know unit and integration tests pass

The downsides are:

- It hasn't been used in a while and will need some end to end testing
to make sure it still functions properly
- It is synchronous and will block the UI, using up the limited number
of concurrent connections available in a browser
- It will require significant MPack work to properly set it up on install
- It is a legacy module from OpenSOC and coding style is significantly
different

Another option would be moving to a micro-services architecture. We have
experimented with a proof of concept and found it was too hard to add this
feature into our existing REST services because of all the dependencies
that must coexist in the same application. The upsides are:

- Would provide a platform for future Batch/MR/YARN type features
- There would be fewer technical compromises since we are building it
from the ground up

The downsides are:

- Will require the most effort and will likely take a long time to plan
and implement
- Like the previous option, will require significant MPack work

A third option would be to add an endpoint to our existing REST service
that delegates to the pcap_query.sh script through the Java Process class.
The upsides to this approach are:

- We know the pcap_query.sh script works and would require minimal
changes
- Minimal MPack work is required since our REST service is already
included

The downsides are:

- Does not set us up to easily add other batch-oriented features in the
future
- OS-level security becomes a concern since we are delegating to a
script in a separate process

I feel like ultimately we want to transition to a micro-services
architecture because it will provide more flexibility and make it easier to
grow our set of features. But in the meantime, wrapping the pcap_query.sh
script would allow us to add this feature with less work and fewer lines of
code. If and when we decide to deploy a separate REST application for
batch features, the UI portion would require minimal changes.

What does everyone think?

Re: [DISCUSS] Pcap panel architecture

Posted by Michael Miklavcic <mi...@gmail.com>.
Thanks for the write-up, Ryan. A few questions and comments.

   1. metron-api
      1. "It hasn't been used in a while and will need some end to end
      testing to make sure it still functions properly" > I was probably
      one of the last developers to touch this code a year or more ago -
      fwiw, I didn't encounter any major issues the last time I ran it up.
      That aside, end to end testing will be much more critical for a new
      feature.
      2. Mpack work - can you list what you think is needed? Isn't this
      just a jar and some changes to the UI?
      3. This actually might not be a negative, per se. Can you be specific
      about the issues you see in the older code? Personally, I found it much
      quicker to pick up than the Storm topology classes I've been working with
      lately (parser, enrichment, and indexing bolts with 5+ levels of class
      hierarchies with enrichment bolts extending configured indexing bolts, 2+
      types of undocumented initialization routines, BulkMessageWriters,
      BulkMessageComponents, BulkMessageHandlers, AbstractWriters,
      MessageWriters, and a few dozen configuration types used for
      reading/writing to ZooKeeper and disk and maintaining an in-memory cache).
   2. microservices
      1. "We have experimented with a proof of concept and found it was too
      hard to add this feature into our existing REST services because of
      all the dependencies that must coexist in the same application."
      > Can you share the POC example and/or explain the hurdles along with
      specific dependency errors encountered? With Storm not providing
      classpath isolation and containerization, we have a number of existing
      shaded/relocated modules in our system. This may be a simple fix and/or
      an opportunity to improve our existing architecture rather than add
      more non-standard approaches to the mix.
      2. What aspects of this approach "will require the most effort?" What
      specifically makes this more work than the other strategies?
      3. Again, can you enumerate the MPack work? What is net-new or does
      not fit with the existing deployment strategy?
   3. pcap_query.sh
      1. "We know the pcap_query.sh script works and would require
      minimal changes" > this would be no different from metron-api, no?
      2. How would you manage configuration between the UI and pcap
      topology? Would this go in Ambari, management UI, global config - mpack
      work for this?

My take is that this belongs in the existing REST API, as you mention. I'm
not sure how I feel about calling the pcap_query.sh from Java - it seems a
bit hacky, like we're taking a shortcut to avoid fixing another problem
that will cause us to provide a solution that is inconsistent with the rest
of our REST app. It's like having a Phillips head screwdriver that plugs
into a Phillips screw that adapts to a flat-head screwdriver end and plugs
into a flat-head screw. Just use a flat-head screwdriver man :) One way to
possibly mitigate dependency issues that we've been having is to construct
a new module specifically for managing our external dependencies that have
been perennial problem children, like Guava and Jackson, and exclude them
everywhere else in the project. Either way, we should deprecate the older
metron-api that hosts the stand-alone PCAP REST service as part of this
effort. I don't think we should leave both of them there unless someone has
a good reason otherwise. Pending a better understanding of the dependency
issues encountered, I'm interested to hear what others think of calling the
shell script from REST vs leveraging the PCAP query code directly.
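
To make the "call the script from REST" option and its OS-level security
concern a bit more concrete, here is a minimal sketch of launching an
external script from application code. It is written in Python, so the
subprocess module stands in for the Java Process class discussed in this
thread. The names run_query_script and SAFE_VALUE are hypothetical, the
validation rule is an assumption, and no real pcap_query.sh options are
assumed: the demo runs a stand-in command that just echoes its arguments.

```python
import re
import subprocess
import sys

# Assumed validation rule: only plain tokens (letters, digits, and a few
# punctuation characters), and never a leading '-' that could be read as
# an option by the script.
SAFE_VALUE = re.compile(r"[A-Za-z0-9_.:/][A-Za-z0-9_.:/-]*")


def run_query_script(command, values, timeout_seconds=60):
    """Run `command` (program plus fixed args) with user-supplied `values`.

    Arguments are passed as a list, so no shell is involved and shell
    metacharacters in user input are never interpreted.
    """
    for value in values:
        if not SAFE_VALUE.fullmatch(value):
            raise ValueError("rejected unsafe argument: %r" % (value,))
    completed = subprocess.run(
        list(command) + list(values),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_seconds,  # do not let a hung script block the caller forever
        check=False,
    )
    return completed.returncode, completed.stdout.decode("utf-8", "replace")


if __name__ == "__main__":
    # Stand-in for the real script: echoes whatever arguments it receives.
    stand_in = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    code, out = run_query_script(stand_in, ["192.168.1.10", "2018-05-03"])
    print(code, out.strip())
```

Passing an argument list instead of a single shell string is what removes
the injection risk here. Which OS user the child process runs as is a
separate question that this sketch does not address.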

Other feature considerations

   1. Agreed that this should be made asynchronous via the UI. A polling or
   callback mechanism would be useful.
   2. Take care that a user doesn't hit refresh or POST multiple times and
   kick off 50 mapreduce jobs.
   3. Options for managing the YARN queue that is used
   4. Should we provide a "cancel" option that kills the MR job, or tell
   the user to go to the CLI to kill their job?
   5. Managing data if multiple users run queries.
   6. Job cleanup/TTL
   7. Date range limits on queries - PCAP data is massive by comparison to
   other sensors
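
Several of the considerations above (asynchronous polling, duplicate
submissions, cancel, job TTL, date range limits) can be sketched with a
small in-memory job registry. This is only an illustration under stated
assumptions: PcapJobRegistry, MAX_RANGE_SECONDS and JOB_TTL_SECONDS are
hypothetical names and values, timestamps are plain epoch seconds, and
nothing here actually launches or kills a MapReduce job.

```python
import threading
import time
import uuid

MAX_RANGE_SECONDS = 24 * 60 * 60  # assumed limit: at most one day per query
JOB_TTL_SECONDS = 60 * 60         # assumed TTL for finished jobs


class PcapJobRegistry:
    """Tracks query jobs so the UI can submit once and then poll."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._lock = threading.Lock()
        self._jobs = {}    # job_id -> {"key", "state", "finished_at"}
        self._active = {}  # (user, start, end, filter) -> running job_id

    def submit(self, user, start, end, query_filter):
        if end <= start:
            raise ValueError("end must be after start")
        if end - start > MAX_RANGE_SECONDS:
            raise ValueError("date range exceeds the configured limit")
        key = (user, start, end, query_filter)
        with self._lock:
            existing = self._active.get(key)
            if existing is not None:
                # A refresh or repeated POST returns the job already running.
                return existing
            job_id = uuid.uuid4().hex
            self._jobs[job_id] = {"key": key, "state": "RUNNING", "finished_at": None}
            self._active[key] = job_id
            return job_id

    def status(self, job_id):
        with self._lock:
            job = self._jobs.get(job_id)
            return job["state"] if job else "UNKNOWN"

    def finish(self, job_id, state="SUCCEEDED"):
        with self._lock:
            job = self._jobs[job_id]
            if job["state"] != "RUNNING":
                return  # already finished or cancelled
            job["state"] = state
            job["finished_at"] = self._clock()
            self._active.pop(job["key"], None)

    def cancel(self, job_id):
        # A real implementation would also kill the underlying job.
        self.finish(job_id, state="KILLED")

    def purge_expired(self):
        """Drop finished jobs older than the TTL; returns how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [
                job_id for job_id, job in self._jobs.items()
                if job["finished_at"] is not None
                and now - job["finished_at"] > JOB_TTL_SECONDS
            ]
            for job_id in expired:
                del self._jobs[job_id]
            return len(expired)
```

Keying running jobs by user and query parameters is one simple way to keep
a refresh from starting a second job. A real service would also need
persistent state and per-user result storage, which are left out here.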

Cheers,
Mike




On Thu, May 3, 2018 at 12:35 PM, Ryan Merriman <me...@gmail.com> wrote:

> We are planning on adding the pcap query feature to the Alerts UI.  Before
> we start this work, I think it is important to get community buy in on the
> architectural approach.  There are a couple different options.
>
> One option is to leverage the existing metron-api module that exposes pcap
> queries through a REST service.  The upsides are:
>
>    - some work has already been done
>    - it's part of our build so we know unit and integration tests pass
>
> The downsides are:
>
>    - It hasn't been used in a while and will need some end to end testing
>    to make sure it still functions properly
>    - It is synchronous and will block the UI, using up the limited number
>    of concurrent connections available in a browser
>    - It will require significant MPack work to properly set it up on
> install
>    - It is a legacy module from OpenSOC and coding style is significantly
>    different
>
> Another option would be moving to a micro-services architecture.  We have
> experimented with a proof of concept and found it was too hard to add this
> feature into our existing REST services because of all the dependencies
> that must coexist in the same application.  The upsides are:
>
>    - Would provide a platform for future Batch/MR/YARN type features
>    - There would be fewer technical compromises since we are building it
>    from the ground up
>
> The downsides are:
>
>    - Will require the most effort and will likely take a long time to plan
>    and implement
>    - Like the previous option, will require significant MPack work
>
> A third option would be to add an endpoint to our existing REST service
> that delegates to the pcap_query.sh script through the Java Process class.
> The upsides to this approach are:
>
>    - We know the pcap_query.sh script works and would require minimal
>    changes
>    - Minimal MPack work is required since our REST service is already
>    included
>
> The downsides are:
>
>    - Does not set us up to easily add other batch-oriented features in the
>    future
>    - OS-level security becomes a concern since we are delegating to a
>    script in a separate process
>
> I feel like ultimately we want to transition to a micro-services
> architecture because it will provide more flexibility and make it easier to
> grow our set of features.  But in the meantime, wrapping the pcap_query.sh
> script would allow us to add this feature with less work and fewer lines of
> code.  If and when we decide to deploy a separate REST application for
> batch features, the UI portion would require minimal changes.
>
> What does everyone think?
>