Posted to issues@metron.apache.org by GitBox <gi...@apache.org> on 2019/11/18 19:37:41 UTC

[GitHub] [metron] nickwallen opened a new pull request #1564: METRON-2285 Batch Profiler Cannot Persist Data Sketches

URL: https://github.com/apache/metron/pull/1564
 
 
   
   The Batch Profiler can use data sketches while executing a profile, but it cannot persist them to the HBase profile store. Attempting to do so fails with the following exception.
   ```
   Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 68 in stage 8.0 failed 1 times, most recent failure: Lost task 68.0 in stage 8.0 (TID 274, localhost, executor driver): com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.metron.statistics.OnlineStatisticsProvider
   	at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
   	at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
   	at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
   	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
   	at org.apache.metron.common.utils.SerDeUtils.fromBytes(SerDeUtils.java:262)
   	at org.apache.metron.profiler.spark.ProfileMeasurementAdapter.toProfileMeasurement(ProfileMeasurementAdapter.java:85)
   	at org.apache.metron.profiler.spark.function.HBaseWriterFunction.call(HBaseWriterFunction.java:124)
   	at org.apache.spark.sql.Dataset$$anonfun$48.apply(Dataset.scala:2266)
   	at org.apache.spark.sql.Dataset$$anonfun$48.apply(Dataset.scala:2266)
   	at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:196)
   	at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:193)
   Caused by: java.lang.ClassNotFoundException: org.apache.metron.statistics.OnlineStatisticsProvider
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:348)
   	at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
   	... 28 more
   ```
   
   This occurs even though `org.apache.metron.statistics.OnlineStatisticsProvider` is packaged in the Batch Profiler uber jar.
   ```
   [root@node1 0.7.2]# jar -tvf lib/metron-profiler-spark-0.7.2-uber.jar | grep OnlineStatisticsProvider
     8005 Mon Nov 18 13:08:02 UTC 2019 org/apache/metron/statistics/OnlineStatisticsProvider.class
   ```
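
The stack trace shows the failed lookup going through `Class.forName` and `sun.misc.Launcher$AppClassLoader`. As general JVM background (not a diagnosis confirmed by this PR), a class is only found if it is visible to the specific class loader doing the lookup, so being packaged in a jar is not sufficient on its own. The following is a minimal, self-contained sketch of that behavior; `ClassVisibilityDemo` and `canLoad` are hypothetical names, and nothing here is Metron or Spark code.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassVisibilityDemo {

    /** Returns true if the given class loader can resolve the class name. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // 'false' means: resolve the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = ClassVisibilityDemo.class.getName();

        // The loader that defined this class can always find it again.
        ClassLoader owner = ClassVisibilityDemo.class.getClassLoader();

        // A loader with no URLs and no parent delegates only to the bootstrap
        // loader, so application classes are invisible to it.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("owner loader can load:    " + canLoad(name, owner));
            System.out.println("isolated loader can load: " + canLoad(name, isolated));
        }
    }
}
```

The same class name resolves through one loader and fails through the other, which is the general shape of a `ClassNotFoundException` for a class that is demonstrably on disk.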
   
   ## Root Cause
   
   Before #1556, the Batch Profiler used Spark's bean encoder in most cases; #1556 switched most of these to Kryo serialization. Spark's bean encoder does not handle a field of type `Object` well and fails to serialize it in many cases.
   
   The value of a `ProfileMeasurement` is stored as a `java.lang.Object` because it can be the result of any Stellar expression, so its type is not known until runtime.  To store the profile measurement value with Spark's bean encoder, a `ProfileMeasurementAdapter` was created that serializes the value into bytes using Metron's `SerDeUtils` before Spark's own serialization occurs.
   
   This worked in most cases, but in others the serialization performed by `SerDeUtils` conflicted with Spark's own serialization, which prevented the Batch Profiler from persisting a data sketch.
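
The adapter approach described above can be sketched roughly as follows. This is a simplified illustration, not the removed Metron code: `Measurement` and `MeasurementAdapter` are hypothetical stand-ins, and plain `java.io` serialization is swapped in for `SerDeUtils` (which the stack trace shows calling Kryo) so that the example runs without dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class AdapterSketch {

    /** Hypothetical stand-in for ProfileMeasurement: the value type is unknown until runtime. */
    public static class Measurement {
        public String profile;
        public Object value;
    }

    /** Hypothetical stand-in for the adapter: the Object field is replaced by a byte[]. */
    public static class MeasurementAdapter {
        public String profile;
        public byte[] valueBytes;
    }

    /** Serialize the value up front (java.io is used here in place of SerDeUtils). */
    public static byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Deserialize the value; this step needs the value's class to be loadable. */
    public static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Value class is not visible to the class loader", e);
        }
    }

    public static MeasurementAdapter adapt(Measurement m) {
        MeasurementAdapter a = new MeasurementAdapter();
        a.profile = m.profile;
        a.valueBytes = toBytes(m.value);
        return a;
    }

    public static Measurement restore(MeasurementAdapter a) {
        Measurement m = new Measurement();
        m.profile = a.profile;
        m.value = fromBytes(a.valueBytes);
        return m;
    }

    public static void main(String[] args) {
        Measurement original = new Measurement();
        original.profile = "profile-using-stats";
        original.value = 42.5; // any serializable object could stand here

        Measurement copy = restore(adapt(original));
        System.out.println(copy.profile + " -> " + copy.value);
    }
}
```

The value passes through two layers of serialization, its own and then the framework's. In the stack trace above, the failure surfaces at the point corresponding to `fromBytes`, inside `ProfileMeasurementAdapter.toProfileMeasurement`.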
   
   ## Changes
   
   1. Removed the `ProfileMeasurementAdapter`, which was only needed as a workaround for bugs in Spark's bean encoder. Since #1556 switched to Kryo serialization, the workaround is no longer needed.
   
   1. Replaced all uses of `ProfileMeasurementAdapter` with `ProfileMeasurement`.
   
   ## Acceptance Testing
   
   This PR can be tested using the centos7 development environment.
   
   1. Start up the centos7 dev environment.
       ```
       cd metron-deployment/development/centos7
       vagrant destroy -f
       vagrant up
       ```
   
   1. Allow the environment to run a few minutes to ensure that there is telemetry archived in HDFS that can be used by the Batch Profiler.
   
   ### Set Up the Batch Profiler
   
   1. In Ambari, stop Metron and Sensor-Stubs to free up resources.
   
   1. Install the MySQL Connector and ensure Ambari can find it.
      ```
      yum -y install mysql-connector-java
      ln -s /usr/share/java/mysql-connector-java.jar /var/lib/ambari-server/resources/mysql-connector-java.jar
      ```
   
   1. Install Spark2 using Ambari. YARN must be kept running for the install to complete successfully.
   
   1. Ensure that Spark can talk with HBase.
       ```
       cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
       ```
   
   ### Create a Profile
   
   1. Create the following profile in `config/zookeeper/profiler.json`.
       ```
       [root@node1 0.7.2]# cat config/zookeeper/profiler.json
       {
         "profiles": [
           {
             "profile": "profile-using-stats",
             "onlyif": "exists(parallelenricher.enrich.end.ts) and exists(parallelenricher.enrich.begin.ts)",
             "foreach": "ip_src_addr",
             "update": {
               "duration": "TO_LONG(parallelenricher.enrich.end.ts) - TO_LONG(parallelenricher.enrich.begin.ts)",
               "sketch": "STATS_ADD(sketch, duration)"
             },
             "result": "sketch"
           }
         ],
         "timestampField":"timestamp"
       }
       ```
   
   1. Run the Batch Profiler.
       ```
       [root@node1 0.7.2]# ./bin/start_batch_profiler.sh
       ...
       19/11/15 23:16:13 DEBUG DefaultProfileBuilder: Applying message to profile; profile=profile-using-stats, entity=72.34.49.86, timestamp=1573666545000
       19/11/15 23:16:13 DEBUG DefaultMessageDistributor: About to flush active profiles
       19/11/15 23:16:13 DEBUG DefaultMessageDistributor: Active cache maintenance triggered: cacheStats=CacheStats{hitCount=66, missCount=1, loadSuccessCount=1, loadFailureCount=0, totalLoadTime=12810, evictionCount=0, evictionWeight=0}, size=1
       19/11/15 23:16:13 DEBUG DefaultMessageDistributor: Expired cache maintenance triggered: cacheStats=CacheStats{hitCount=0, missCount=0, loadSuccessCount=0, loadFailureCount=0, totalLoadTime=0, evictionCount=0, evictionWeight=0}, size=0
       19/11/15 23:16:13 DEBUG DefaultProfileBuilder: Flushed profile: profile=profile-using-stats, entity=72.34.49.86, maxTime=1573666545000, period=1748518, start=1573666200000, end=1573667100000, duration=900000
       19/11/15 23:16:13 DEBUG ProfileBuilderFunction: Profile measurement created; profile=profile-using-stats, entity=72.34.49.86, period=1748518, value=org.apache.metron.statistics.OnlineStatisticsProvider@4b3d7611
       19/11/15 23:16:13 DEBUG BatchProfiler: Produced 27 profile measurement(s)
       19/11/15 23:16:13 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 3 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 2 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 3 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 0 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 0 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 2 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 4 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 3 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 5 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
       19/11/15 23:16:15 DEBUG HBaseWriterFunction: 5 profile measurement(s) written to HBase
       19/11/15 23:16:15 DEBUG BatchProfiler: 27 profile measurement(s) written to HBase
       19/11/15 23:16:15 INFO BatchProfilerCLI: Profiler produced 27 profile measurement(s)
       ```
   
   1. Retrieve the data sketch that was persisted by the profile.
       ```
       [root@node1 0.7.2]# bin/stellar -z $ZOOKEEPER
       Stellar, Go!
       Functions are loading lazily in the background and will be unavailable until loaded fully.
       {es.clustername=metron, es.ip=node1:9200, es.date.format=yyyy.MM.dd.HH, parser.error.topic=indexing, update.hbase.table=metron_update, update.hbase.cf=t, es.client.settings={}, profiler.client.period.duration=15, profiler.client.period.duration.units=MINUTES, enrichment.list.hbase.provider.impl=org.apache.metron.hbase.HTableProvider, enrichment.list.hbase.table=enrichment_list, enrichment.list.hbase.cf=t, user.settings.hbase.table=user_settings, user.settings.hbase.cf=cf, bootstrap.servers=node1:6667, source.type.field=source:type, threat.triage.score.field=threat:triage:score, enrichment.writer.batchSize=15, enrichment.writer.batchTimeout=0, profiler.writer.batchSize=15, profiler.writer.batchTimeout=0, geo.hdfs.file=/apps/metron/geo/default/GeoLite2-City.tar.gz, asn.hdfs.file=/apps/metron/asn/default/GeoLite2-ASN.tar.gz}
   
       [Stellar]>>> durations := PROFILE_GET("profile-using-stats","72.34.49.86", PROFILE_FIXED(30, "DAYS"))
       [org.apache.metron.statistics.OnlineStatisticsProvider@84afe6da, org.apache.metron.statistics.OnlineStatisticsProvider@1d324a01, org.apache.metron.statistics.OnlineStatisticsProvider@8a4d59f0]
   
       [Stellar]>>> all := STATS_MERGE(durations)
       org.apache.metron.statistics.OnlineStatisticsProvider@bdc47952
   
       [Stellar]>>> STATS_COUNT(all)
       325.0
   
       [Stellar]>>> STATS_MEAN(all)
       27.033846153846152
       ```
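
`STATS_MERGE` can combine the per-period results returned by `PROFILE_GET` because summary statistics of this kind can be merged without revisiting the raw values. The following is a minimal sketch of that idea, assuming only a count and a mean are tracked. `MergeableStats` is a hypothetical class, not Metron's `OnlineStatisticsProvider`, and the sample values are made up.

```java
public class MergeableStats {
    private long count;
    private double mean;

    /** Fold one observation into the running count and mean. */
    public void add(double x) {
        count++;
        mean += (x - mean) / count;
    }

    /** Combine two summaries into a new one using the count-weighted mean. */
    public MergeableStats merge(MergeableStats other) {
        MergeableStats out = new MergeableStats();
        out.count = this.count + other.count;
        if (out.count > 0) {
            out.mean = (this.mean * this.count + other.mean * other.count) / out.count;
        }
        return out;
    }

    public long getCount() {
        return count;
    }

    public double getMean() {
        return mean;
    }

    public static void main(String[] args) {
        // Two hypothetical profile periods, each summarized independently.
        MergeableStats period1 = new MergeableStats();
        period1.add(10);
        period1.add(20);

        MergeableStats period2 = new MergeableStats();
        period2.add(30);
        period2.add(40);
        period2.add(50);

        // Merging the summaries is equivalent to summarizing all five values at once.
        MergeableStats all = period1.merge(period2);
        System.out.println("count=" + all.getCount() + " mean=" + all.getMean());
    }
}
```

Each profile period only needs to persist its own summary; a query over a longer window merges the stored summaries, which is the role `STATS_MERGE(durations)` plays in the session above.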
   
   
   
   ## Pull Request Checklist
   
   Thank you for submitting a contribution to Apache Metron.
   Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.
   Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.
   
   
   To streamline the review of the contribution, we ask that you follow these guidelines and double-check the following:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
   - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
   - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
   - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
   - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
   - [ ] Have you written or updated unit tests and/or integration tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
   
